Data Mesh Radio Patreon – get access to interviews well before they are released
Episode list and links to all available episode transcripts (most interviews from #32 on) here
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Amy’s LinkedIn: https://www.linkedin.com/in/amy-raygada/
In this episode, Scott interviewed Amy Raygada, Senior Director and Analytics Product Manager at Swiss Marketplace Group (SMG).
Some key takeaways/thoughts from Amy’s point of view:
- When signing up to do data mesh, you won’t know exactly what you are signing up for – it will differ from your expectations in a number of ways, so be prepared to be flexible. And it’s a marathon, not a sprint. Have patience; you don’t need to get it perfect on day one.
- It’s crucial to have a data product owner, someone who can ask the uncomfortable questions and make sure you are actually treating your data as a product. It’s easy to get lost in the technical details, but that’s not what drives business value.
- Start with a reasonably scoped use case instead of a massive challenge. That way teams can learn and iterate, because you aren’t tackling the hardest challenges – yet.
- Consider starting with a single domain, where a key consumer is that same producing domain. As the team learns more about owning data and sees value from owning it, they will be better able to serve other teams too.
- Spend the necessary time ideating and brainstorming before beginning your journey. You want everyone working together, so a clear initial vision – at least for the first steps – is key. But keep updating your priorities as you learn more.
- Data mesh requires major changes for organizations. People don’t handle abrupt changes well, so look for incremental changes where viable. It’s okay to take a bit longer if you keep people happy instead of rushing to the finish and burning everyone out.
- Domains need help becoming data capable, i.e. able to own their own data. You can’t just shove ownership onto them; you have to onboard them and help them understand how to work with data. You will definitely need to “babysit” them as they learn, and that’s expected and okay.
- You will probably communicate more than you expect, even though everyone tells you to expect this 🙂 Pair well and make sure people understand what needs to be done and why it’s crucial.
- It might be hard at first, but a central data team should focus on enabling domains to own their data. There may be a temptation to do the work for them, but then you end up with central data ownership again.
- Data contracts are relatively easy to understand conceptually and give data owners a defined scope to adhere to, instead of something vague like “give me high quality data”. Data owners can also set clearer boundaries/SLAs or push back on requirements more easily.
- When working with a domain, make sure they understand the reason you are asking them to take on new types of work – what’s the outcome? Show them what happens when they break a data contract so they can understand the impact. And build a good relationship so you can use them as your internal success story too.
- Make schema contract validation – before deploying changes – as simple as possible so software engineers can check whether changes will break data contracts. If so, and the change is necessary, set a versioning strategy for that data product.
- If you help teams understand the why of data mesh first, and then what changes for them, the conversation becomes much easier. Talk about the current state and why that’s not where you want to be.
- Leverage your first movers to be your advocates. Once you have one success story, pair with them to bring other people on board. The water is fine, jump in!
- If a producing domain – or stakeholders in the domain – is hesitant to engage on something like data mesh, work with them to show initial value. Once they see it has a benefit and isn’t some monumental task – and that you are there to help – they are much more willing to participate.
- Look to cross bridges when you come to them. Obviously plan ahead, but don’t get too wrapped up in what might happen.
Amy started by talking about something many other guests have probably felt but few have said: when signing up to do data mesh, you won’t really know for sure what it will be. Your journey will take you places you didn’t expect, with hurdles and obstacles you can’t see or predict. And that’s okay; you can learn and iterate along the way. Expect the unexpected.
When Amy started interviewing at SMG, the team wasn’t very familiar with data mesh, but for the past six months it has been a key part of her focus. She paired up with the head of data engineering to brainstorm before moving forward; that pre-work lasted about 2-3 months. A lot of other change was also happening in the general technology and data landscape at SMG – moving from on-prem to cloud, from monolith to microservices, off legacy technology, etc. – which meant they were able to get everyone together for a two-day workshop and really look at things from a fresh perspective. At the same time, people are not used to abrupt changes, and with data mesh there will be a LOT of changes, so look to implement them over time rather than rushing.
SMG decided to start with a single domain – leads – instead of multiple domains. The initial use cases were also useful for the leads domain itself, especially by significantly improving data quality. As part of enabling that domain, Amy and team are working closely with them to teach them how to handle data and what data ownership actually means.
The leads domain was chosen because it had a significant need for help with data quality and had moved further to cloud and microservices than other domains. It was also a smaller, more manageable problem than, say, sales, which is in a major transition to Salesforce. Three non-mesh data products were impacted by poor leads data quality, so there were a lot of downstream issues they could address and a lot of incremental business value to drive by fixing the data quality issue. The other domains they considered were in big transitions, so getting to the point of driving value in a PoC would have taken a lot more work and carried more risk.
Data contracts have been both a key goal and a key driver for Amy and team. The goal is to better define what you are trying to do with data: what you as a data owner actually need to do and deliver, and what the data consumer can expect. The driver aspect is that data owners have something more concrete, so they are more willing to take on ownership because they only have to deliver what they say they will. It gives a limited scope to their data work.
A focus for Amy and team is to be enablers only: building out the platform and teaching people how to own their data, but not doing the work or owning the data themselves, since central data ownership was causing issues in the first place – it’s what data mesh specifically moves us away from. The domain teams will certainly need “babysitting”, but that is expected and can mean more information flowing to the platform team to make improvements too. Data ownership isn’t a one or a zero; it’s a process.
Amy believes it’s important to really pair with domains to share the logic behind data mesh – why are we making this change – so it doesn’t feel like you are changing their ways of working but rather adding new, value-add capabilities to the domain. This close relationship has allowed them to do a data contract demo showing the domain what happens when they make a change that violates their contract. That way, they understand what happens to downstream consumers – possibly themselves – when they make a breaking change. They also see that the platform alerts them to breaking changes, so they have a better chance of preventing issues themselves.
Similar to what Chris Riccomini mentioned in episode 51, Amy and team are implementing automated schema validation at the pull request level. This prevents breaking changes from going through unnoticed, with consumers being the first to know about an issue. It also kicks off a conversation about whether the change should be made and, if so, how to handle versioning. Amy knows this can overwhelm some people, but the team she is working with understands the pain, so they are eager to prevent it. They are also looking at data reviews – similar to architecture reviews – to assess whether operational system changes will impact the data. Abhi Sivasailam mentioned a similar process in episode 9.
Amy believes – and Scott agrees – that patience is crucial. Getting the first domain into a really good spot and then enabling them to share their story and learnings will be key when they expand to additional domains. Not being in a massive hurry means teams have the time and space to learn how to own data, instead of piling a huge additional workload onto an already overburdened team.
Educating the wider company about what they are doing with data mesh has gone well. Amy created a Miro board using Barr Moses’ old joke about going from data mess to data mesh. They are working to explain the what and the why to everyone involved, but in a simplified way. Talk about what changes – the new responsibilities, what they mean and what they drive. Talk about how you can bring everyone to the table and, especially, what benefits data mesh has for them. Really focus on the practical: what are they being asked to do, and why?
While many stakeholders in their initial domain were hesitant to engage at first, now that there is proven value, Amy says they are much more willing to pair up. They are providing test cases to the data team so they can quickly validate value and iterate together. They are already seeing the benefit of the work with other stakeholders, and it’s getting those previously hesitant stakeholders excited to team up. So if you want buy-in, look to provide some value first and then show/prove that value. Yes, easier said than done.
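To make the idea of a data contract with a limited scope more concrete, here is a minimal Python sketch, assuming a contract is just a declared set of fields and types, the fields that must be present, and a freshness SLA. The `DataContract` class, the `violations` helper, and the `leads` field names are hypothetical illustrations, not SMG's actual contract or tooling.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class DataContract:
    """Hypothetical data contract: what the owner commits to deliver."""
    name: str
    version: str
    fields: dict[str, type]       # column name -> expected Python type
    required: frozenset[str]      # columns that must be present and non-null
    freshness_sla: timedelta      # max data age promised (recorded, not checked here)


def violations(contract: DataContract, record: dict) -> list[str]:
    """Return contract violations for one record; an empty list means it conforms."""
    problems = []
    for name in sorted(contract.required):
        if record.get(name) is None:
            problems.append(f"missing required field '{name}'")
    for name, value in record.items():
        expected = contract.fields.get(name)
        if expected is None:
            problems.append(f"unexpected field '{name}'")
        elif value is not None and not isinstance(value, expected):
            problems.append(
                f"field '{name}' should be {expected.__name__}, got {type(value).__name__}"
            )
    return problems


# Illustrative contract for a hypothetical "leads" data product.
leads_v1 = DataContract(
    name="leads",
    version="1.0.0",
    fields={"lead_id": str, "listing_id": str, "created_at": str, "channel": str},
    required=frozenset({"lead_id", "listing_id", "created_at"}),
    freshness_sla=timedelta(hours=24),
)
```

For example, `violations(leads_v1, {"lead_id": "L1", "listing_id": 42, "created_at": "2022-10-01"})` returns `["field 'listing_id' should be str, got int"]`. The point is the scope: the owner commits to exactly these fields and this SLA, and anything beyond that is outside the contract.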
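A pull-request-level schema check can be sketched as a comparison of the data product's schema before and after the change. This is a minimal Python sketch under stated assumptions, not the tooling Amy's team or Chris Riccomini uses: schemas are plain dicts mapping a column name to a `(type name, nullable)` pair; a change is treated as breaking for consumers if it removes a column, changes a column's type, or makes a non-nullable column nullable; and versions follow a semantic-versioning-style `major.minor.patch` scheme. All function and schema names are hypothetical.

```python
import sys

# Assumed schema format: column name -> (type name, nullable?)
Schema = dict[str, tuple[str, bool]]


def breaking_changes(old: Schema, new: Schema) -> list[str]:
    """List changes that could break consumers reading the data product."""
    problems = []
    for col, (old_type, old_nullable) in old.items():
        if col not in new:
            problems.append(f"removed column '{col}'")
            continue
        new_type, new_nullable = new[col]
        if new_type != old_type:
            problems.append(f"changed type of '{col}' from {old_type} to {new_type}")
        if new_nullable and not old_nullable:
            problems.append(f"column '{col}' may now be null")
    return problems


def next_version(current: str, old: Schema, new: Schema) -> str:
    """Suggest a semver-style version: major if breaking, minor if columns added."""
    major, minor, patch = (int(part) for part in current.split("."))
    if breaking_changes(old, new):
        return f"{major + 1}.0.0"
    if set(new) - set(old):
        return f"{major}.{minor + 1}.0"
    if new != old:
        return f"{major}.{minor}.{patch + 1}"
    return current


def check_pull_request(old: Schema, new: Schema) -> int:
    """CI exit code: non-zero blocks the merge until the change is discussed."""
    problems = breaking_changes(old, new)
    for problem in problems:
        print(f"BREAKING: {problem}", file=sys.stderr)
    return 1 if problems else 0


# Illustrative schemas for a hypothetical "leads" data product.
leads_old = {"lead_id": ("string", False), "listing_id": ("string", False), "channel": ("string", True)}
leads_additive = {**leads_old, "source": ("string", True)}
leads_breaking = {**leads_old, "listing_id": ("int", False)}
```

With these schemas, `next_version("1.4.2", leads_old, leads_breaking)` suggests `2.0.0` because `listing_id` changed type, while adding the optional `source` column only suggests `1.5.0`. In a real pipeline the job would load both schemas from the base and head branches and call `sys.exit(check_pull_request(old, new))`; how the schemas are loaded is left out of this sketch.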
Make sure to team up with any domains that have had success so they can help you sell other domains on working with you. They can help educate and also show that you are actually providing the business value you claim.
It’s super easy to get bogged down by metrics. Push back on big metrics requests – why do you actually need this? As Alla Hale mentioned in episode 122: “What would having this unlock for you?” If it doesn’t unlock value, why do it?
Having prioritization meetings weekly keeps everyone on the same page, heading in the same direction.
It’s easy to get wrapped up in what might happen. Focus more on what’s in front of you and what you are trying to do. Don’t cross bridges before you come to them 🙂 There are countless bridges in data mesh, so focus more on the now.
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him at community at datameshlearning.com or on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Data Mesh Radio is brought to you as a community resource by DataStax. Check out their high-scale, multi-region database offering (w/ lots of great APIs) and use code DAAP500 for a free $500 credit (apply under “add payment”): AstraDB