In this episode, Scott interviewed Martina Ivaničová, Data Intelligence Engineering Manager at the travel services company Kiwi.com.
Some key takeaways/thoughts from Martina’s point of view:
- The most important – and possibly one of the most difficult – aspect of a data mesh implementation is “triggering organizational change”.
- Driving buy-in for something like data mesh is obviously not easy. As you are getting started, look to leverage 1:1 conversations to really share what you are trying to do and why and how this can impact them and the organization. These 1:1 conversations are crucial to developing early momentum.
- On driving buy-in for data mesh, really think about how to limit incremental cognitive load as much as possible on developers/software engineers. If you can keep cognitive load low, you are much more likely to succeed – succeed in driving buy-in and succeed in delivering value.
- When sharing internally about data mesh, it’s important to focus on what it means to the other person. Using “data mesh” as a phrase can lead to a lot of confusion for people not on the data team. Make it clear what you are trying to accomplish – the what, the why, and the how. Using data-as-a-product as the leading concept resonated and worked well.
- Kiwi.com started driving buy-in by working with the engineering upper management, then found a few valuable and achievable first use cases to move forward. They have kept cognitive load low on the engineering teams while they learn how to deliver data as a product.
- If possible, the easiest way to drive buy-in is by finding a use case that is beneficial to the producing domain. If not, then look to spend the 1:1 time to really share why this matters.
- Kiwi.com is getting software engineers in domains to commit to simply sharing their data, not even really structuring it into data products. So the software engineers in most cases are really only focused on maintaining high-quality data sharing mechanisms – read: pipelines. That is a relatively low initial cognitive load/low workload ask.
- Analytics engineers are creating the data products from the sourced data to satisfy consumer needs. Martina and team want to move to software engineers handling more of data product creation/management over time but it’s a process. They plan for analytics engineers to upskill the software engineers by pairing with them closely.
- It might initially be more important to find a way to evaluate and iterate on what data is shared and how than getting to the most complex or valuable data product. You want to build the muscle around sharing data first before trying to go too big too soon.
- It’s important to know what you are trying to prove out in your initial data mesh related deployment. It’s okay to prove out you can produce data products before proving out you can build out the full mesh.
- A key success metric for a data mesh journey could be how many direct conversations and then actions come from data producers and consumers speaking without data engineering involvement. At Kiwi.com, these conversations are still usually driven by analytics engineers but that might change in the future.
- Data governance centralization didn’t happen overnight. When you look to decentralize and federate your governance, you should look to be patient instead of trying for an overnight revolution.
Martina started by discussing how historical – legacy might be too harsh – data approaches like the enterprise data warehouse haven’t kept up with the mass proliferation of data sources. When we were taking data from the monolith or monoliths, it was far easier to think about what data you might have and try to arrange it into something consumable. But now, with data coming from so many microservices and from external vendors and partners, it just isn’t possible to use the same historical approach – too many things are changing. The centralized data team trying to own hundreds of pipelines flowing into one central lake or warehouse that they also own – it just wasn’t scaling. So when the Kiwi.com data team ran across data mesh, it was very exciting – it was a way for the people with the business context to conceivably own and manage sharing their data in a reliable way.
The historical general approach to data governance – one centralized team trying to make context-dependent decisions for all the domains – just never made sense to Martina. They just could never know the context well enough to make good choices, especially good choices in a timely manner. She noted that if you are moving from that approach, centralization didn’t happen in a day, it evolved. Your move to decentralization should also evolve – think thin slicing and decentralizing more and more rather than pushing all ownership to all domains at once.
Martina then talked about driving buy-in, a topic Scott circled back on frequently throughout the conversation. She noted – as many have, notably Khanh Chau in his episode – how hard driving buy-in can be when people haven’t felt the specific pain you are speaking to. So she and her team worked to really have deep conversations with the software engineers about how important treating and sharing data as a product can be and how the data team will work to maintain low cognitive load on the software engineers.
So, how did they start driving buy-in? First, Martina and team worked with engineering upper management to make sure that as they moved forward with domain teams, they would have support. Then, they focused on finding good first use cases. What could be a use case that would drive significant value if they got it right where they could also limit incremental cognitive load on the software engineers? And what had a high likelihood of success to start to build out proof points and momentum?
Martina mentioned how truly crucial the low initial cognitive load aspect was to driving their data mesh journey forward. The central data team wanted to spend at most 2-3 days with software engineers to teach them how to share data. Is that going to be them creating actual data products on their own? Quite frankly, no. It was about teaching them how to share data and probably more importantly in the long-run, how to think about sharing data – that data product thinking. Then, analytics engineers structure the data shared into actual data products. This setup means it is easy to evaluate and iterate along the way.
Scott asked what they were trying to prove out initially – that a dataset had value, that they could build data products, or that they could build a data mesh. For Martina and team, it was more about building out a reliable way to share data, so their proof of value was focused on proving they could build data products. One really crucial aspect they wanted to test was whether they could bring the data producers and consumers together with a good outcome without the data engineers – so the producers, the analytics engineers, and the end consumers working together. And the answer is yes, they are seeing great results there! The direct relationships between data producers and consumers are spurring the data producers to rethink how they share and what they share – and, very importantly, what more data they could share.
Martina, like many other guests, brought up the general industry need for redefining data contracts – they just don’t do a ton of the things we need. We don’t have a good way to detect semantic drift or often even to prevent changes before they break something. Even with lots of existing tooling, it is difficult for data producers to see who is consuming their data – and visibility into specifically how the consumers are using that data is almost non-existent. And there are many more issues that should be wrapped into contracts.
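To make the gap concrete, here is a minimal sketch of the kind of structural check a data contract can support today. It is not Kiwi.com's implementation and was not discussed in the episode; the `Field` class, the `breaking_changes` function, and the field names (`booking_id`, `price_eur`, `currency`) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One column in a shared dataset (hypothetical contract model)."""
    name: str
    dtype: str
    required: bool = True


def breaking_changes(old: list[Field], new: list[Field]) -> list[str]:
    """Return reasons why schema `new` would break consumers of schema `old`."""
    new_by_name = {f.name: f for f in new}
    problems = []
    for f in old:
        candidate = new_by_name.get(f.name)
        if candidate is None:
            problems.append(f"field removed: {f.name}")
        elif candidate.dtype != f.dtype:
            problems.append(f"type changed: {f.name} {f.dtype} -> {candidate.dtype}")
        elif f.required and not candidate.required:
            problems.append(f"field became optional: {f.name}")
    # Fields that exist only in `new` are treated as non-breaking additions.
    return problems


# Hypothetical example: the producer proposes changing a type and adding a field.
current = [Field("booking_id", "string"), Field("price_eur", "decimal")]
proposed = [
    Field("booking_id", "string"),
    Field("price_eur", "float"),
    Field("currency", "string", required=False),
]
print(breaking_changes(current, proposed))
# prints ['type changed: price_eur decimal -> float']
```

A check like this can block a structural change before it ships, but it says nothing about semantic drift (for example, `price_eur` quietly starting to include taxes) or about who consumes the data and how. Those are the parts Martina sees as missing from contracts today.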
In circling back on buy-in, Martina talked about how a brownfield deployment has puts and takes. Negatives include dealing with existing tech debt, difficulty getting prioritization, etc. But a positive is that you already have an existing backlog of requests where you can find some interesting use cases to try out for your data mesh proof of value/concept.
As part of driving buy-in for the proof of value/concept, Martina and team had to do a lot of 1:1 conversations. It can be frustrating to have to do so but these conversations are crucial to building initial momentum. Martina had some issues when she tried to explain internally to non-data people that they were doing data mesh; it’s so easy for people outside the data team to get confused by data mesh. So she created a 1-pager focused on the data-as-a-product concept to help people understand what they were trying to accomplish. Focus on informing people of the what, the why, and the how. Data mesh is more of an implementation detail to them. This is what Scott keeps referring to as “unicorn farts” – in every bit of internal documentation for consumption outside the data team, find and replace “data mesh” with “unicorn farts”. Because then you will delete every mention of data mesh – and unicorn farts – so you can focus on what actually matters to the other party.
Martina shared the current role structure of their data mesh journey: data engineers focusing on the data platform, analytics engineers building data models on top of source-aligned data to create consumer-aligned data products, software engineers focusing on sharing source-aligned data, and data consumers producing aggregated data models across different data products. It is difficult to say they are building full source-aligned data products as of yet as they train their software engineers to really work with data and use data-as-a-product thinking. Remember, they are training them on sharing data for 2-3 days total – you can’t bring someone that far along in learning how to handle data in half a week! Their goal is to embed the analytics engineers further into the domains to really upskill the software engineers more around data but it’s early days.
It might feel a bit obvious but it’s good to say out loud, per Martina: triggering organizational change is the most important part of getting your data mesh journey moving. It will be difficult to get moving but trying to build out your platform early or trying to get teams to create data products without the organizational support is very likely to fail. You need to get that organizational change going.
A few other interesting points to highlight:
- A key initial success criterion was seeing software engineers start to consider what additional data they could share that could be useful and how they could share it reliably.
- Buy-in for building a data product is obviously easiest when the domain will be the consumer but when that isn’t the case, the 1:1 conversations and having upper engineering management buy-in made it possible to get domains to do the work.
- Figuring out who owns data brought – or bought – in from the outside can be extremely difficult. Who wants to own often low-quality data from an external source that needs to be cleaned and made into a product?
- Kiwi.com leveraged their existing stack for their initial data mesh implementation. There are of course lots of missing capabilities but they can still deliver good incremental value without every piece of the platform in place.
Martina’s LinkedIn: https://www.linkedin.com/in/martina-ivanicova/
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him at community at datameshlearning.com or on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Data Mesh Radio is brought to you as a community resource by DataStax. Check out their high-scale, multi-region database offering (w/ lots of great APIs) and use code DAAP500 for a free $500 credit (apply under “add payment”): AstraDB