Data Mesh Radio Patreon – get access to interviews well before they are released
Episode list and links to all available episode transcripts (most interviews from #32 on) here
Provided as a free resource by DataStax AstraDB
In this episode, Scott interviewed Immanuel Schweizer, the Data Officer for EMD Electronics.
Some interesting thoughts and questions from the conversation:
- Good governance starts at data collection – what are ethical and compliant ways to collect data from the beginning? This points to intentionality around data use stretching into the application – what should you collect that might not be part of the day-to-day application function but that might lead to insights used to build a better user experience? And what are the ethical concerns?
- Should we initially create data products to serve specific use cases or should we focus on sharing data first and then shaping what people consume most into data products? EMD is approaching data products from a different angle than most, using the second approach.
- When looking at data mesh, should you start with the high data maturity teams or work to pull everyone up to at least a decent baseline maturity level? If you work with the most mature teams, will their challenges really be applicable to the not-so-mature domains? Can you find good reuse patterns to scale your mesh implementation?
- Domain owners are much more willing to share data if they understand use cases for how their data will be used and maintain control to prevent misuse. Reluctance comes from an incomplete picture causing concerns – the more visibility into how data can be and is being used, the more willing domain owners are to share. But understanding your end-to-end data supply chain is tough, especially to start.
- How do you evaluate when to spend the time with a domain to get them data mesh ready? If you need a high value use case to justify spending time with that domain, are you leaving many domains behind? This ties to #2 and #3.
- Set your target picture but be ready to adjust it along the way. The world is ever changing; don't lock in to an expected target outcome.
- Good data governance is about speeding up 1) access to and 2) usage of data.
- EMD launched a data literacy program where employees spend the majority of a 10-week timeframe learning about data and how to make use of it. For Immanuel, making things tangible relative to data makes people much less hesitant to explore and use data.
- You should make using data a “part of the job” so it is tracked and part of the review process. Otherwise, you are missing out on a key incentive to leverage data.
- How many people in your organization wish they could be leveraging data more often to make decisions? What’s holding them back? Is it tooling, knowledge, incentivization, access, etc.?
- How can we democratize insights? So much of insight generation is one-off, how do we make that scalable, shareable, and repeatable?
Per Immanuel, EMD’s data mesh journey is not that typical in that they are still getting their arms around centralizing data in a constructive way. It was previously locked away in the domains. So, they are starting their data mesh – or decentralization – journey by centralizing data in a certain sense. Wannes Rosiers mentioned this at DPG Media as well. This enables breaking down silos and starting from common ways of working so there is more cohesiveness around centralized data sharing. There are some concerns from the domains about how they maintain control to ensure compliant usage.
As the teams learn how to put their data onto centralized infrastructure, Immanuel shared that they are simultaneously working on how they will hand more control back to domains. EMD is in the initial stages of their data maturity journey but they are mapping out how they plan to move forward with next steps. They are focused on giving data producers visibility into how their data is used so the data domain owners can feel comfortable.
And Immanuel gave a good insight to those starting their data mesh journey: understanding the end-to-end data supply chain is really hard when you are just establishing that supply chain. And understanding that supply chain, how data is consumed downstream, is very important to giving the domains the visibility they need to feel comfortable sharing more and more of their data. Some are calling this data on the threshold – data might be data on the outside but isn’t yet.
Immanuel and team are starting their data mesh journey by bringing a significant amount of data into a central data lake and then watching for data consumption patterns to emerge. Then, when there is a use case that is worthwhile, that data gets promoted to being data product worthy. Then they work backward to find the owners so they can ensure the upstream data production is actually managed as a product. That way, rather than data challenges being patched by a centralized team, the data can be fixed where it should be fixed – upstream, at the source systems.
Based on that approach of getting access to data outside of data products, Immanuel mentioned how this could bring domains to the table sooner in a data mesh implementation. In his view, it is often quite expensive to get a domain on board and capable of sharing data like a product. So, if you require a specific, data consumer-driven use case or use cases before investing enough in a domain that they can share data, it can mean you only look for very large return use cases. Or that the initial cost of bringing a domain on to sharing on the mesh falls disproportionately on your early data products. In Immanuel’s approach, the initial cost of sharing is much lower, and then use cases emerge to further justify work. So more domains share sooner. But it has the drawback that the domain isn’t sharing their data intentionally upfront, so the data might carry less of the domain context.
Immanuel mentioned that you can’t centralize ownership of data quality and access and expect to scale. You really need to figure out how you distribute your data ownership appropriately. And that you can’t rely on a data engineer in every domain, so how do you lower the bar to sharing?
Immanuel’s approach to the big picture is to set a target picture and a north star but adjust the expected target picture along the way. If you aren’t flexible and aren’t taking in new information and adjusting accordingly, are you really ready for the flexibility required to do data mesh?
So how is EMD approaching their data strategy? Per Immanuel, they started with a company-wide data strategy – again, setting that high-level target picture. Then, they started to measure data maturity across each domain. They had to answer whether they should move ahead with domains that already have high data maturity or try to level up the capabilities of less mature domains so they can participate in a data sharing economy.
They decided to focus on bringing all domains up to a certain maturity level – if you don’t, it can mean issues scaling your approaches, per Immanuel. For example, if one domain is doing MLOps with tens of ML models in production and another domain is running everything off Excel and Access – or worse, PowerPoint – their needs may be completely different. The things you learn and ways of working you get from the very data mature team just won’t translate well to immature domains, so every domain will need specialized help. That just isn’t scalable.
Immanuel mentioned how data governance can often create mixed feelings. He said that good data governance is like brakes in your car – brakes are there to allow you to go fast safely. Governance is not about overseeing every bit of data usage, it’s about speeding up access and usage of data. Governance should be an enabling factor in a best-case scenario. And that is why federated governance can be so powerful – we give the control to the people most worried about data usage and give them the tooling/knowledge to own most of that governance.
EMD created a general data literacy campaign, making everyone aware of and on the same page about definitions for a lot of core concepts around data, analytics, use cases, etc. – making them all aware of the vocabulary. This was just the first step though.
For Immanuel, data literacy success comes from programs that take people by the hand regarding data and make them use the systems – the workflows and mechanisms – especially around governance. Hence, they created a 10-week program focused on using data. He said it is looking successful because it makes things so much more tangible. People can understand the entire flow of data through the system and then they have a better idea of what they could do. It’s only just wrapping up its first cohort but the attendees seem very excited about it.
In Immanuel’s view, people are naturally curious about data. The issue with using data has been lack of tools and access to data in domains – sharing and leveraging data wasn’t viewed as part of the typical job in most domains and it wasn’t part of most people’s KPIs or reviews. It is important to give them the right tools and the right incentives so people can and want to explore data and be curious. Pushing reports at people doesn’t engage curiosity.
Who is this new data and analytics world for? For Immanuel, it’s for the people who are already data curious but aren’t leveraging data nearly as much and/or as scalably as they could be – we need to give them the tools to make working with data more scalable and sharable. Then, data becomes a topic that is tangible for everyone in the business – can we start replacing Excel and PowerPoint but still make it simple for people to explore the data?
Per Immanuel, part of moving towards scalable analytics requires us to unlock one-off insight generation and democratize the generated insights. There are too many instances of people generating good and useful insights that are simply lost – seen only by that single person, and/or only that one time. How do we make an easy, happy path to sharing insights that are long-lived?
When asked about the initial ROI on data mesh and the big data literacy campaign to-date, Immanuel said the main value thus far has been giving people the capability and encouragement to explore data. Exploration has allowed people to understand the organization, and they’ve found the most reused data – and then focused their data work efforts on the places people are using the most. That prevents time spent on data products that aren’t valued by consumers.
In wrapping up, a question Immanuel thinks is important to ask: how do you define a happy place for your org and for each domain? No journey to a happy place will look alike. And no happy place will look alike either. What does a good state along the way look like? We don’t need to be in a rush to get to the finish line.
Immanuel’s LinkedIn: https://www.linkedin.com/in/immanuel-schweizer-17839242/
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him at community at datameshlearning.com or on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Data Mesh Radio is brought to you as a community resource by DataStax. Check out their high-scale, multi-region database offering (w/ lots of great APIs) and use code DAAP500 for a free $500 credit (apply under “add payment”): AstraDB