Sign up for Data Mesh Understanding’s free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Mariana’s LinkedIn: https://www.linkedin.com/in/mariana-hebborn-phd-118035117/
In this episode, Scott interviewed Mariana Hebborn, Lead of Data Governance for the Healthcare Sector at Merck Group Germany (not Merck, the pharmaceutical company).
Some key takeaways/thoughts from Mariana’s point of view:
- It’s crucial to answer why you are doing data governance. Is it to improve data quality? Better data security? Know what you are trying to achieve so you can best focus your efforts.
- Make it easy for people to understand how and why to share their knowledge with the rest of the organization. The people mindset really is the most important aspect of successful digital and/or data transformation.
- Most everyone knows we need to go to federated data governance but the big question is how. How can we do it safely? How can we evolve? It isn’t a simple switch we can flip.
- To drive buy-in for moving from a centralized data governance approach, we need to show the benefits of federated – when done well – versus a monolithic approach.
- At the end of the day, governance is about conversation and missioning – why should you care about governance? What value will it drive for your organization? Answer those questions first.
- We need to find ways to organize closer to the source to capture far more domain knowledge when sharing data. Centralized teams just can’t understand the context in a large and complex organization.
- ?Controversial?: Most data access should be to packaged insights – the computational result – rather than raw data itself. Most people consuming information want the insights, not the raw data.
- We need to take learnings from operations and microservices so we drive to clear boundaries, clear responsibilities, and easy access to the information people need. That will prevent data silos and keep us agile.
- Who should have access to what data is far less cut-and-dried than we’d like. It depends much more on what’s in the data and the specific usage. So domains need some clear rules to follow, but they should make the decisions because they understand the data itself far better.
- “The best data owners and data stewards are found, not made.”
- We need to get to a place mentally where data governance is so ingrained, there’s never a question of if we should be doing it.
- Start with looking to better govern the data that is already generating good value – or is otherwise important – first. Don’t try to govern all data at the start.
- If data consumers aren’t sure whether it’s allowed and/or appropriate to use data in a certain way, it is on their shoulders to ask. Data owners and stewards can only set rules that go so far. Otherwise, data owners and stewards cannot feel comfortable giving others access to their data.
- Everyone should be able to browse what data is available even if they cannot have access by default. You can find more use cases that data consumers wouldn’t have thought were possible or available.
- Lack of knowledge of the law and regulations does not protect you from the law and regulations. Don’t be naive.
- “Centralize the knowledge within the domain”.
For Mariana, when talking about data governance, the general industry consensus is that we need to get to federated governance, but the big question is how to actually do that. And governance needs and the pace of change are very different depending on the industry. Many industries are already adopting the federation mindset but are still struggling to do data governance well – it’s either centrally managed or it’s a bunch of silos. How can we get past that?
And according to Mariana – and Scott – centralized data governance is a pragmatic approach. Until it isn’t. At scale, centralized data governance is breaking in most – if not all – large organizations. So we need to look for ways to organize closer to the people with the knowledge about the data so they can share the domain-specific context far better than anyone else. But we still want to lean on governance experts to keep domains aligned with the greater organization. Mariana believes we can win over people by clearly comparing federated versus the monolithic approach – for domains and for the greater organization. Show them what data governance means for them, why it matters, and why it benefits them, instead of trying to show them exactly what to do.
In Mariana’s view, we have already figured out how to do cross-domain information sharing on the operational plane. That prevents silos without being a centralized way of working. So, we need to figure out how to do the same with data for analytics, taking a lot of the same learnings from moving to microservices. We need clear boundaries and accountability – it is needlessly confusing when we don’t know who is responsible for what. In data, we need to focus on getting people access to the information they need to better the organization. And Mariana knows the idea of central access and control feels like a good one – the central team knows governance best, right? But it just doesn’t work well at scale.
According to Mariana, much of doing federated governance well is about changing mindsets. To get started with federating your governance, you need to find your data owners and data stewards. And per Mariana, “the best data owners and data stewards are found, not made.” Before federating governance to a domain, there need to be people on the ground in that domain ready to tidy things up as best as possible. And we want to move towards more/better governance as well. You can’t go from no governance to governing everything. Start with what matters and what drives value.
In Mariana’s experience, people in many long-evolving industries or companies often ask if we should even do data governance. Truly data-driven companies don’t ask that – we need to get to a place mentally where data governance is so ingrained, there’s never a question of if we should be doing it. But we also can’t boil the ocean. So start by better governing the data that is already very important, whether that is because it is generating value, is sensitive, is widely used, etc. And then figure out what you are trying to get from data governance – is it quality, better security around your data, something else? If you don’t know, figure that out first.
Mariana believes crisp policies about compliance are very helpful in reducing the effort people need to stay in compliance. We can’t have every domain team learn all applicable laws and regulations. Well-crafted policies mean there is less work interpreting what is and what is not allowed. Of course, this is far easier said than done.
At the end of the day, according to Mariana, governance is about conversation and missioning – why should you care about governance? It’s very easy to fall into decentralized instead of federated governance. Decentralized is where you end up with data silos. And there are information silos to overcome as well – knowledge is typically trapped in people’s heads instead of disseminated through the organization.
For Mariana, when doing federated governance, it’s best to centralize the knowledge within each domain. And bring the rules to the data. Make it easy for people to understand how and why to share their knowledge with the rest of the organization. The people mindset really is the most important part of a successful digital and/or data transformation.
Quick tidbits:
Most data access should be to packaged insights – the computational result – rather than raw data itself. Most people consuming information want the insights, not the raw data.
We need to identify the people who are already working with data and who know who should have, and who already has, access to the data.
Everyone should be able to browse what data is available even if they cannot have access by default. You can find more use cases that data consumers wouldn’t have thought were possible or available.
Accountability for data owners and stewards only extends so far. They need to own who should have access and what is proper use but if someone misuses data, that is on the person or team misusing the data, not the data owner or steward. Data consumers should ask if they aren’t sure if something is allowable/appropriate.
Lack of knowledge of the law and regulations does not protect you from the law and regulations. Don’t be naive.
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used in this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf