#129 Iterating Data Governance for Data Mesh: Lessons Learned from ‘The Data Governance Coach’ – Interview w/ Nicola Askham

Data Mesh Radio Patreon – get access to interviews well before they are released

Episode list and links to all available episode transcripts (most interviews from #32 on) here

Provided as a free resource by DataStax AstraDB; George Trujillo’s contact info: email (george.trujillo@datastax.com) and LinkedIn

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.

Nicola’s website: https://www.nicolaaskham.com/

The Data Governance Podcast: https://www.nicolaaskham.com/podcast

Nicola’s LinkedIn: https://www.linkedin.com/in/nicolaaskham/

In this episode, Scott interviewed Nicola Askham, a data governance consultant known simply as The Data Governance Coach and host of The Data Governance Podcast.

Some key takeaways/thoughts from Nicola’s point of view:

  1. The key point of data governance: ensure the data we use is the right data for the right people to better address business challenges. Everything you do with governance should circle back to that.
  2. In doing data governance right, you need to set yourself up to take in feedback and iterate. You absolutely won’t get everything right upfront. It’s crucial to set expectations that your data governance approach will evolve as you learn more.
  3. If you see data mesh as only being about making better data more accessible to your current data consumers, that's a big opportunity wasted. Aim to significantly expand your pool of data citizens. Not everyone should be a data scientist but data should play a role in many more people's jobs.
  4. To get going in data mesh, you need to get your data governance to “good enough” and start moving forward. Think about what you need – is it a very complicated standard meant to last the next decade or is it about getting people to understand and trust the data they can now access? Probably the second…
  5. To drive buy-in for data governance, you should tailor your message to the audience. It’s very hard to have universal appeal around a specific selling point of data governance but data governance can – and should – drive value for everyone.
  6. Every data governance approach should be tailored to the organization but it should start from a few building blocks: A) policy; B) processes and standards; and C) roles and responsibilities. (More info below)
  7. A centralized data governance team making decisions about what to do with specific data will not scale – they just can’t have the context/knowledge needed. So federated governance has been the sensible approach for a long time, it’s just not necessarily easy to do right. Or at least it’s quite easy to do wrong.
  8. Central governance teams are crucial – they make it easy for federated teams to do what’s necessary to comply with regulations and internal standards but with as little friction as possible. The central governance team should be a value-add, not a gatekeeper.
  9. Make sure teams understand data governance can add significant value to them. Participation is not just some mandate, it has a benefit. Then make sure you are actually providing that value.
  10. Data governance has a bad name because of those few using it to put up obstacles. Data governance needs to be about lowering friction, not creating it. Easier said than done of course.
  11. In data mesh, you will likely need new roles to handle new data governance needs. Where ownership requirements were previously somewhat vague, in data mesh they should be much more explicit.
  12. Tied to the point above, you will likely have different sets of requirements under roles across your domains. That's okay. Look to create a standard model for roles and responsibilities and adjust it where it needs to be adjusted. It's okay to have non-uniform roles but there needs to be a starting point for domains to work from.
  13. When starting out in data mesh, look for a relatively simple first use case but don’t only stick to simple use cases early in your journey. It will make it much harder to tackle the difficult use cases later. You don’t want a mesh that can’t tackle hard but high value use cases!

Nicola started the conversation sharing her thoughts on “normal” data governance. What does it even mean? Is that federated, centralized, decentralized, etc.? What she's seen is that functional data governance – at an organization of scale – is not centralized in day-to-day decision making. The central team just can't have the context and knowledge to make good decisions quickly, if at all. But there absolutely needs to be a central team to provide support and knowledge and set federated teams up to succeed. The central team needs to focus on friction-reduction and value-add work. To do that, you need to create standards and processes. But she emphasized keeping your frameworks, processes, and standards as simple as possible – no one single, all-encompassing standard please!

As an example of functional governance, Nicola talked about how you can't just have universal data quality standards. Every use case may require a different combination of data quality dimensions – why optimize for completeness if it's not needed? A central governance team should be focused on helping business stakeholders define aspects of data quality and how to measure them. That way, data consumers can understand the quality of what they consume without learning different standards for each new data source, and we aren't setting data quality requirements that aren't helpful or useful.
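To make the idea concrete, here is a minimal sketch of per-use-case data quality definitions rather than one universal standard. This is not from the episode – the dimension names, thresholds, and data are purely illustrative:

```python
# Hypothetical sketch: each use case declares its own quality thresholds
# instead of every dataset meeting one universal standard.

def completeness(records, field):
    """Share of records where `field` is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)

def meets_requirements(records, requirements):
    """Check a dataset against one use case's own completeness thresholds."""
    return {field: completeness(records, field) >= minimum
            for field, minimum in requirements.items()}

records = [
    {"email": "a@example.com", "phone": ""},
    {"email": "b@example.com", "phone": "555-0100"},
]

# A mailing-list use case needs complete emails but doesn't care about phone.
print(meets_requirements(records, {"email": 1.0}))
# → {'email': True}

# A call-center use case also needs phone numbers, and this dataset fails that.
print(meets_requirements(records, {"email": 1.0, "phone": 1.0}))
# → {'email': True, 'phone': False}
```

The central team's role in this picture is helping stakeholders agree on what functions like `completeness` mean and how they're measured – the thresholds themselves stay with each use case.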

On data mesh specifically, Nicola agrees with most guests: data mesh is very much about the people side. That means the central governance team needs to collaborate with people outside the central team to iterate and improve upon your data governance approach. Feedback leading to improvements is necessary, the governance team can’t issue decrees from on high. A part of data mesh that excites her is trying to solve for the age-old challenges of ensuring the data is the right data and that we get it in front of the right people to answer questions about the business – lowering friction to leveraging our data. What “right” means is always somewhat open to interpretation of course.

Whether you are doing data mesh or not, Nicola believes data governance can't be about obstacles. That is how data governance got a bad reputation. The phrase should spark joy, not fear or revulsion. Instead, it has to be about making it easy for data consumers to find the right data and then being able to find the right people and documentation to help them understand that data. Governance is about providing low-friction ways to grant access and drive understanding of your data and how to properly use it.

One thing Nicola has learned working on a data mesh implementation is that in data mesh, a few new responsibilities are called out explicitly, but those responsibilities might fall under different roles in different domains. Some may fall under a data owner in one domain and under a data steward or mesh data product owner in another. The role types she recommends are data owner, mesh data product owner, and data steward. Find a standard setup for roles and responsibilities and then let the domains move responsibilities around as needed – don't make the domains come up with everything from scratch but don't hold on to your standard setup too closely either. Everything in data mesh is about iteration and evolution!
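One way to picture the “standard setup, adjusted per domain” idea is a baseline responsibility map that each domain can override rather than define from scratch. The specific responsibility names below are illustrative assumptions, not something prescribed in the episode; only the three role types (data owner, mesh data product owner, data steward) come from Nicola's recommendation:

```python
# Hypothetical sketch: a baseline mapping of responsibilities to the three
# role types, which domains adjust instead of inventing their own model.

STANDARD_MODEL = {
    "approve_access_requests": "data owner",
    "maintain_documentation": "data steward",
    "define_quality_targets": "mesh data product owner",
}

def roles_for_domain(overrides):
    """Start from the standard model, then apply a domain's adjustments."""
    model = dict(STANDARD_MODEL)
    model.update(overrides)
    return model

# An illustrative finance domain shifts documentation to its product owner,
# but inherits everything else from the standard model.
finance = roles_for_domain({"maintain_documentation": "mesh data product owner"})
print(finance["maintain_documentation"])   # → mesh data product owner
print(finance["approve_access_requests"])  # → data owner
```

The design point is the direction of effort: domains spend their time on the deltas that fit their context, not on reinventing the whole model – and the baseline itself stays open to revision as the mesh evolves.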

Scott asked what actually is “good enough” data governance to get moving in a data mesh implementation. Nicola pointed out that no matter what, you won't get your data governance perfect when starting – especially with something as immature as data mesh is right now. So have clear guidelines but nothing set in stone. And use her building blocks framework below. Also, think about what capabilities are needed early to drive value: is that some complicated interoperability standards or some data quality definitions and measurements to enable people to understand and trust the data? Probably data quality definitions.

According to Nicola, every data governance approach should be tailored to the organization but it should start from a few building blocks: A) policy – it mandates who will be required to do what and why; domains just don't do data governance out of the goodness of their hearts; B) processes and standards – lay out what you are trying to achieve and why, then give people an easy way to achieve it; that drives consistency and reduces friction, a win-win; and C) roles and responsibilities – it's crucial to assign ownership and lay out exactly who owns what; we've all been to meetings with no clear next steps and they are almost always a waste of time. Who owns driving things forward? Be clear about it.

Some additional data mesh governance advice Nicola gave: 1) Look for a relatively simple first use case. What has a high chance of success where you can also get some momentum and learnings? 2) Don’t only look for the simple use cases early in your journey. That can lead to not being able to actually face the hard parts when they come. And with data, of course the hard parts will come. 3) Communicate early and often that you want to collaborate with people and that things will change. Solicit feedback and make constituents part of shaping your governance. 4) Make it clear the central team is there to help and not control – help around compliance, help with reducing friction, etc.

A general sentiment that has worked well for Nicola in the past is telling people outside the governance team: if you don't get more value from data governance than you put in, we'll change our data governance frameworks. The governance team may feel the pressure but if you aren't adding value with your governance, outside of regulatory compliance, why are you doing it? Teams will want to participate if you give them a reason to, so find the value-add reasons.

Scott asked about a particularly difficult question in data mesh: who owns downstream data products – data products built by combining data from upstream data products? For Nicola, if the data isn't transformed, ownership of that data, even in those downstream data products, should still sit with whoever owns the originating data product. Essentially, ownership flows downstream if the data isn't transformed. But that can cause issues if the upstream mesh data product owner doesn't have direct control over how the data is used downstream. High communication, trust, and good lineage are necessary.

According to Nicola, some people are looking at data mesh as only making data more accessible and usable to existing data consumers. But that’s a big missed opportunity. Data mesh can make data more accessible to more employees, driving better decisions. We need data literacy to get to that target outcome but implementing data mesh will lead to a lot of wasted potential if we don’t expand the data consumer pool.

In wrapping up, Nicola talked about how you can actually drive buy-in for data governance. While trying to sell everyone on upping your data governance game with the same message is not likely to succeed, data governance really does have value for all participants. If it doesn't, as Nicola noted earlier, you need to change your governance approach. So drive that home: tailor the message, listen to each audience's pain points, and speak to how you can help them address that pain.

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him at community at datameshlearning.com or on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, and/or nevesf

Data Mesh Radio is brought to you as a community resource by DataStax. Check out their high-scale, multi-region database offering (w/ lots of great APIs) and use code DAAP500 for a free $500 credit (apply under “add payment”): AstraDB
