Use code DATAGOV23 for 35% off ebook copies of Designing Data Governance from the Ground Up here: https://pragprog.com/titles/lmmlops/designing-data-governance-from-the-ground-up/
Please Rate and Review us on your podcast app of choice!
Get involved with Data Mesh Understanding’s free community roundtables and introductions: https://landing.datameshunderstanding.com/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Lauren’s LinkedIn: https://www.linkedin.com/in/laurenmaffeo/
Designing Data Governance from the Ground Up (Lauren’s book): https://pragprog.com/titles/lmmlops/designing-data-governance-from-the-ground-up/
In this episode, Scott interviewed Lauren Maffeo, author of the book Designing Data Governance from the Ground Up and adjunct lecturer at George Washington University. To be clear, she was only representing her own views on the episode.
Some key takeaways/thoughts from Lauren’s point of view:
- In governance, a very easy way to go down a bad path is to not automate your standards. Making governance an easy aspect of data work will go a long, long way.
- ?Controversial?: Across the industry in general, data governance maturity is still in its infancy. And the pace of maturation is far slower than in other aspects of software like security.
- The majority of organizations are not mature enough with data governance to get a lot of value from things like ML or NLP.
- Data governance best practices are hard to come by. There isn’t really even a large community specific to data governance for people to easily exchange ideas.
- If the number one cause of cybersecurity breaches is employees, is the number one cause of bad data-driven decisions A) bad data or B) employees not being data fluent enough to make the right decisions? Does the second contribute to the first?
- “You really have to embed data literacy into very strategic ways of communicating with the organization and educating them that way. Without that approach, I think very little progress can actually be made.”
- ?Controversial?: It’s not that hard to teach people to be relatively data fluent. Not to a data engineer level, but investing a relatively small amount in your people can get them to a level where they can at least typically spot low quality data.
- ?Controversial?: You can’t just “do data governance later” and have a good outcome. Scott note: this absolutely shouldn’t be controversial but still seems to be a common pattern.
- If we actually want to treat data as a product, that includes practices around data too. A product manager couldn’t decide to put out their first roadmap three months after the product is shipped.
- It’s hard to measure the ROI on data governance work but it’s pretty clear that it’s necessary when your data work is not driving results.
- Despite governance being crucial, you can still roll governance out in a measured manner. You can create ways for people to get used to the idea before rolling it out. It isn’t as if on day 1 ‘everything changes!’; it can be a gradual process that adds value, not (many) complications.
- Embracing a fail fast culture in data will probably be hard but it’s absolutely something everyone should do. Make the ‘failures’ small and contained – limit the blast radius – but failing fast is still “the essence of innovation in tech.”
- Look to leverage sandbox environments to try out new approaches to governance and get data stewards used to anything new you will be asking of them. A new policy or tool doesn’t have to be a sudden light switch flip.
- “We also can’t afford for leaders of any department to not know what quality data looks like for their teams, because their success, the success of their teams depends on having quality data that their customers trust.”
- ?Controversial?: Data mesh is a risky endeavor. Do it in a small-scale way and keep it very cost contained as you test if it will work in your organization.
- For driving data literacy/fluency, gamification really can work. Look to make learning about data a fun and gamified experience.
Lauren started with a somewhat common refrain for this podcast: data practices, around governance and data work as a whole, are just not maturing as fast as other aspects of software. Even the conversation around data is not maturing as fast. Cybersecurity is maturing very quickly, for example, but we’re just not seeing that in data. So most companies are just not ready to really derive a lot of value from things like machine learning (ML) or natural language processing (NLP).
One of the big issues around industry maturity and data governance for Lauren is that there isn’t even a large community around the topic. So there isn’t a larger cohesive conversation around data governance best practices. Scott note: it’s really hard to have a broader conversation too because approaches and practices vary widely and data governance has about 15 varied subtopics that each deserve their own focus rather than being lumped under a huge umbrella of ‘other’ that governance has become.
Most cybersecurity breaches, Lauren pointed out, are caused by internal employees making a mistake. So how do we think about that relative to data? Is that people creating low quality data and/or is it people not being data fluent enough to actually make good decisions based on data? In cybersecurity, there is a big emphasis on training people to see what an attack looks like – should we take the same approach in data regarding bad quality data? “You really have to embed data literacy into very strategic ways of communicating with the organization and educating them that way. Without that approach, I think very little progress can actually be made.”
Lauren talked about how few companies are really going broad with their data literacy programs, training up a large number of their employees. There is a lot of talk about that as part of data governance programs but few are walking the walk. And she believes it’s not that hard to get people to a relatively data fluent level – understanding SQL, being able to more easily spot low quality data, etc.
“We’ll do the data governance later,” is something Lauren has seen and heard in conversations. Governance is seen as something that can be layered on at the end, like the coat of paint on a car coming off the assembly line. But because good governance is intrinsic to data quality and to matching data to the actual business use case, trying to do it later rarely leads to good results.
When asked about selling the return on investment of data governance work, Lauren admitted that it’s often quite nebulous, but data governance is so key that people know they need it despite not being super clear on the specific value of the work. And you can roll out your data governance tech, policies, and processes at a reasonable pace, creating some definitions and a sandbox to show people how it will work. She is really big on the idea of a sandbox to get people used to new governance practices and tech. It isn’t as though everything changes suddenly; you’re working towards better data practices that will drive value for the organization. Fail fast is “the essence of innovation in tech” so we need to embrace it far more – but still safely and sanely – in data.
“We also can’t afford for leaders of any department to not know what quality data looks like for their teams, because their success, the success of their teams depends on having quality data that their customers trust,” Lauren said. So we all need to be in this together and have domains really owning and understanding their data. That can’t be on a central data team.
Gamification is one thing Lauren is seeing work for improving data literacy/fluency. It is a great pathway to creating a data-driven culture. Make it fun and give out rewards 🙂
Lauren wrapped up with a simple message: automate your standards. It is easy for your tech and your standards/processes to quickly lose alignment if you aren’t making things easy for people via automation.
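As a rough illustration of what automating a standard could look like, here is a minimal Python sketch. It was not discussed in the episode: the three standards, the metadata fields (`owner`, `description`, `columns`), and the `violations` helper are all hypothetical, chosen only to show standards expressed as code that can run on every dataset rather than as a document people have to remember.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Standard:
    """One governance standard expressed as an automated check (hypothetical)."""
    name: str
    check: Callable[[dict[str, Any]], bool]


# Hypothetical standards, assumed for illustration only.
STANDARDS = [
    # Every dataset must name an owning team or person.
    Standard("has_owner", lambda meta: bool(meta.get("owner"))),
    # Every dataset must carry a non-blank description.
    Standard(
        "has_description",
        lambda meta: bool((meta.get("description") or "").strip()),
    ),
    # Column names must be lowercase and contain no spaces.
    Standard(
        "lowercase_columns_no_spaces",
        lambda meta: all(
            c == c.lower() and " " not in c for c in meta.get("columns", [])
        ),
    ),
]


def violations(meta: dict[str, Any]) -> list[str]:
    """Return the names of the standards this dataset's metadata fails."""
    return [s.name for s in STANDARDS if not s.check(meta)]


if __name__ == "__main__":
    dataset = {
        "owner": "",
        "description": "Daily orders",
        "columns": ["order_id", "Order Total"],
    }
    print(violations(dataset))
```

A check like this could run in a sandbox first, in line with the gradual rollout described above, so data stewards see which standards their datasets fail before anything is enforced.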
Learn more about Data Mesh Understanding: https://datameshunderstanding.com/about
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used in this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf