In this episode, Scott interviewed Tim Gasper, VP of Product at data.world and the co-host of the Catalog & Cocktails podcast.
They covered two main topics – 1) the skeptic’s view of data mesh and 2) the “ABCs of Data Products” framework from Tim and the data.world team.
Skeptics have a few main pushbacks on data mesh in Tim’s view. Tim listed the top 6 that he sees and then discussed them with Scott.
#1: Data mesh isn’t for every organization; fit depends on size, number of domains, data/problem space complexity, etc. Tim said this. Zhamak has said this. Most data mesh advocates/fans say this regularly. This is one of the myths of data mesh – that it’s designed for everyone. Don’t go to a decentralized data setup if you don’t need to. Tim made the very good point that we need more conversations and better guidance on what to measure to tell whether centralization of your data team and processes is your actual challenge.
#2: Tooling doesn’t exist – yet? – to make it easy for domains to take over data ownership. A big conceptual myth of data mesh is that it has to solve every data problem, even the most difficult, right out of the gate. Tim mentioned that your team needs to really think about self-service as empowerment, not necessarily a single big red easy button. And your implementation will evolve – it MUST evolve. It’s not easy yet and if your team isn’t prepared to roll up their sleeves, it’s okay to wait to implement.
#3: There shouldn’t be anyone who “owns” the data. Tim made a really good point here on accountability for sharing your data versus the “fiefdom” model, where someone has complete control over how the data is used. Yes, no one should be able to prevent other domains from using data. But the fiefdom model is not at all in the spirit of data mesh anyway. Why would you make data reusable and discoverable if people can’t use it?
#4: There aren’t enough case studies yet. Tim mentioned this briefly. It is a bit of a chicken and egg issue: if we wait for people to be “done” with their journeys, it will be another 5 years before good case studies emerge. It’s okay to need more proof before wanting to go forward but it might mean lost opportunity. And there are good examples out there, including guests from this podcast (20+ so far).
#5: Guidance is lacking on exactly how to handle cross-domain data combinations. Tim raised the question of how those combinations get managed, since right now, in a data warehouse or data lake world, there is a clear owner: the data team. Unfortunately for those who want a direct data mesh playbook, this is situational; you have to figure it out yourself for each situation and be ready to evolve.
#6: Data mesh will create data silos. Sure, if you have the data mart model of old where data is created only for the domains to use internally. But that’s not data mesh. Tim talked about how important iteration and collaboration are to prevent data silos. So much is about the intent to not let data silos become a problem and to iterate towards interoperability.
Overall, Tim and Scott agreed that a lot of the pushbacks are probably coming from orgs where data mesh would create a lot of friction in their existing cultures. As Tim said, changing culture is very hard and “fixing” culture is even harder.
Tim talked about how we too often think about data implementations, whether macro or micro, as a singular event, something that doesn’t evolve. Data implementations aren’t a house; they are more like a garden. Seasons change, you might have to weed a bit (or a LOT), and you might change what the focus of your garden is. Are you sick of zucchini? Is this data product or report/dashboard no longer relevant?
The Data Product ABCs framework:
Tim and the team at data.world put together a framework for thinking about data products. An important aspect is that this, like much of data mesh, isn’t about providing specific answers but more the questions you must answer to get to a good outcome. A key point Tim made at the end was just how many data challenges come from relying on implicit expectations and knowledge instead of getting very explicit to make sure everyone is on the same page and that knowledge is shared and documented. Tim basically said get in the room -> negotiate -> come to a conclusion and shake hands -> document.
- A is for Accountability – Who owns the data product? And what does ownership specifically mean?
- B is for Boundaries – What is a data product? What interfaces does it use? And crucially what isn’t a data product? And also what isn’t part of a specific data product?
- C is for Contracts – What are the explicit expectations of this data product? Who can use it? What are the SLAs? Abe Gong mentioned in his episode, #65, how often these contracts at least start as implicit – let’s get communicating and negotiating folks!
- D is for Downstream – Who uses the data product, who might want to use it, and why? What is the roadmap? Etc.
- E is for Explicit Knowledge – Because ABCDK doesn’t sound as good? Don’t believe your data products are self-describing. Document things, explain in detail. What are the relationships to other data products or concepts outside the data product?
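One way to make the five questions above explicit is to write the agreed answers down in a single record per data product. The following is only a minimal sketch in Python: the class name `DataProductDescriptor`, its field names, and the `unanswered` check are all hypothetical illustrations, not a data.world schema or anything Tim described in the episode.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataProductDescriptor:
    """Hypothetical record of what a team agreed on for one data product."""
    name: str
    # A: Accountability - who owns it, and what ownership means
    owner: str = ""
    ownership_duties: List[str] = field(default_factory=list)
    # B: Boundaries - interfaces it exposes, and what is not part of it
    interfaces: List[str] = field(default_factory=list)
    out_of_scope: List[str] = field(default_factory=list)
    # C: Contracts - who can use it and which SLAs apply
    allowed_consumers: List[str] = field(default_factory=list)
    slas: Dict[str, str] = field(default_factory=dict)
    # D: Downstream - current or likely consumers, plus the roadmap
    known_consumers: List[str] = field(default_factory=list)
    roadmap: List[str] = field(default_factory=list)
    # E: Explicit knowledge - documentation and related products
    documentation: str = ""
    related_products: List[str] = field(default_factory=list)

    def unanswered(self) -> List[str]:
        """Return the letters (A to E) that still lack an explicit answer."""
        checks = {
            "A": bool(self.owner and self.ownership_duties),
            "B": bool(self.interfaces and self.out_of_scope),
            "C": bool(self.allowed_consumers and self.slas),
            "D": bool(self.known_consumers),
            "E": bool(self.documentation),
        }
        return [letter for letter, answered in checks.items() if not answered]


# Example values are made up for illustration.
orders = DataProductDescriptor(
    name="orders",
    owner="sales-domain team",
    ownership_duties=["respond to consumer questions"],
    interfaces=["SQL view"],
)
print(orders.unanswered())  # ['B', 'C', 'D', 'E']
```

The check treats an empty field as an implicit expectation that still needs a conversation, which mirrors the "negotiate, agree, then document" sequence; what counts as a sufficient answer for each letter is an assumption of this sketch and would differ by organization.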
Tim’s Twitter: @TimGasper / https://twitter.com/TimGasper
Catalog & Cocktails page: https://data.world/podcasts/
Data.world blog content:
Do You Know Your Data Product ABCs? https://data.world/blog/data-product-abcs/
The Role of a Data Catalog in Data Mesh https://data.world/blog/data-catalog-data-mesh/