#178 Data Modeling in Data Mesh Panel – w/ Juha Korpela, Kent Graziano, and Veronika Durgin

Data Mesh Radio Patreon – get access to interviews well before they are released

Episode list and links to all available episode transcripts (most interviews from #32 on) here

Provided as a free resource by DataStax AstraDB; George Trujillo’s contact info: email (george.trujillo@datastax.com) and LinkedIn

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.

Juha Korpela (Chief Product Officer at Ellie Technologies) facilitated this panel on data modeling in data mesh with Veronika Durgin (Head of Data at Saks) and Kent Graziano (The Data Warrior, former Chief Technical Evangelist at Snowflake). This panel was hosted by the Data Mesh Learning Community in partnership with Data Mesh Radio.

Veronika’s Links:

Veronika’s LinkedIn: https://www.linkedin.com/in/vdurgin/

Data Vault North America User Group: https://www.meetup.com/dvnaug/

Kent’s Links:

Kent’s LinkedIn: https://www.linkedin.com/in/kentgraziano/

Kent’s Website: https://kentgraziano.com/

Kent’s Twitter: https://twitter.com/KentGraziano

Data Vault Alliance: https://datavaultalliance.com/

Juha’s Links:

Juha’s LinkedIn: https://www.linkedin.com/in/jkorpela/

Ellie Technologies’ website: https://www.ellie.ai/

This write-up is from Scott Hirleman’s point of view:

I don’t have a ton of depth in data modeling concepts, so take these key takeaways with a grain of salt 🙂 Rather than writing up everyone’s opinions, these are more my own takeaways:

  1. Start from the business concepts instead of the technical (a data mesh theme).
  2. Focus on enabling people to do the data modeling instead of trying to do it centrally, but do have a centralized understanding.
  3. It’s crucial not to consider data modeling only at the data product level – that’s the route to data silos.
  4. Communication is the most important data modeling skill.
  5. Share early and often to get feedback and work together on that fast feedback cycle to quickly iterate.
  6. Limit your blast radius: only bite off what you can chew and limit what can go sideways early as you learn – do not cause lots of downstream damage.
  7. If you don’t stay connected to each other and communicate well, you’ll likely have MDM (master data management) style nightmares in data mesh.
  8. Centrally define the standards and rules and have a clear way to get help and settle questions/disputes.
  9. Starting data modeling from technical integration is about what we can do right now. We should focus on what we need to do to drive business value, not on what is possible with the existing solution.
  10. In data modeling, far too often people don’t look at what data consumers want. User requirements, business requirements, and technical requirements all must be met.
  11. Not everyone needs to be a data modeler but everyone should understand how information is communicated internally via a data model.
  12. Alla Hale method (episode #122): show up to every meeting with something to discuss. Don’t show up to the first meeting with it fully built – constantly show something to get feedback on.
  13. Shared understanding is crucial and can ONLY be reached via strong communication. Stop trying to shy away from communication.
  14. Make the implicit explicit – “what do you mean by that?” is one of your most crucial tools to doing data modeling well.

Data modeling in data mesh will probably be far more similar than dissimilar to data modeling in a more centralized world. The focus on business concepts is crucial. Far too often we start from the technical instead of from what we are trying to achieve, and it’s crucial not to fall into that trap. Getting the technical aspects of interoperability wrong can be a pain, but if things work together technically yet not at the business level, that’s a lot of sound and fury signifying nothing – essentially a lot of cost for work and compute that doesn’t lead to actual business value.

One thing I’ll note is every one of the guests is a Data Vault proponent. I’m not sold that it’s the right way in the long run for data mesh – I feel like we need to evolve data modeling concepts for a more distributed, federated organizational approach. But from what they said, Data Vault does sound like a very solid base to start from – start from the business concepts first and focus on what you are actually trying to accomplish. Data modeling for the sake of data modeling is not something anyone should want to do.
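Since Data Vault keeps coming up, a minimal, hypothetical Python sketch of its core structures may help readers unfamiliar with it: hubs hold stable business keys, links hold relationships between business concepts, and satellites hold the descriptive, changeable attributes over time. The entity names and fields below (Customer, Order, etc.) are my own illustration, not from the panel:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class Hub:
    """A business concept, identified only by its business key."""
    name: str           # e.g. "Customer" – a business concept, not a table
    business_key: str   # e.g. a customer number from the source system

@dataclass(frozen=True)
class Link:
    """A relationship between business concepts (hubs)."""
    name: str
    hubs: tuple         # the hubs this link connects

@dataclass
class Satellite:
    """Descriptive attributes for a hub, tracked over time."""
    hub: Hub
    attributes: dict
    loaded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

# Model the business concepts and their relationships first;
# data types and source-system details live only in satellites,
# so a new perspective on Customer is a new satellite rather
# than a rework of the whole model.
customer = Hub("Customer", "CUST-001")
order = Hub("Order", "ORD-9001")
placed = Link("CustomerPlacedOrder", (customer, order))
profile = Satellite(customer, {"name": "Ada", "segment": "retail"})
```

The point the panelists make maps directly onto this separation: the hubs and links capture the business concepts, which stay stable, while the satellites absorb the churn.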

As with just about everything else in data mesh, data modeling should be about limiting the blast radius of potential negative impacts while getting fast initial and incremental feedback. Get to that iteration, take on things that matter, but don’t make it a big bang. Fail fast and all that 🙂 This is not about taking requirements and going off to your own world at the domain level. Overall communication/cohesion becomes even more crucial as we enable more and more people/domains to own their data.

The most important aspect for developing lasting interoperability via data modeling is the business concepts. It’s not that hard to do the technical interoperability once you figure out how things should work together. Starting from the technical feels easier but is a recipe for losing the value of the bigger picture. Typically, technical-first implementations are less extensible because the technology decisions are embedded into the solution instead of being an enabling factor. Veronika said something like “focus on the words and meanings and not the data types.” She also said that starting with technical integration focuses far too much on what can be done with the existing implementations instead of on what we need to drive value and how we can improve the existing implementations to create more value.

As with most things in data and software engineering, it’s okay to build your data model in an opinionated way, but maintaining flexibility, especially during your initial development, is crucial. Major change can be quite costly – especially if you have to change the entire foundation of what you are doing. Kent railed against the need for rework. Build your data model and subsequent data products so that providing another view or angle requires only the work for that new view, not changing everything else you’ve done – in data mesh, this would often be a different API or a new table sharing the data for a different use/perspective. Again, look to prevent rework and ensure flexibility.

A massive concern with data mesh is data silos – the worry that if you have a bunch of domains doing their work separately and not in communication, nothing will interoperate. So you probably do need some kind of centralized group – whether that is their main role or part of their other responsibilities – helping domains do their work in the context of the greater organization. Note, that is what loose coupling from microservices means: things work together but can scale independently. Fully decentralized – no connections at all – would be decoupling. Having people who are there to help is the key to federated instead of fend-for-yourself, so data architects are a crucial aspect of data modeling in data mesh.

While there is no centralized data model in data mesh – centralized models aren’t flexible, and you lose a ton of context from forcing things to comply with the model – there obviously can be centralized guidance and direction, a standard set of data models, etc. Think about a well-functioning federated government – maybe not the US… There are people doing work in the centralized function, but it’s about enabling those at the more local level to do things the right way. Juha quoted someone with something like “governance is not about leading people to do things right, it’s about setting them up to do things the right way”. That centralized team can’t know what’s right for specific situations – because they lack the localized context – but they can specialize in enabling others to do things the right way. Kent claimed there is an enterprise data model, a claim that can quickly go the wrong direction; but if you interpret it well – that there are clear relationships across the business, fundamental business truths, that are crucial to reflect in your data – it can mean much less learning of deep domain-specific context because you understand how domains fit into the organization.

A number of people believe data modeling must be all about one view or perspective to rule them all. That is where data mesh fundamentally pushes back. You can have one view you agree on as an organization – such as revenue – but others should still be free to publish something that is similar in meaning but from another view. Much like in Domain Driven Design (listen to Vlad’s episode for more, #171), there should be a ‘language’ (broad definition, think interface and terminology) of the domain to maximize the context of information shared in the domain and a separate ‘language’ used to communicate to the rest of the organization. That way we can still maximize context for business value locally but communicate globally in order to also maximize global data interoperability, which is a crucial organization-wide business value driver.
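That two-language idea can be sketched in code: a domain-internal model using the domain’s own terminology, plus a translation at the boundary into an organization-wide published representation. This is a minimal, hypothetical Python illustration – the retail-ish field names and mappings are my own, not from the panel:

```python
from dataclasses import dataclass

# Inside the domain, the model uses the domain's own jargon
# (the "ubiquitous language" in DDD terms) – rich local context.
@dataclass
class MerchReturn:          # hypothetical retail-domain concept
    rma_number: str         # domain jargon: Return Merchandise Authorization
    sku: str
    units: int
    disposition: str        # e.g. "restock" or "salvage"

# At the domain boundary, translate into the organization-wide
# "published language" so other domains can interoperate without
# needing the local jargon or context.
def to_published(ret: MerchReturn) -> dict:
    return {
        "return_id": ret.rma_number,
        "product_id": ret.sku,
        "quantity": ret.units,
        "resellable": ret.disposition == "restock",
    }

example = MerchReturn("RMA-42", "SKU-123", 2, "restock")
published = to_published(example)
```

The domain keeps maximal context internally, while consumers across the organization see only the agreed, interoperable shape – which is exactly the local-value-plus-global-interoperability trade the paragraph above describes.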

Kent mentioned another worry about data mesh that is often closely aligned with the data silo worry: master data management-related nightmares. While we absolutely have to reinvent MDM for data mesh – look for a few panels on that in the near future? – it’s pretty clear it’s bad to potentially have 10 different definitions that might filter up to an exec who doesn’t understand the nuances and differences. Especially if that exec asks a simple-seeming business question and gets 5 different answers – data trust is gone. So we have to be clear in tackling that problem and communicating strongly. Maybe not mastering data but mastering ways of answering typical questions?

Overall, I think you will learn a ton just like I (Scott) did 🙂

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him at community at datameshlearning.com or on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf

Data Mesh Radio is brought to you as a community resource by DataStax. Check out their high-scale, multi-region database offering (w/ lots of great APIs) and use code DAAP500 for a free $500 credit (apply under “add payment”): AstraDB
