Data Mesh Radio Patreon – get access to interviews well before they are released
Episode list and links to all available episode transcripts (most interviews from #32 on) here
Provided as a free resource by DataStax AstraDB
In this episode, Scott interviewed Paul Andrew, Technical Architect at Avanade and Microsoft Data Platform MVP.
Paul started by sharing his views on the chicken-and-egg problem of how much of your data platform to build out, and when, to support data product creation and ongoing operations. Is it after you’ve built a few data products? Entirely before? That discussion becomes even more complicated in a brownfield deployment that already has existing requirements, expectations, and templates.
For Paul, delivering a single data mesh data product on its own is not all that valuable. If you are going to go to the expense of implementing data mesh, you need to be able to satisfy use cases that cross domains. The greater value is in cross-domain interoperability: getting to a data product that wasn’t possible before. You also need to deliver the data platform alongside those first 2-3 data products; otherwise you create a very hard-to-support data asset, not really a data product.
When thinking about a minimum viable data mesh, Paul views an approach built on DevOps, and CI/CD (Continuous Integration/Continuous Delivery) in particular, as crucial. You need repeatability and reproducibility to really call something a data product.
In a brownfield deployment, Paul sees leveraging existing templates for security and infrastructure as code as the best path forward: supplement what you’ve already built to make it usable for your new approach. If you’ve already built out your security and compliance model, turn it into infrastructure as code to really reduce friction for new data products.
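To make the idea of codifying an existing security model concrete, here is a minimal Python sketch. It is not something Paul described in the episode: the `BASELINE_POLICY` settings, the `DataProductSpec` fields, and the `render_infrastructure` helper are all hypothetical names chosen for illustration, and a real setup would more likely use a dedicated infrastructure-as-code tool.

```python
from dataclasses import dataclass, field

# Hypothetical baseline controls standing in for an organization's
# existing security and compliance model (illustrative names only).
BASELINE_POLICY = {
    "encryption_at_rest": True,
    "public_network_access": False,
    "audit_logging": True,
}


@dataclass
class DataProductSpec:
    """What a domain team declares for a new data product (assumed shape)."""
    name: str
    domain: str
    extra_settings: dict = field(default_factory=dict)


def render_infrastructure(spec: DataProductSpec) -> dict:
    """Merge the shared baseline into one data product's definition.

    Teams may add settings, but may not override the baseline controls.
    """
    conflicts = sorted(set(spec.extra_settings) & set(BASELINE_POLICY))
    if conflicts:
        raise ValueError(f"baseline settings cannot be overridden: {conflicts}")
    return {
        "resource_name": f"{spec.domain}-{spec.name}",
        **BASELINE_POLICY,
        **spec.extra_settings,
    }


if __name__ == "__main__":
    spec = DataProductSpec("orders", "sales", {"retention_days": 90})
    print(render_infrastructure(spec))
```

Because every new data product is rendered from the same baseline, the security review happens once on the template rather than once per product, which is where the reduction in friction would come from.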
For Paul, being disciplined early in your data mesh journey is key. A proof of concept for data mesh is often focused only on the data set or table itself, not on actually generating a data product, much less a minimum viable data mesh. It’s pretty easy to put yourself in a very bad spot: taking that proof of concept to actual production is going to be a very hard transition, and telling users it will take weeks to months to productionize is probably not going to go well. Be disciplined enough to go far enough to test out a minimum viable data mesh.
Paul emphasized the need for pragmatism in most aspects of implementing a data mesh. Really think about when to take on tech debt and do so with intention. When shouldn’t we take on tech debt? How and when do we pay it down? There is a balance between getting it done and technical purity. How do we choose what features to sacrifice? What is the time value of money here, or how much importance do we place on getting it done sooner rather than more completely? These are questions you’ll need to ask repeatedly.
Similar to what previous guests mentioned, Paul is working to encourage and facilitate the data product marketing and discovery process. That starts with discussing with data consumers what they want, including pie-in-the-sky thinking. He then takes that to the data producers to figure out pragmatic approaches and what is simple to deliver. Is one aspect going to be very difficult? Go to the consumers and let them know it will delay delivery and that they need to fund that aspect. Do they still want it? Use that back-and-forth discussion to drive negotiations to a valuable solution with less effort. Look for that return on investment. Be pragmatic!
Paul recommends making business value your general data mesh ‘North Star’. Ask the pragmatic questions, and in doing so shift the data function from taking requests/requirements to leading those negotiations. Have the conversation of “Is this worth it? Who is going to pay for it? What is it worth to them?” As of now, Paul and team are still often functioning as the translator between data producers and data consumers.
When discussing the goal of getting out of the middleman/translator role, though, Paul pointed to a few signs that an organization is ready for producers and consumers to work with each other directly. These include general company culture, how data literate and capable the execs are, data platform maturity, etc. If you can mature your organization’s approach and skill, you can move towards not needing a data translator.
Paul talked about how to think about your data mesh journey and the different elements of it, even a data product, in a crawl, walk, run fashion. Think about your data products first and foremost as serving at least one specific purpose. Still create them with reuse in mind, but they should have a use case to serve and can expand from there. At the mesh level, part of crawling is getting to a few standard interfaces that data products use to communicate. At the data platform level, part of crawling is getting to a place where it is possible to publish new data sets, while walking might be a significant reduction in friction to data product production. While this means that a minimum viable mesh is still a pretty high bar, you can get to a place where you are comfortable being at the crawling stage with a good roadmap towards walking and running. Done and good is better than perfect and the forever “just 3 more weeks”.
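As one way to picture “a few standard interfaces” at the crawling stage, here is a minimal Python sketch. It is an illustration under assumptions, not an approach from the episode: the `DataProduct` protocol, its `describe` and `read` methods, and the two example products are hypothetical.

```python
from typing import Iterable, Protocol, runtime_checkable


@runtime_checkable
class DataProduct(Protocol):
    """Hypothetical minimal interface every data product would expose."""

    def describe(self) -> dict: ...
    def read(self) -> Iterable[dict]: ...


class OrdersProduct:
    """Example product owned by an assumed 'sales' domain."""

    def describe(self) -> dict:
        return {"name": "orders", "domain": "sales", "purpose": "order reporting"}

    def read(self) -> Iterable[dict]:
        return [{"order_id": 1, "customer_id": 7}]


class CustomersProduct:
    """Example product owned by an assumed 'crm' domain."""

    def describe(self) -> dict:
        return {"name": "customers", "domain": "crm", "purpose": "customer lookup"}

    def read(self) -> Iterable[dict]:
        return [{"customer_id": 7, "name": "Ada"}]


def join_on(left: DataProduct, right: DataProduct, key: str) -> list:
    """Combine two products on a shared key using only the common interface."""
    index = {row[key]: row for row in right.read()}
    return [{**row, **index[row[key]]} for row in left.read() if row[key] in index]


if __name__ == "__main__":
    print(join_on(OrdersProduct(), CustomersProduct(), "customer_id"))
```

The point of the sketch is that `join_on` knows nothing about either domain beyond the shared interface, which is what makes a cross-domain use case possible without custom integration work per product.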
Paul’s data mesh blog series: https://mrpaulandrew.com/tag/data-mesh-vs-azure/
Paul’s LinkedIn: https://www.linkedin.com/in/mrpaulandrew/
Paul’s Twitter: @mrpaulandrew / https://twitter.com/mrpaulandrew
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him at community at datameshlearning.com or on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Data Mesh Radio is brought to you as a community resource by DataStax. Check out their high-scale, multi-region database offering (w/ lots of great APIs) and use code DAAP500 for a free $500 credit (apply under “add payment”): AstraDB