#227 Panel: Creating a Data Mesh Platform (1st Iteration) – Led by Paolo Platter w/ Manisha Jain, Jean-Georges Perrin (JGP), and Max Schultze

Sign up for Data Mesh Understanding’s free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please Rate and Review us on your podcast app of choice!

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here

Episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.

Paolo’s LinkedIn: https://www.linkedin.com/in/paoloplatter/

Paolo’s Medium (multiple data mesh articles): https://medium.com/@p-platter

Agile Lab’s website: https://www.agilelab.it/

Manisha’s LinkedIn: https://www.linkedin.com/in/evermanisha/

‘A streamlined developer experience in Data Mesh’ blog post by Manisha: https://www.thoughtworks.com/insights/blog/data-strategy/dev-experience-data-mesh-platform

‘A streamlined developer experience in Data Mesh (Pt. two)’ blog post by Manisha: https://www.thoughtworks.com/insights/blog/data-strategy/dev-experience-data-mesh-product

‘Data Mesh Accelerate Workshop’ blog post by Thoughtworks: https://martinfowler.com/articles/data-mesh-accelerate-workshop.html

Max’s LinkedIn: https://www.linkedin.com/in/max-schultze/

Max’s Data Mesh Learning meetup presentation: https://www.youtube.com/watch?v=QwtTdP2wKFo

(he has many more on YouTube! https://www.youtube.com/results?search_query=max+schultze+data+mesh)

Data Mesh in Practice ebook he co-authored (Starburst info gated): https://www.starburst.io/info/data-mesh-in-practice-ebook/

JGP’s LinkedIn: https://www.linkedin.com/in/jgperrin/

JGP’s ‘Data Mesh for All Ages’ book: https://jgp.ai/2023/01/20/data-mesh-for-all-ages/

JGP’s website (lots of data mesh content): https://jgp.ai/

JGP’s Blog Post ‘The next generation of Data Platforms is the Data Mesh’: https://medium.com/paypal-tech/the-next-generation-of-data-platforms-is-the-data-mesh-b7df4b825522

In this episode, guest host Paolo Platter, CTO and Co-Founder of Agile Lab (guest of episode #3), facilitated a discussion with Manisha Jain, Data Engineer at Thoughtworks (guest of episode #220), Jean-Georges Perrin (AKA JGP), Intelligence Platform Lead at PayPal (guest of episode #130), and Max Schultze, Associate Director of Data Engineering at HelloFresh (guest of episode #21). As per usual, all guests were only reflecting their own views.

Scott note: I wanted to share my takeaways rather than trying to reflect the nuance of the panelists’ views individually.

Scott’s Top Takeaways:

  1. As with every aspect of data mesh, you will need to focus your data platform work on what best drives value – doing so incrementally and in both the short and long term. Look to start at common friction points in your org’s data work, not just data mesh work, e.g. automated provisioning. That will give users a good first experience that is incrementally better than existing data work, potentially driving buy-in.
  2. It’s _really_ crucial to understand user personas and the necessary capabilities for each persona. Who is using the platform, why are they using it, and what are their points of friction? The experience plane will become incredibly important as you move along, but it probably shouldn’t be your initial focus.
  3. The word platform, especially in data mesh, can be thought of as a plural. Don’t focus on creating a ‘single platform to rule them all’; focus on delivering capabilities that reduce friction and support the scalability and reliability/trustability of producers’ data work. Users don’t care how it all fits together – create a system that can evolve and scale!
  4. You must treat your platform as a product itself – think about how well it satisfies user needs, how good the user experience is, etc. But that also means your platform has a lifecycle where you add and eventually prune features. From the start, apply product thinking to your platform πŸ™‚
  5. Prioritization of platform capabilities will probably always be a challenge once you have more than a few domains using the platform. Balance being reactive and proactive with the need to generalize solutions to fit many use cases. Again, look to product management for advice on how to manage your roadmap and priorities.
  6. REALLY think about how to handle breaking changes. They are likely inevitable and that’s okay but you need to focus strongly on communication and limiting the impact / providing a gentle migration path. Don’t break without really needing to and do so sparingly.
  7. Data integration, especially in a highly regulated industry, is going to be a challenge you will face relatively early on. It’s likely not going to be easy, be prepared for that.
  8. You need to understand where there is friction in the data product creation/management process – that’s where your platform should focus. That might feel obvious but it’s not how data platforms have been built – basically, look to automate away unnecessary friction first, even if that means focusing on things like templates and blueprints instead of cool tech.

Other Important Takeaways (many touch on similar points from different aspects):

  1. There are many ways to potentially get started – probably too many to list. But think about getting to early necessary capabilities that deliver value. It’s easy to get bogged down in technical aspects – instead start specifically by asking what creates value quickly.
  2. If you aren’t ready to build your platform incrementally, you probably aren’t ready to do data mesh. You need to be comfortable with demonstrating value and building as you learn and as your needs progress.
  3. When you are early in your journey, discoverability/usability is a characteristic many might overlook – you are building data products to support a use case. But if you want to drive buy-in and get incremental users of those data products from additional domains, discoverability might be a very important early capability. What is your adoption leverage point?
  4. There’s an interesting balance between MVP for the platform and easily enabling MVPs for data products. It’s hard to say exactly where the line falls. Scott note: Glovo said in episode #139 that they wish they had focused a bit more on making it easier to launch and initially manage data products over other more advanced capabilities.
  5. Consider when to start saying you actually have a platform – that can be a bit of a political statement. Potentially wait until you have started to build out the data experience plane for data producers. If there isn’t a tangible way to interact, the platform may exist but users don’t really know what it is and what it enables them to do. But the experience plane shouldn’t be among the first capabilities you build either. πŸ™‚
  6. Your data platform work needs to focus on capabilities and enabling value delivery. But that value delivery needs to be visible to business users. Basically, find your value leverage points that are visible to users and focus on satisfying those when you can to drive buy-in that you’re delivering value. It can be a political game unfortunately.
  7. You can’t treat domains as if they are the same or, especially early, often even similar. There will be different value drivers/needs and often very different capabilities so you have to make sure your platform has the necessary capabilities to drive value for that domain. And if it doesn’t yet or won’t in the near future, it might not be time to partner with that domain yet – and that’s okay!
  8. At the start of your journey, you will probably be focused on building capabilities to serve specific needs – of course, try to build those in generalized ways to serve many needs but still. As you evolve and grow the platform, you need to focus more on your overall suite of capabilities and look for where you have gaps. Don’t only wait for capability requests – but also don’t just build cool things because they are fun to build πŸ™‚
  9. Start to build out data product specifications based on early data products. Those will provide a much easier path for later data products, whether built by new domains or existing domains using the platform.
  10. Charging models and who should own what costs are going to play a bigger role than you’d probably like. It’s going to be a challenge you need to address at the platform level even while (presumably) others decide who should actually own the costs.
  11. Do not focus too much on building capabilities to _launch_ a data product. The launch is just a single day in the life of a product. You need to build capabilities that also help domains maintain and evolve their data products.
  12. Should the bounded context of a data product drive all the way down to the resourcing? Basically, should there be zero shared resources between data products? I know this one is a bit controversial – it’s a key aspect of Zhamak’s vision but people worry about costs. If you believe in that complete separation, how do you make sure there aren’t overlaps on computational resources and storage? It won’t be super easy because it’s not how most data services have been built historically.
  13. Many data engineers will still be thinking waterfall. Be prepared. Agile can work very well if it’s done well – but it’s not done all that well in data all that often… SAFe seems to be less hated in doing Agile for data – it’s pretty widely panned in software – but it _might_ provide a bridge to doing Agile in data. Something to investigate but definitely not advice/guidance.
  14. Paolo essentially said (heavily paraphrased/edited for flow) ‘We really need to pay attention, to optimize the ratio between the value that we bring with new features against the satisfaction of various users. This drives prioritization across phases, whether the focus of that phase is on adoption, productivity, satisfaction, etc. What is the outcome we want to achieve in each phase? Then we use that to prioritize features.’
  15. Every mesh platform build-out will be different, especially depending on what capabilities are most valued in your organization. For some, that might be security/privacy governance for regulators. For others, it might be usability. Find your value leverage points and also find what the people with the purse strings will find the most valuable.
  16. Measuring the return on investment, even just the success, of a data mesh platform will be tough but it’s important to start early and _start_ to get your arms around it. Productivity gains are a pretty easy measurement to consider as you get more sophisticated. Look to measurements of return on investment in software to get some ideas but be comfortable that things will always be a bit squishy.
  17. If you are building your platform as a product, products focus far more on user experience (UX) than most aspects of data infrastructure have historically. Make sure you really understand the value drivers – if users don’t adopt your platform, nothing else you do with it really matters πŸ™‚
  18. Producers being willing to own their data – and then understanding what that ownership really means – will probably be one of your biggest challenges. Hard to tackle that at the platform level but look to support those efforts as best as you can.
  19. Data producers and consumers still don’t generally talk to each other in many organizations. Look to potentially put more mechanisms in to foster communication. Scott note: see my mesh musing – episode #188 – where I believe all consumers MUST register their use case with producers.
  20. You will have to decide things like whether you isolate compute and storage resources at the domain or the data product level. Data producers don’t care – make that invisible to them. The key aspect to focus on is to enable and not block – basically, give domains the agency to get their work done but find ways to reduce their friction and obfuscate decisions like that from them. They are important decisions but irrelevant to users.

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
