#220 Building Your Early Mesh Data Platform and Data Product Capabilities – Interview w/ Manisha Jain

Sign up for Data Mesh Understanding’s free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please Rate and Review us on your podcast app of choice!

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here

Episode list and links to all available episode transcripts (most interviews from #32 on) here

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.

Manisha’s LinkedIn: https://www.linkedin.com/in/evermanisha/

‘A streamlined developer experience in Data Mesh’ articles by Manisha:

Part 1 – Platform: https://www.thoughtworks.com/insights/blog/data-strategy/dev-experience-data-mesh-platform

Part 2 – Product: https://www.thoughtworks.com/insights/blog/data-strategy/dev-experience-data-mesh-product

Article on Lean Value Tree: https://rolandbutler.medium.com/what-is-the-lean-value-tree-e90d06328f09

Blog post on mentioned data mesh workshops: https://martinfowler.com/articles/data-mesh-accelerate-workshop.html

In this episode, Scott interviewed Manisha Jain, Data Engineer at Thoughtworks.

Some key takeaways/thoughts from Manisha’s point of view:

  1. Manisha’s top 3 pieces of early mesh journey advice: A) put together a clear specification of what a data product is. Your definition will evolve/improve but if people don’t understand the building blocks, it’s going to be hard to build value. B) start to create standardized input and output ports because that is how data products – your units of value – actually exchange value. C) make it easy to discover your important SLOs to create trust; partner with consumers to serve what they need.
  2. When working with your first domain, it’s crucially important to make sure they have strong data engineering talent – whether that is existing people or someone embedded into the team. That way, someone inside the team building the data product can better interact with the platform team to communicate basic needs at the infrastructure level. This isn’t as necessary for later domains – the platform already exists.
  3. Both the platform and product teams need to really understand and align on responsibilities and necessary capabilities. That will help you streamline your developer experience, which is crucial to scaling data mesh.
  4. People very often confuse data and data product. You should look to get crisp on what a data product means internally. Scott note: we still don’t have a simple, on-paper explanation of a data product you can put in front of a team of non-data folks…
  5. It’s crucial to understand how data mesh can align with how your organization thinks and works. That will make it far easier to drive buy-in. “…only when they’re comfortable with that concept … will [it] make sense to go ahead and explore more.”
  6. The platform team needs to focus on delivering capabilities to domains, not technologies. But they also need to think about mesh-level capabilities, e.g. supervision capabilities. Think about what capabilities are needed when; don’t boil the ocean. Maybe certain use cases are too difficult to tackle right now. Scott note: that is the mark of a good org approach, when you can say “we aren’t ready yet” and it’s okay.
  7. It’s important to remember that the data platform is there to make it easier to create, deploy, and manage/evolve data products. Your data products are the unit of value exchange in data mesh. Use that as a guide when deciding what the platform should offer.
  8. ?Controversial?: Boundaries around data products and data product teams are more important than most realize. We really do need teams to be able to act independently and not deal with the hassles of shared infrastructure. Scott note: I know people get worried about cost but time to market of information matters too as well as time spent dealing with untying infrastructure knots by the platform team.
  9. Consider doing a series of workshops with a small group closely aligned to each domain to drive understanding and alignment. Don’t try to do all the domains at once, each has a different set of needs and capabilities.
  10. The Lean Value Tree is an effective method to break down what you are trying to accomplish into actionable pieces of work with explicit assumptions. You state the bets you are making, the hypotheses you are testing, etc. so people are on the same page.
  11. When you start working with a domain in data mesh, really drive down to specifics. Don’t just identify a use case and maybe the necessary data products. What skills and tooling are necessary to create and maintain those data products? What would a team look like to own those data products? How are you going to put that all together?
  12. In data mesh, every new domain is different, so the onboarding plan for each incremental domain will have to be adjusted. That doesn’t mean everything starts from scratch, but you need to assess gaps in each domain’s capabilities to figure out how best to enable them.
  13. To understand what capabilities domains need, the platform team needs to have strong communication and partner with the domain teams to build what they need, what helps them get the job done, not build the coolest platform :)
  14. The platform team should focus on addressing three questions in the initial build: a) how do users create value? b) how can we ensure users trust the data? and c) how do you make data products usable and discoverable?
  15. Your platform team probably won’t recognize what all will be reusable components until they’ve brought multiple domains on to the platform. And that’s okay.
  16. ?Controversial?: Most aspects of concepting and then building/deploying a data product are reusable; the data modeling and data transformations are the things unique to every data product.
  17. Interoperability standards won’t happen magically but you can do more custom mapping between data products very early in your journey to get use cases out. Start to look for places to create simple standards used to integrate data products.
  18. ?Controversial?: Offering automated modeling or sample data models is a double-edged sword. They can be helpful to get teams to something passable, but it’s rarely going to produce a good data product without more work. Basically, at best it’s an outline and shouldn’t be published as if it were the finished article.
  19. Look to build to the thinnest slice that delivers good value. Don’t get ahead of yourself. But don’t try to get by ignoring one of the data mesh principles.
  20. As you build data products in a domain, you will learn more and more about that domain which will lead to enhancing/evolving the data products you do have and potentially creating new ones as use cases emerge. It’s important to stay curious and be prepared to share new insights via data.
  21. ?Controversial?: Data people have to really learn to speak in the language of the business. Otherwise, we will continue to talk past each other instead of learning and iterating together towards value.

Manisha started the conversation with her thoughts on how to get going with data mesh, onboarding any domain but especially your first domain. Work with a small team aligned with that domain to find out how data mesh can align with how the organization works and thinks – this will be different for every organization. That alignment is crucial to getting people comfortable and driving buy-in. People have to be comfortable with how it will work and what their responsibilities are. As Manisha said, “…only when they’re comfortable with that concept … will [it] make sense to go ahead and explore more.”

According to Manisha, when you are bringing teams up to speed, it’s really crucial to get on the same page about what you mean and what you expect from them. Teams often confuse data and a data product, for example. The differences can be subtle but are important to understand. As Chris Haas also stated in his episode, they are using the Lean Value Tree method to break down target outcomes into explicit assumptions and more manageable aspects of work. What are the bets you want to make and what are the hypotheses you are testing?

Your initial workshop(s) with a domain can also be a lesson in how to deliver value using a data mesh approach and prioritization. Manisha talked about how, when working with a domain, you might identify multiple potential use cases. But you need to choose what is a priority to do now and why. This can surface the top one or two use cases and also show the domain how to prioritize as use cases continue to emerge in the future. The use case(s) they select to prioritize then directly lead to discovering the data products needed to support the use case. And then you identify what skills and tooling are needed to actually execute, build, and then maintain the necessary data products. Then, you can start to back into what a team working on the necessary data products (and potentially platform) looks like. You can use that Lean Value Tree concept to really get specific because far too often in data work, things are left too vague. Scott note: Get specific, get explicit, chase away vagueness – but of course leave LOTS of room for experimentation and iteration as you learn and build.

When asked more about workshop dynamics, Manisha shared how they try to keep them from being too heavy on the domain – get a few people, maybe 2-4, who really understand the domain and can represent the business aspects, not just the data and/or technical aspects. Each workshop has its own goal as an outcome but it’s important to first align data mesh to organizational goals, the business strategy. Then you can get into data mesh specifics. They call their workshops 1) accelerate, 2) discovery, and 3) inception.

Manisha shared some crucial dynamics when working with your first domain that do get easier as you bring on additional domains. In the first domain, it’s crucial to really narrow in on understanding and definitions, including roles and responsibilities. Data product owner is a new role – what does it actually mean? And there’s the initial platform work too. But as you bring on your third, fourth, fifth, etc. domain, there is internal learning to share with the new domains. There is more clarity around what a data product is – they can even see already-built data products – and roles/responsibilities. But you will definitely need to do a gap analysis to figure out how to best enable each domain as each domain is unique. So there is a balance – look to maximize reuse of platform, processes, organizational changes, etc. but don’t look to force new domains to adhere to exactly how previous domains went through the journey.

For Manisha, it’s very important for the platform team to think in terms of capabilities. Deliver capabilities, not technology, to the domains. Work closely with early data product teams and focus on what they are trying to do instead of how you want to solve the technical aspects. Focus specifically on what they are trying to achieve. Also, the platform team needs to consider what mesh-level capabilities are necessary when. Don’t try to deliver a complete platform at the start – your platform is a product; start with a minimum viable product, understand what minimum means, and don’t go overboard.

The platform team can focus on a few simple things to drive to a good initial outcome/partnership with domains in Manisha’s view: 1) how does the work create business value? What do the domains need to do to actually drive value? 2) How will users trust data, what does trust mean and what’s needed? 3) How do we make it possible for domains to create and manage a data product that is usable and discoverable? By focusing on the task at hand and then mapping to capabilities to support that task, you can prioritize and deliver something useful and valuable without boiling the ocean. You don’t need to include every capability at the start; trying to is a common anti-pattern. Get close to the use case and find friction. You will also learn to recognize reusable components of the platform, but some reusable components might not be evident at the start.

Manisha then went further into finding and identifying reusable components. The things that are most unique to each data product are the data modeling and data transformation in her experience. Almost every other aspect of spec-ing out and building a data product is reusable, merely customized to the data product itself. Finding the necessary SLAs and SLOs by working with consumers is a reusable process. How your SLAs are actually measured and the definitions around those SLAs are reusable. The infrastructure and CI/CD are reusable. The overall data product blueprints are reusable. So look to standardize these as your organization learns how to build data products, making reuse easy.
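That reuse boundary can be sketched in code: the scaffolding (ingest, publish, simple SLA measurement) lives in a shared template, and each data product plugs in only its own transformation. This is a hypothetical illustration, not anything from the episode – all names here are made up.

```python
# Illustrative sketch: a reusable pipeline template where only the
# transform step is product-specific. Everything else (wiring, timing,
# basic SLA measurement) is shared platform scaffolding.

import time
from typing import Callable, Iterable


def run_data_product(
    ingest: Callable[[], Iterable[dict]],
    transform: Callable[[Iterable[dict]], list],  # the only unique part per product
    publish: Callable[[list], None],
) -> dict:
    """Run one data product cycle and return simple SLA measurements."""
    start = time.monotonic()
    records = transform(ingest())  # product-specific modeling/transformation
    publish(records)               # serving via a standardized output path
    return {
        "rows_served": len(records),
        "runtime_seconds": time.monotonic() - start,
    }
```

A product team then only supplies the transform, e.g. `transform=lambda rows: [r for r in rows if r["qty"] > 2]`, while ingest and publish come from the platform.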

On data modeling and interoperability, Manisha shared that it’s crucial to let domains evolve how they model their data as they learn. And interoperability, especially to support a use case, is of course important; but you will likely see the need for interoperability standards emerge when it’s needed – basically, don’t try to build all your standards ahead of time. Otherwise you might just be creating an enterprise data model under a different name :)

When asked specifically about sample data models and automated data modeling tooling, Manisha pointed to them being a double-edged sword. While they can be helpful, most (all?) data products need more custom data modeling to maximize their value. Essentially, the tools can get to a decent initial data model but domains should look to improve on it. If platform teams offer automated modeling tools, they should make sure there is a big caveat to their usage.

Manisha recommends you make sure your initial domain has strong enough data talent – whether existing or embedded – to communicate the basic needs to the platform team. Regular developers are often not going to be data fluent enough at the start to drive to exact data infrastructure needs like a data engineer could. But be careful not to over-index towards tech too. Every domain will need people skilled in creating value through data modeling but you probably won’t need people as advanced in data infrastructure later – the platform is already built by that point :D

It’s important to differentiate what the platform should offer and what the data product developers should handle according to Manisha. The platform, at least the aspects around data product creation, should be focused on making it quicker, easier, and more reliable to create, deploy, maintain, and evolve data products. It sounds easy but it’s actually easy to lose focus on that. Look for friction points in the creation and management lifecycle and automate what doesn’t add incremental value. E.g. a data product developer shouldn’t have to manually add data to the catalog so look to automate it – and yes, not everything should be built upfront :) Scott note: she added some good flavor around data product boundaries but it’s very hard to summarize.
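The catalog example above could look something like this in practice: registration happens as a side effect of deployment rather than as a manual chore. This is a minimal hypothetical sketch – the descriptor fields, the in-memory catalog, and the `deploy` function are all illustrative stand-ins for whatever catalog API a real platform would expose.

```python
# Hypothetical sketch: auto-register a data product in a catalog at deploy
# time, so developers never add entries by hand. The Catalog class here is
# an in-memory stand-in for a real catalog service.

from dataclasses import dataclass, field


@dataclass
class DataProductDescriptor:
    name: str                                   # e.g. "daily-summary"
    domain: str                                 # owning domain, e.g. "orders"
    output_ports: list = field(default_factory=list)


class Catalog:
    """In-memory stand-in for a real catalog service."""

    def __init__(self):
        self._entries = {}

    def register(self, descriptor: DataProductDescriptor) -> None:
        # Key entries by domain so each team has a dedicated namespace.
        self._entries[(descriptor.domain, descriptor.name)] = descriptor

    def lookup(self, domain: str, name: str) -> DataProductDescriptor:
        return self._entries[(domain, name)]


def deploy(descriptor: DataProductDescriptor, catalog: Catalog) -> None:
    # ... provision infrastructure, run pipelines, etc. ...
    # Registration is a side effect of deployment, not a manual step.
    catalog.register(descriptor)
```

The design choice is that a data product developer only calls `deploy`; discoverability comes for free rather than depending on someone remembering to update the catalog.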

Within the platform, Manisha believes it’s very important to maintain team boundaries because shared resources become a bottleneck and can pretty quickly become very hard to manage. This is why Zhamak has been so clear on the data product as an independently deployable unit of architecture. Manisha gave the example that even the namespace for data products in the data catalog should be reserved for a single team, so each team has a dedicated space to put all their data products.

Manisha gave some early mesh journey advice:

1) back to data product specification, you should create something that gives teams a very clear idea of what a data product is and encompasses. Scott note: still waiting for someone to open source their data product creation template…

2) if, as Zhamak says, data products are our unit of value exchange in data mesh, then making it easier to exchange value is crucial. Start to create standardized input and output ports so you can easily ingest and serve data. ETL shouldn’t be a concept; it’s ingest or serving only.

3) really focus on making it easy to discover and then implement SLOs and SLAs. Being able to understand and trust data is crucial to being willing to rely on it. That trust comes from good communication around SLAs.
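The three pieces of advice above can be sketched as one minimal, machine-readable data product specification: a clear spec (1) with standardized ports (2) and SLOs that are discoverable up front (3). This is a hypothetical shape, not anything prescribed in the episode; every field name is illustrative.

```python
# Hypothetical sketch tying the three pieces of advice together: a clear
# data product specification, standardized input/output ports, and SLOs
# declared where consumers can discover them.

from dataclasses import dataclass, field


@dataclass
class Port:
    name: str    # e.g. "orders-parquet"
    format: str  # a standardized format consumers can rely on, e.g. "parquet"


@dataclass
class SLO:
    metric: str   # e.g. "freshness_minutes"
    target: float # the commitment consumers can build trust on


@dataclass
class DataProductSpec:
    name: str
    domain: str
    input_ports: list = field(default_factory=list)   # where data comes from
    output_ports: list = field(default_factory=list)  # how consumers receive data
    slos: list = field(default_factory=list)          # trust, discoverable up front

    def validate(self) -> None:
        # A product with no output port cannot exchange value; a product
        # with no SLOs gives consumers nothing to build trust on.
        if not self.output_ports:
            raise ValueError(f"{self.name}: at least one output port required")
        if not self.slos:
            raise ValueError(f"{self.name}: declare at least one SLO")
```

Making the spec a validated, structured object (rather than a wiki page) is one way to keep the definition of a data product crisp as it evolves.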

Manisha believes learning the language of the business is crucial for data people. You need to extract the actual business value drivers and build to those, so you have to be talking the same language – unfortunately for data people, the language that aligns to business value is usually the business language :) Look to ask more business-user-focused questions rather than getting technical.

Quick Tidbits:

“… the data product spec should [at a] minimum talk about the data set ports, domain, service level agreements, how do I share my data, what does data sharing look like…” – Make your data product specification easy to understand what someone will create and what a consumer will receive.

Again, focus on a streamlined developer experience that keys in on autonomy. That’s the way to a scalable data mesh implementation at least on the platform side.

There is a responsibility on both the platform and the product teams to understand their responsibilities and collaborate to drive to that streamlined experience, lowering the barrier to creating data products.

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him at community at datameshlearning.com or on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
