Data Mesh Radio Patreon – get access to interviews well before they are released
Episode list and links to all available episode transcripts (most interviews from #32 on) here
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Gunjan’s LinkedIn: https://www.linkedin.com/in/gunjanaggarwal/
Gunjan’s Medium: https://gunjan-aggarwal.medium.com/
In this episode, Scott interviewed Gunjan Aggarwal, Head, Digital Data Products and MarTech Strategy at Novartis. To be clear, she was only representing her own views on the episode.
Some key takeaways/thoughts from Gunjan’s point of view:
- Set your overall data product strategy earlier in your journey than many may think, so it is in place by phase 2, when you go wider with data mesh. It’s easy to focus only on use cases instead of the bigger picture.
- Align early on who owns what and where the boundaries between roles lie. Otherwise, with the amount of change data mesh drives, there will likely be unnecessary chaos. Get specific.
- Don’t fall for the ‘Data Field of Dreams’ trap: “if you build it, they will come.” Focus on building to actual problem statements. Involve people early, make them accountable, give them skin in the game and they will care.
- “The more you ask why, the more clarity you will get.” Dig deep into the reasoning for creating new data products or ARDs (analytics ready datasets). If we have this data product, what will it unlock for us?
- It’s crucial to avoid the trap of building each data product for one specific use case. You must have the bigger picture in mind and focus on reusability instead of only solving one set of challenges. Can you extend an existing data product?
- Data people should have domain knowledge where possible. That way, they can push back on requirements that don’t make economic sense, that don’t maximize the return on investment.
- Four-part approach to designing data products: 1) find clarity on the problem statement; 2) assess which personas will benefit from it; 3) dig into what you already have available; and 4) focus on serving value to the problem statement in a way the persona can use.
- ?Controversial?: scalability is more important than time to market when it comes to data products, especially as you develop a broader set of data products. Tech debt around scaling is hard to pay down at exactly the moment a data product is delivering strong value and needs to scale. And if existing data products can’t scale, that limits the additional use cases that can leverage them.
- Look to provide as many easy paths as possible for new data products. Templates, blueprints, standard schema, a global taxonomy, etc. Teams don’t have to use them but they are great starting points.
- You need to be proactive in partnering with the business. Data people have historically waited for requests/requirements. That won’t lead to fast feedback loops and quickly iterating towards value.
- It’s easy to end up focusing on the single use case instead of the bigger picture. But this will likely result in business disruption from a half-baked product.
Gunjan is in phase 2 of a data mesh implementation at the moment, the going wide phase. As part of that, she’s looking at how to create a suite of data products that serves a broader set of use cases and at what to put in place to serve ad hoc querying more easily. She recommends setting your data product strategy for your long-term needs earlier in your journey than most might think: what is the real business strategy for your mesh as a bigger entity than just individual data products? How will they actually work together so 1 + 1 = 3?
According to Gunjan, it is very important to clearly define boundaries and responsibilities for roles. It’s easy to get confused about what is a data product owner versus a product manager, for example. Look to the RACI model for defining things clearly. If there is a lot of change and unclear responsibilities, that can cause lots of challenges and chaos. If you don’t have alignment early, it’s very easy to go wrong. So make sure you spend time before moving forward with data mesh to really focus and align on why you are doing this and who will own what. Start with the end goal in mind and march forward together.
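To make the RACI idea concrete, here is a minimal sketch of a check on a RACI (Responsible, Accountable, Consulted, Informed) matrix. The activities, role names and assignments are hypothetical illustrations, not anything Gunjan described on the episode. The sketch assumes the common convention that each activity has exactly one Accountable role and at least one Responsible role.

```python
from collections import Counter

# Hypothetical RACI matrix: activity -> {role: "R" | "A" | "C" | "I"}.
# All activities and roles here are invented for illustration.
raci = {
    "Define data product roadmap": {
        "data product owner": "A",
        "product manager": "R",
        "domain expert": "C",
    },
    "Publish schema changes": {
        "data product owner": "A",
        "product manager": "A",  # two Accountable roles: unclear ownership
        "data engineer": "R",
    },
    "Monitor SLAs": {
        "data engineer": "C",
        "domain expert": "I",  # nobody Accountable or Responsible
    },
}


def raci_problems(matrix):
    """Return a list of human-readable ownership problems in a RACI matrix."""
    problems = []
    for activity, assignments in matrix.items():
        counts = Counter(assignments.values())
        if counts["A"] != 1:
            problems.append(
                f"{activity}: expected exactly one Accountable role, found {counts['A']}"
            )
        if counts["R"] < 1:
            problems.append(f"{activity}: no Responsible role")
    return problems


for problem in raci_problems(raci):
    print(problem)
```

Running a check like this when roles are first agreed is one way to “get specific” about boundaries before the amount of change grows.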
In Gunjan’s experience, it’s crucial to involve people early in every data product you develop. If you build something for them instead of building it with them, they are far less likely to buy in. Make people accountable, so that their data product “is their baby”. Make them part of defining success for a data product and work with them to make sure they can scale it up when it succeeds.
For Gunjan, when considering new data products, always start with asking why. Why do we think this will drive incremental value? Why is this the right time? Dig in layer by layer to understand why this is a good use of our time and what business value we expect from doing it. It’s easy to miss the forest for the trees. This method also makes finding reusability more likely. Why can’t you use what is already built? And if there is a good answer, make sure to build your data products so they are reusable for other use cases down the line.
On digging deeper into reusability and extensibility, when you look at new use cases, consider whether you need new data products to support them or whether you can modify and extend existing data products instead. It’s quite easy to build data products to try to support each use case individually but it will quickly overwhelm your teams. Look at the greater whole for how you can support your needs with a suite of data products.
Where possible, Gunjan believes it’s best if your data people have domain knowledge, as they can push back on the cost/benefit of choices far better than someone without the specific knowledge. Do you actually need real-time? What is the impact of different SLAs on the return and on the cost to create/maintain? It’s far easier to maximize return on investment if one person understands both sides of the equation.
Gunjan and team have a four-part approach to building out new data products: 1) “Find the clarity on the problem statement”. What exactly is the use case you will be serving? 2) “Who are your personas you are serving and what will they require?” Get specific around who you are trying to serve. 3) “What is available?” Dig into what you already have and evaluate if what exists can serve the use case(s). 4) Focus on delivering value against the problem statement in a way the personas can benefit from.
In the long run, Gunjan believes the ability to scale up is more important than the speed of deploying a data product. When you rush to create data products, you will inevitably create a lot of tech debt, especially around scaling when the time comes, so focus more on the ability to scale. It might not feel like that at first but it’s especially important when you have many data products. It will be very frustrating when you have additional use cases and cannot easily scale because you wanted to release your single data product 2 weeks earlier. And focus on prioritization as well. Some use cases will have to wait and that’s okay.
According to Gunjan, you need to get pretty close to your domain business partners. Embed yourself in more business meetings discussing use cases and problem statements. You shouldn’t be waiting for requests; you should be extracting that information in regular discussions. Fast feedback cycles leading to fast iteration are crucial.
There is no such thing as a future-proof tech stack. So Gunjan recommends 1) accepting that and 2) preparing yourself for graceful evolution to meet needs. You should always be asking what the risks to your platform are and how you can mitigate them. Modularize your tech stack so you can easily add and/or replace components when necessary.
In wrapping up, Gunjan talked about the importance of not focusing on the single use case but how it plays into the bigger picture, the long-term. It’s easy to go down that path of focusing on the single use case, especially as you start out, but it will cause disruption to the business from a half-baked product. Product thinking, not project thinking.
Easy data product templates and a good centralized catalog are very important. Having a good catalog can help you identify gaps in your data product coverage as well.
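As a rough illustration of how a catalog can surface coverage gaps, the sketch below compares the use cases the business needs against the use cases each cataloged data product is tagged as serving. The catalog structure, product names and use cases are all hypothetical, and a real catalog tool would expose this metadata in its own way.

```python
# Hypothetical catalog: data product name -> set of use cases it serves.
catalog = {
    "customer_360": {"campaign targeting", "churn analysis"},
    "web_engagement": {"campaign targeting"},
}

# Hypothetical list of use cases the business has asked for.
needed_use_cases = {"campaign targeting", "churn analysis", "field force planning"}


def coverage_gaps(catalog, needed):
    """Return the needed use cases that no cataloged data product serves."""
    covered = set().union(*catalog.values())
    return sorted(needed - covered)


def reuse_counts(catalog):
    """Return how many use cases each data product serves, as a reuse signal."""
    return {name: len(use_cases) for name, use_cases in catalog.items()}


print(coverage_gaps(catalog, needed_use_cases))
print(reuse_counts(catalog))
```

A gap in this list is a prompt to ask first whether an existing data product can be extended, before building a new one.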
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him at community at datameshlearning.com or on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Data Mesh Radio is brought to you as a community resource by DataStax. Check out their high-scale, multi-region database offering (w/ lots of great APIs) and use code DAAP500 for a free $500 credit (apply under “add payment”): AstraDB