Data Mesh Radio Patreon – get access to interviews well before they are released
Episode list and links to all available episode transcripts (most interviews from #32 on) here
Provided as a free resource by DataStax AstraDB; George Trujillo’s contact info: email (firstname.lastname@example.org) and LinkedIn
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Vikas’ LinkedIn: https://www.linkedin.com/in/vksnov9/
Vikas’ Twitter: @vikaskumar9 / https://twitter.com/vikaskumar9
Vikas’ email: vikaskumar9 [at] gmail
In this episode, Scott interviewed Vikas Kumar, AVP and Head of Data, AI, and ML at CNA Insurance. To be clear, he was only representing his own views in this episode.
Some key takeaways/thoughts from Vikas’ point of view:
- In data mesh, make sure to keep focused on bringing the business domains along. You aren’t building for the sake of building. If users can’t derive value from the data work being done, why is it being done?
- The 2010s through the early 2020s have been about moving data to the cloud but we are starting to see people really leverage that data to generate value. The cloud unlocks many new possibilities around data due to flexibility, scalability, and unit economics.
- With moving to cloud, there is much less focus on specifically managing the data and more focus on getting value from the data. SaaS data product offerings really unlock people’s time to focus on driving value.
- Cloud gives us the scale and data availability but there is still a long way between having the data available and leveraging the data for significant value.
- Cloud can be a double-edged sword – it gives you flexibility and scalability but without good controls, you are likely to do a lot of duplicate work. Be careful that ease of data product creation – or at least PoC creation – doesn’t create chaos and data product overlap. Make sure to have good governance here, including strong communication.
- ?Controversial?: We aren’t very good yet at making it easy for business domain users to leverage data in many of their decisions. Where do we fall on the spectrum between “we need to teach them how to do everything with data” and “we need to curate everything for them”?
- ?Controversial?: It’s easy to focus too much on the short-term quick wins in data. You need to think about your overall data landscape and build a foundational approach so you can go after big picture, big impact bets with your data work. You should think about building every data product from a foundational approach too to make them more extensible.
- We need to get people out of their functional silos with business people only speaking business and data people only speaking data.
- To do data mesh well, we have to focus on the operating model of the organization around creating and maintaining data products. There is too much focus on the technical aspects and not enough on how the work actually gets done in a way that fits with the organization’s ways of working.
- Data producers must assess data consumers’ data fluency levels. If they aren’t very strong with data, should you really be delivering them raw data instead of curated insights?
- For any data product, you should start by mapping it to a target outcome. But it shouldn’t stop there because with reuse, new outcomes may emerge that drive additional value.
- Data product owners are crucial to building good data products. It’s their job to identify and then satisfy the objective of building the data product. What are you trying to achieve?
- We shouldn’t focus only on the data product. The work to create that data product is what makes it valuable; the data product is merely the vehicle for delivering the value, the output of real product work around data.
- ?Controversial?: Many companies doing data mesh appear to be trying to leave data governance until ‘later’ and that is likely to bite them. Governance here means security/access control but also interoperability. You might not need to implement all of your data governance upfront but you should plan out your general governance strategy very early in a data mesh journey.
- Access control is a really hard problem. Many organizations don’t have good communication or visibility into who is using what data and especially how/why. We need to be asking these questions and then setting access policies that expire too. Checking in to see if people still need access is just good governance.
According to Vikas, from 2010 through the early 2020s, the focus has been on moving data to the cloud to better drive value. And now that more and more of our data is in the cloud, we are starting to see much broader adoption of things like ML and AI. The cloud gives us the promised but under-delivered scalability of the “big data” technologies along with the flexibility to move quickly and experiment. Cloud can also make it easier to bring non-data people into the mix to drive better collaboration between the data people and the business people/domain. So cloud gives us this massive scale and data availability but we still have to learn to better leverage our data and drive value from it – we are still in pretty early days there as an industry.
A big outcome of the mass movement of data to the cloud is a shift in how much time is spent on data management versus getting value from the data, according to Vikas. DBAs used to spend 60%+ of their time just managing the data but data people’s time is now focused on getting value and probably only 10-20% is spent managing the data specifically. But cloud can be a double-edged sword too – if it’s very easy to create new data products or beta data products, you have to be very careful not to create overlapping or duplicate work and data products. It all comes down to governance and your operating processes to prevent that.
As an industry, we are getting much better at serving data reliably at scale, according to Vikas, but we still struggle with the gap between “the data is available” and “the data is able to be used by consumers in the business domains.” We are still working on figuring out where to meet in the middle between handing people reports and maybe dashboards – a kind of old school approach – and upskilling them to very high data fluency so they can build everything themselves.
When asked whether the data people have to learn all the business context or vice versa, Vikas gave the very data mesh answer of “it depends.” But that makes sense because there shouldn’t be a single prescribed method; you have to look at how your organization works and fit with that model. And you probably want to meet somewhere around the middle. Otherwise, you will cause unnecessary friction. So look to your general ways of working, cross-train people, get people exchanging context about what they are trying to achieve, and instill a culture of feedback and collaboration. That’s how you can actually execute well on a data mesh strategy.
Vikas talked about your data strategy north star being about getting value from your data, reliably and at scale. So, you need to be realistic about where you are in that capability journey right now. As a data producer, you need to assess whether your data consumers can do everything necessary if you give them raw data or whether you should be curating it for them so they can actually leverage the insights. Work to find the high-return data work early instead of trying to do the most complicated aspects of data. It’s okay to start small, no shame there.
A data product should always map to a target business outcome according to Vikas. But that shouldn’t be the only factor. The reason for creating a data product should be to achieve that outcome, so use that as the north star for the data product, but we must build in a way where data products can be reused – sometimes with some additional work – for additional use cases. And it’s really crucial to have a data product owner who is discovering and focusing on the objective of the data product. Providing the business with meaningful data that meets their objectives should be a key objective of every data product.
When asked how we balance focusing on the long-term wins instead of the quick – but typically small – wins, Vikas talked about the need to create a holistic view of your data and build a very strong foundation for how you will deal with data in general. That makes it so you can jump on the quick wins when you find them but you also have a steady foundation for making much bigger bets going after long-term big wins. But with a shaky foundational layer for your data, those long-term big wins are much less likely to pay off. And that foundational aspect comes in at the data product level too – when it makes sense, build data products to be extensible from the start so they are easy to extend later. Kent Graziano in the recent data modeling panel railed against having to rebuild every time you extend a data product. Don’t do that 🙂
For Vikas, there are many value streams for a data product – most people focus on the data set itself but the value could also come from the governance work or the collaboration conversations between producer and consumer. We need to focus less on the data product as the exact output and more on the data product as the vehicle for delivering value; the overall product work itself significantly enhances the value of the data product.
Data governance seems to be the part of data mesh that confuses a fair number of organizations, so they ignore it, at their significant peril according to Vikas. While you might not have to build every aspect of your governance upfront, it’s crucial to think about how you will apply governance. And to truly get to the ideal of a self-serve platform, governance needs to be a simple part of the ways of working. Saving that for later is not going to end well for many organizations. And while access control is hard, we need to get far better at understanding who is using what and _why_. How long should someone get access to data? Forever access should be a non-starter. And how do we make it easy to grant that expiring access?
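As one way to picture the expiring access idea, here is a minimal Python sketch. It was not discussed in the episode: the names `AccessGrant`, `grant_access`, and `grants_due_for_review`, the 90-day default, and the 14-day review window are all hypothetical choices made for illustration. The point is only that a grant records who is using what and why, and always carries an end date.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AccessGrant:
    """Hypothetical record of who may use which data product, and why."""
    user: str
    data_product: str
    purpose: str           # the "why" behind the request
    granted_at: datetime
    expires_at: datetime   # every grant has an end date; no "forever" access

    def is_active(self, now: datetime) -> bool:
        # Active from the grant time up to, but not including, expiry.
        return self.granted_at <= now < self.expires_at


def grant_access(user, data_product, purpose, now, days=90):
    """Create a time-boxed grant. The 90-day default is an assumption."""
    if not purpose.strip():
        raise ValueError("a purpose is required to grant access")
    if days <= 0:
        raise ValueError("days must be positive")
    return AccessGrant(user, data_product, purpose, now, now + timedelta(days=days))


def grants_due_for_review(grants, now, within_days=14):
    """Return active grants that expire within the review window."""
    cutoff = now + timedelta(days=within_days)
    return [g for g in grants if g.is_active(now) and g.expires_at <= cutoff]
```

A periodic job could call `grants_due_for_review` and ask each listed user whether they still need the data, which is the "check in" step described above. Renewal would then just be a new call to `grant_access` with an updated purpose.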
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him at community at datameshlearning.com or on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
Data Mesh Radio is brought to you as a community resource by DataStax. Check out their high-scale, multi-region database offering (w/ lots of great APIs) and use code DAAP500 for a free $500 credit (apply under “add payment”): AstraDB