Data Mesh Radio Patreon – get access to interviews well before they are released
Episode list and links to all available episode transcripts (most interviews from #32 on) here
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Get in touch with Simon and Sunny: firstname.lastname@example.org
A webinar (info gated) Sunny and Simon did on data mesh: https://events.esynergy.co.uk/data-mesh-experimentation-to-industrialisation-on-demand
Simon’s LinkedIn: https://www.linkedin.com/in/simon-massey-82718a3/
Sunny’s LinkedIn: https://www.linkedin.com/in/sunnysjaisinghani/
In this episode, Scott interviewed Sunny Jaisinghani and Simon Massey, who are both Principal Consultants at the consulting company esynergy. They have been involved in multiple data mesh implementations, including at a large bank. This episode could also have been titled: Aligning Incentives, Reducing Friction, and Continuous Improvement/Value Delivery but it doesn’t roll off the tongue very well.
From here forward in this write-up, S&S will refer to Simon and Sunny rather than trying to specifically call out who said which part as that leads to confusion.
Some key takeaways/thoughts from S&S’s points of view:
- We are all still early in our learnings about how to do data mesh well. There is still a ton left to learn. Which is why people should share what they are learning more broadly. Helping others will help you.
- Data mesh, whether it’s your overall implementation, your platform, your data products, your ways of working, etc. is all about evolution, incremental improvement, iteration, etc. You don’t have to get it perfect upfront to drive immense value in the long run. Acclimatize people to iteration and lower the pain of change.
- Don’t go for the big bang approach; find ways to continuously deliver incremental value. That builds the momentum necessary to drive data mesh more broadly across your organization.
- “Data mesh is 25% technology and 75% ways of working.”
- Once you start to get into a groove with the organizational ways of working, that’s when the value force multiplier in doing data mesh starts to take off. Fast time to market with new data products, quick iteration cycles, low-friction cross-domain collaboration, etc. But it takes time to really figure out how to do data mesh in your organization.
- Your data mesh will inherently have a huge scope. Try to keep that scope as limited as possible as you are getting moving – especially in the near term. It is very easy to try to “feature stuff” your data mesh implementation, especially the platform and doubly especially early. Keep complexity out where possible and focus on moving the ball forward, not spending a lot of effort preparing to move the ball forward.
- If you can’t show value from your early data mesh implementation work in 1-2 quarters, you are quite likely to lose any momentum and thus at least a portion of your funding. Find ways to deliver continuous incremental value.
- For less “data mature” domains, you can use the allure of working with more cutting edge technologies – and upgrading skillsets – to drive buy-in, get software engineers interested in participating.
- For driving buy-in with moderately data mature teams, ask them how quick iteration on data and reliable data would improve their day-to-day lives. Is there a good use case for your domain you can’t handle right now?
- For the high data maturity/capability domains, they might be reluctant to participate because they like to build things themselves. But data mesh means far less red tape/friction for data consumers, so they will likely pressure the domains to move to data mesh to reduce the friction.
- In S&S’s experience, the most technologically sophisticated domains wanted to do things themselves and were often among the last domains to participate in the data mesh. Don’t be shocked if this happens in your organization.
- Lastly on buy-in, seeing is believing. Many domains will be skeptical. Find those willing and work together to deliver great value. Many of those initial skeptics will then want to participate.
- Cloud economics and scale are really crucial to achieving value from data mesh. The cost of failure and the scale of failure are much smaller so we can iterate quickly and take more risks. On-prem data mesh is probably not worth the effort in most cases.
- Similarly, data mesh means pursuing high chance of failure but high reward opportunities around data is much cheaper/easier. Teams don’t have to be nearly as worried about an analytical data product being a failure – it wasn’t months of time and huge cost, it was a few weeks and not expensive.
- Think of your mesh data product lifecycle from the start. In alpha mode, you are creating something only for your own domain; it’s okay to make drastic changes, and it’s okay to stop developing it if there isn’t value. In beta, you bring on one external user and are still early in figuring out your specific data product. Don’t push a beta or ESPECIALLY an alpha analytical data product out to the broader mesh.
- In the same vein, you can treat your data as a product but not arrange it into actual products. Data consumed within the producing domain only doesn’t need the same affordances. Don’t overcomplicate things.
- Handoffs between teams, especially where context is crucial, are massive friction points / bottlenecks. Have the producers and consumers work together directly; don’t rely on highly specialized teams like a dedicated ETL team.
- Use the phrase “analytical data product” or equivalent. If you only say “data product”, people often think of other types of data products and aren’t as focused on delivering something for your data mesh.
- If you can really get away with a skunkworks data mesh proof of value, S&S recommend it – Scott is more cautious*. But for most organizations, it will be a top-down mandate, which will make finding willing participants slightly harder.
- Focus on getting everything out of the way to make technology the easy part. The technology aspect is hard but it’s the easiest part of data mesh. No one said data mesh is for the faint of heart or weak of will.
- Big list of antipatterns below as well.
*Note: Scott doesn’t recommend a skunkworks approach because there are likely extremely few organizations where you could get far enough to prove value without explicit funding.
S&S started off the conversation with a key point that should be said more often in data mesh: we are all still learning how to do data mesh well and where the sharp edges are, even those who have been doing general distributed data and/or data mesh for multiple years. As Scott said, everyone feels a bit like they are behind the curve but no one is the expert on this yet; it’s too early.
Trying to lead a data mesh implementation – or even just thinking about data mesh – with a technology-first approach is a big mistake many people make, per S&S. Data mesh should be close to 75% ways of working and 25% technology if you are doing it well. If you don’t get to a good place with the organizational aspects, those ways of working, your data mesh implementation won’t work. Once you start to really find your ways of working, that is when the value force multiplier of mesh really kicks in. But it takes hard work to get there. And, of course, old habits die hard. Change is painful and it will be difficult to change the ways of working.
S&S have helped lead multiple data mesh implementations and the initial getting going is always unique to the organization. In their first implementation, they had the budget and freedom to drive innovation enough to find the early adopters, then prove out the value; they were able to go to senior management with proof points when asking for funding to go broader. In most organizations, people won’t have that luxury and it will need to be more of a top-down mandate with an executive sponsor. But that might make it harder to find use cases because people see it as a mandate instead of a collaboration at first. And you are likely to lose momentum and at least some funding if you aren’t able to show value in 1-2 quarters, so get to delivering continuous incremental value early instead of taking a big bang, back-end loaded approach.
On finding the “coalition of the willing”, those domains willing to be the early adopters/guinea pigs, S&S recommend looking for people who can understand that having better, more reliable, higher-agility data practices will make their lives better. Those tend to be the teams that are semi-mature with data. For the teams that are less mature with data, especially those using more legacy technologies, you can win them over with the allure of upskilling and getting to use some cutting edge technologies. On both of those types of teams, S&S have seen people get excited by the rate of iteration possible with data. For the data mature teams, instead of building everything themselves and having lots of checks by governance, legal, etc. to ensure they met every bit of compliance, data mesh means much less red tape – it removes lots of annoying friction for them and they can focus on driving value.
In general, to drive buy-in at the domain level, S&S recommend looking for valuable use cases that are within the domain itself. That way, you can do much more trial and error because the producer and consumer are one and the same. It also means they will be more incentivized to make it worthy of a mesh data product. And don’t push out data products for broad consumption too early – after an alpha stage where it is only consumed within the producing domain, for the beta stage, start with a single use case outside the domain. Once you have the hang of that, think about offering broader access to more consumers serving more use cases but don’t push data products out too early.
Another incentivization mistake many make – especially the exec sponsors – is focusing too much on the end-state big picture, per S&S. For example, the CDO or CIO says, “when we have all of our data organized into these beautiful data products with amazing self-serve, think about all the value. Domains, make that happen.” But that misses the question: what about now? What incentivizes domains in the meantime? If all the value is going to arrive in 2, 3, or 5 years, why will they be excited to participate now?
So, according to S&S, you need to focus on quickly unlocking business value with each analytical data product as well as with improvements to the platform or other aspects of your implementation. And constantly be iterating towards delivering continuous incremental business value – always be improving what you’ve got when the work has a good return on investment. Also, seeing is believing. When other domains start to get a positive spotlight from their data-informed results, more domains will want to participate.
Back to the data product creation process, S&S emphasized how important “continuous improvement” and iteration to value are in data mesh. You don’t have to get it perfect out of the gate. A huge benefit of data mesh is the ability to add – and adjust – incrementally as you learn more to drive more value. Consumers also have to be on board with the fact that things may change, but the end result is far better. As stated, the analytical data product creation process S&S recommend is to start with an internal-to-the-domain use case. That way, it doesn’t need to be excessively governed or even documented that well. The producer is the owner so they know the needs and the use case. Once you’ve moved past that alpha stage, don’t look to push the data product out wide; find a single use case from another domain and work with them to iterate. Then you can look to publish it to the wider mesh once you really know what you have and what you want to share. And always look to improve and manage your data product like an actual product. “If you don’t do the product lifecycle management, it’s just a data mart.” Data marts didn’t go that well when we tried them in the 80s for a lot of those reasons.
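The alpha/beta/wider-mesh progression described above can be sketched as a small access rule. This is only an illustration, not something S&S presented: the `Stage` names, the `DataProduct` fields, and the example domains (`sales`, `finance`, `marketing`) are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Stage(Enum):
    ALPHA = "alpha"          # consumed only inside the producing domain
    BETA = "beta"            # one external consumer working with the producer
    PUBLISHED = "published"  # available to the wider mesh (assumed name)


@dataclass
class DataProduct:
    name: str
    owner_domain: str
    stage: Stage = Stage.ALPHA
    beta_partner: Optional[str] = None  # the single external domain in beta


def can_consume(product: DataProduct, consumer_domain: str) -> bool:
    """Return True if consumer_domain may read the product at its current stage."""
    if consumer_domain == product.owner_domain:
        return True
    if product.stage is Stage.BETA:
        return consumer_domain == product.beta_partner
    return product.stage is Stage.PUBLISHED


def promote(product: DataProduct, beta_partner: Optional[str] = None) -> None:
    """Move one stage forward; entering beta requires naming one external partner."""
    if product.stage is Stage.ALPHA:
        if not beta_partner or beta_partner == product.owner_domain:
            raise ValueError("beta needs exactly one consumer from another domain")
        product.beta_partner = beta_partner
        product.stage = Stage.BETA
    elif product.stage is Stage.BETA:
        product.stage = Stage.PUBLISHED
    else:
        raise ValueError("product is already published")


# Hypothetical walk-through of the lifecycle.
orders = DataProduct(name="orders_summary", owner_domain="sales")
alpha_access = can_consume(orders, "finance")    # alpha: outside domains are refused
promote(orders, beta_partner="finance")
beta_access = can_consume(orders, "finance")     # beta: only the named partner
other_access = can_consume(orders, "marketing")  # beta: everyone else still refused
```

The point of the sketch is that promotion is an explicit step: nothing reaches the wider mesh without first passing through a single-partner beta.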
Incremental improvement and iteration aren’t just for your analytical data products according to S&S. Historically, our data setups have been sort of “all or nothing”. You either got it right and it created a lot of value or you didn’t and it was a flop. The ability to – and the cost of – change in data was very high. Monetary cost, delayed time to insights cost, etc. But with data mesh, it is inherently about iterating, evolving, making improvements, building incrementally, learning and changing, etc.
On some anti-patterns, S&S pointed to people trying to do far too much upfront before they start to deliver value. No battle plan survives contact with the enemy. The inherently messy nature of data is our enemy – you will learn far more after you get moving forward than trying to plan and build everything ahead. Set your North Star and start traveling. Focus on CYA – cover your…butt… – governance. Keep out the layers of complexity – no feature stuffing. It’s incredibly easy to overcomplicate.
Other anti-patterns S&S discussed were:
1) Not ensuring the technology and business sides of every use case are collaborating. Otherwise, you deliver the cool shiny thing that the business side didn’t want or need. That’s wasted effort. Spend the time to communicate better to prevent that.
2) Going for width of data on the mesh instead of making sure it is ready for consumption. Don’t put data on the mesh if there isn’t a specific use case for that data.
3) Having a messy data strategy. Becoming “data-driven” isn’t a strategy.
4) Moving forward without the organizational maturity to really succeed.
5) As stated earlier, focusing on the technology to the detriment of the organizational aspects.
6) Calling everything an analytical data product. If it’s a report or an insight, S&S don’t think that should be called a data product. It muddies the water too much.
7) Not adding a clarifier to data product to make it clear exactly what an analytical or mesh data product is. The phrase “data product” is more often misunderstood than understood.
8) Building on number seven, trying to mix operational concerns – live transactions – into analytical data products. You will optimize for operational concerns and analytics will become a second class concern. Making analytics a first class concern is pretty crucial to effectively doing data mesh.
9) Chaining too many data products off each other. Downstream of downstream of downstream data products can become a real issue. Try to push use cases upstream where possible.
Per S&S, data work has had a high cost – it’s been very difficult to get everything right historically and on-prem hardware purchase cycles certainly didn’t help. You wanted to make sure something would have a big benefit before looking to invest in it. So there was much less innovation – just going after the low hanging fruit where the chance of success was high. And yet, many projects still failed. However, in data mesh, we can lower the incremental cost of experimentation and new data production very significantly – so you can actually test out more experimental, high chance of failure but high reward use cases. And it means you have the fast time to market capabilities to go after so many analytics opportunities we couldn’t in the past.
Moving to the cloud means we can fail quicker and cheaper according to S&S. Data mesh is going to be extremely hard and expensive and likely not worth it if you try to do it on-prem in their view. And we iterate from failure to success. Historically, as mentioned, the cost of failure in data initiatives was high – monetary, time, etc. Now, if we fail, we can fail in much smaller ways much more quickly – each analytical data product has a much smaller investment in it than most historical data initiatives. That also means faster feedback, iteration, improvement, etc. A much smaller blast radius and cost of failure mean we can be more experimental.
S&S referenced one of Zhamak’s old figures from early in sharing about data mesh where there are three separate groups: the data producers, the central data team, and the data consumers. The producers have smiley faces, the consumers have a neutral face, and the central data team are all frowning faces. That’s because of far too many handoffs between teams – that central team has ownership without context and is an overworked bottleneck. Jesse Anderson mentioned that going outside your team takes 12x as long for work to get done. So there is a big inertia – because it is how many orgs have worked – around trying to have teams specialize in things like ETL or the consumer self-serve platform. But for your own sanity, stay away from handoffs around capabilities where possible.
Scott asked about ownership of derived data products and S&S, as referenced earlier, emphasized that many analytical outputs should not be called analytical data products in data mesh. They delved into the concept of treating your data as a product but not necessarily organizing it into actual products. The differentiation was that capital p “Product” is an actual analytical data product while lowercase p “product” is simply arranging data in such a way as to be easily consumed. Lowercase p products are about usage by the producing domain, so they won’t have the same affordances but also won’t need the same governance. There are some companies that are forcing domains to consume only from mesh data products – Flexport being one – to make sure they are of the highest quality, but S&S see that as overkill.
On actual ownership around derived data products, S&S mentioned something Wannes Rosiers talked about all the way back in episode 5 – pushing transformations left. Data consumers should make sure data producers understand what data transformations they are doing from data products to see if the data product owner should actually own those transformations as part of the source data product. It’s more reliable and cost efficient and might make the source data product more valuable to other consumers.
Containing the blast radius of a mesh data product, as touched on earlier, is really important. If making a change to one data product is going to negatively impact a huge number of downstream data products, that’s typically not great – is that really loosely coupled in all senses? It means you have to think hard about whether you should actually promote a useful data set to being a mesh data product – what is its lifecycle?
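One rough way to put a number on blast radius is to walk the lineage graph and count everything transitively downstream of a data product. The sketch below assumes you can export lineage as a simple mapping; the `DOWNSTREAM` graph and product names are invented for illustration and do not come from the episode.

```python
from collections import deque
from typing import Dict, List

# Hypothetical lineage: each key feeds the data products listed in its value.
DOWNSTREAM: Dict[str, List[str]] = {
    "orders": ["orders_enriched", "revenue_daily"],
    "orders_enriched": ["customer_360"],
    "revenue_daily": ["customer_360"],
    "customer_360": ["churn_scores"],
    "churn_scores": [],
}


def blast_radius(product: str, downstream: Dict[str, List[str]]) -> Dict[str, int]:
    """Map every transitively downstream product to its hop distance from `product`."""
    depth: Dict[str, int] = {}
    queue = deque([(product, 0)])
    while queue:
        current, hops = queue.popleft()
        for child in downstream.get(current, []):
            if child not in depth:  # breadth-first: first visit is the shortest chain
                depth[child] = hops + 1
                queue.append((child, hops + 1))
    return depth


impact = blast_radius("orders", DOWNSTREAM)
affected_count = len(impact)                 # how many products a change could touch
max_depth = max(impact.values(), default=0)  # largest shortest-path distance downstream
```

A high `affected_count` or `max_depth` is a prompt to ask whether some of those downstream transformations should be pushed upstream into the source data product.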
A few other tidbits:
- Get the governance team on board from day one – they can help if you have governance questions but they shouldn’t be forced to be the decision maker; that makes them a bottleneck.
- Acclimatize people to iteration – constant change is a new norm. Make it not scary. It’s an opportunity to improve, not destroying what you’ve already built. And it’s okay to make the missteps that require a course correction.
- Data contracts are crucial in general. If people can’t know exactly what is promised – and clearly what isn’t – it’s hard to trust data products.
- With data mesh, we stop storing data “just in case” it’s valuable in the future. Data sitting around has cost and risk associated.
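On the data contract point above, one minimal way to make “what is promised and what clearly isn’t” explicit is to record both sides in the contract itself. This is a sketch under stated assumptions, not a standard contract format: the `DataContract` class, its fields, and the example guarantees are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class DataContract:
    product: str
    # Guarantee name -> what is promised (hypothetical structure).
    promised: Dict[str, str] = field(default_factory=dict)
    # Things consumers might assume but that are explicitly out of scope.
    not_promised: FrozenSet[str] = frozenset()

    def status(self, guarantee: str) -> str:
        """Say whether a guarantee is promised, explicitly excluded, or unspecified."""
        if guarantee in self.promised:
            return f"promised: {self.promised[guarantee]}"
        if guarantee in self.not_promised:
            return "explicitly not promised"
        return "unspecified"


# Invented example values, for illustration only.
contract = DataContract(
    product="orders_summary",
    promised={"freshness": "updated daily", "schema": "additive changes only"},
    not_promised=frozenset({"intraday updates"}),
)
```

Calling `contract.status("intraday updates")` makes the exclusion visible instead of leaving consumers to guess, and an "unspecified" answer is a signal to go talk to the producer.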
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him at community at datameshlearning.com or on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Data Mesh Radio is brought to you as a community resource by DataStax. Check out their high-scale, multi-region database offering (w/ lots of great APIs) and use code DAAP500 for a free $500 credit (apply under “add payment”): AstraDB