Data Mesh Radio Patreon – get access to interviews well before they are released
Episode list and links to all available episode transcripts (most interviews from #32 on) here
In this episode, Scott interviewed Erik Herou, Lead Engineer of the Data Platform at H&M. To be clear, Erik was only representing his own views and perspectives.
A few key thoughts/takeaways from Erik’s point of view:
- Data mesh can work well with a product-centric organization strategy as both look to put ownership and product thinking in the hands of the domains.
- To develop a good data/enablement platform for data mesh, look to work with a number of different types of teams. That way, you can see the persistent/reusable patterns and capabilities to find ways to reduce friction for future data product development/deployment.
- H&M had an existing cloud data lake that was/is working relatively well for existing use cases. But the team knew it likely wouldn’t be able to handle where they wanted to go with many more teams producing data products of much higher quality and potentially sophistication.
- When implementing data mesh – or any data initiative really – it is easy to fall into the trap of doing things the same way you did before. The “old way” feels safe and it was/is still working relatively well for H&M. So they treated their data mesh implementation as almost a greenfield deploy.
- Because of the long-term focus on making it low friction and scalable to share data – the consumers will come as you make them more data literate – most of the early data/enablement platform work has been focused on helping data producers. This is a common pattern in data mesh, but your constraints and needs may differ.
- Erik’s team is focused on enabling data producers first specifically so his team doesn’t become a bottleneck. It is easy for a platform team doing any part of the individual work to become that bottleneck.
- Consider how much organizational change you require before starting to create mesh data products. H&M did a large amount of that organizational change up front; other companies start in their current structure and evolve as they learn more. Both are valid and can work well.
- Specific to H&M, a strong track record of good return on investment in AI meant there was less pushback than in many organizations when they started driving buy-in for implementing data mesh.
- In the historical data warehouse world, there was less need for data literacy because most people were pushed reports but also couldn’t do much, thus not “getting themselves in trouble”. If we move to a more self-serve approach, we need much better data literacy – it can be a big risk to allow access without understanding. Otherwise, it could be like turning a six-year-old loose in a fully stocked kitchen where they intend to “make dinner”.
- Data catalogs could really help push forward general data practices but we still need to have actual conversations too. Being able to ask someone about what data means and similar high context exchanges are crucial.
- “If you have a complicated business, you have complicated data.”
- If your mesh data products don’t maintain loose coupling, your data mesh implementation is probably headed for troubled territory. Architectural independence is one of the key tenets of Zhamak’s concept of a data product/quantum.
- Input ports are an easily overlooked place to find reuse. Many teams need the same type or style of processing from similar source systems. Having standard input ports can significantly help reduce the complications around building data product ingest mechanisms.
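To make the standard-input-port idea concrete, here is a minimal sketch of what a platform team might publish. All names (`InputPort`, `normalize_timestamps`, the port registry) are hypothetical illustrations, not anything described by Erik or H&M:

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical sketch: a reusable "input port" definition that many data
# product teams could share when ingesting from similar source systems.
@dataclass(frozen=True)
class InputPort:
    name: str                          # e.g. "pos-events"
    source_system: str                 # e.g. "pos", "erp"
    fmt: str                           # wire format, e.g. "json", "avro"
    transform: Callable[[dict], dict]  # shared normalization step

def normalize_timestamps(record: dict) -> dict:
    """Example shared transform: standardize a timestamp field name."""
    if "ts" in record:
        record = dict(record)  # avoid mutating the caller's record
        record["event_time"] = record.pop("ts")
    return record

# A platform team could publish standard ports like this one, so each
# data product team reuses the same ingest mechanics instead of
# rebuilding them per product.
STANDARD_PORTS = {
    "pos-events": InputPort("pos-events", "pos", "json", normalize_timestamps),
}

def ingest(port: InputPort, records: Iterable[dict]) -> list[dict]:
    """Run the port's shared transform over incoming records."""
    return [port.transform(r) for r in records]
```

The design point is that the normalization logic lives once in the platform offering, not in every data product's ingest code.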
About 3 years ago, when Zhamak’s first data mesh article was published (May 20, 2019), H&M was reorganizing to be a product-centric organization; data mesh dovetailed nicely with that strategy – they were moving away from IT as a service-oriented organization.
Erik and team knew that with a move to a product-centric approach, teams would need to become data savvy and “data intense”. With their existing setup and knowledge, many teams would not be able to meet the new requirements: while H&M’s early AI investments were paying off, many teams just weren’t ready for such complicated data work. To scale their org-wide data capabilities, they would need something like data mesh – the teams doing AI were the very mature ones, and maturing ~200 teams to that level would be essentially impossible, especially when you consider that self-serve data producing and consuming is necessary to scale ways of working.
The management team at H&M was bought in to the product-centric reorganization, so overall it was not too difficult to drive buy-in for implementing data mesh at the same time, per Erik. There was buy-in and interest in participating from all types of teams, from the pure data producer to the pure consumer and everywhere in between. There were a number of teams with the capabilities and resources to participate.
As part of the platform/core enablement team for data mesh, Erik saw how helpful it was to work with multiple types of teams serving different needs. Because they worked on multiple pilots across a range of teams with differing capabilities as well as needs, they were better able to identify reusable parts of the data product development/deployment/management process to add to what the platform team offered.
Erik and team had a leg up on many other organizations considering data mesh: a data platform that was working well and serving current customer demands. Erik called the data consumers “happy enough” with their existing cloud data lake as they could mostly do what they needed to do. But the data team also knew that their existing cloud data lake would not scale to what they needed in the mid- to long-term as it would not likely be able to handle ~200 teams all producing data products. A key benefit of this existing well-functioning solution was there wasn’t a rush to get a replacement in place.
H&M’s approach to building out their enablement/data platform was almost a greenfield approach, per Erik. He said it is easy to fall into the same patterns as in the past, especially since their existing solution was already working. But they knew they had to escape the gravity of what they’d done before and look to new ways of working. They also had the time to do it right and to treat the initial stages as a bridging solution, not a rip and replace. And thus far, it is working well.
To date, the main focus of H&M’s data/enablement platform team has been building the self-service capabilities for the data producers. There is a large pool of highly data literate data consumers already, especially the teams mentioned above that are advanced in applying AI. So these initial stages are about testing so they can discover the ways to make it easy on data producers to create and manage data products. Most of the initial data products are source-aligned, generic data products not tailored to any specific use case.
The mid-term data/enablement platform strategy focuses on iteration and learning patterns. Erik and team know they won’t get it all right upfront, so making sure people understand there will be iteration and evolution is key to keeping people bought in to the long-term, big picture vision. That’s where they plan to really focus on making the platform as easy as possible for consumers as well.
Erik shared the big reasons for focusing on building the enabling capabilities into the platform rather than the data processing or other capabilities. First, they already have a good platform that can do the data processing 😀 Second, by not taking on any of the individual work themselves and instead finding ways to reduce friction, they avoid becoming a bottleneck and make it easier for more teams to participate. It is easy to get dragged into specific work.
Per Erik, as many guests have said, data mesh is very much an organization-focused effort. The technology and architecture sides aren’t easy, but to have a successful implementation, more effort will need to be spent on the organizational aspects. H&M was inspired by what Spotify has done with their organizational approach, leading to their return to the previously mentioned product-centric thinking/approach. One interesting point is that Erik believes you need to implement at least a decent amount of your organizational change at the start of your journey or teams will struggle to deliver mesh data products.
So why did H&M not have much pushback, why were so many teams including data producers bought in to participating in the data mesh implementation? Per Erik, H&M has had a good track record of driving strong returns on their early investments in AI, especially around driving business optimizations. But overall, people understood that the current AI setup would not scale to a wider audience. So they’ve seen strong returns from doing data well and trust the data leadership to deliver further.
Erik made the interesting point that in the data warehouse world, most data consumers were plenty data literate relative to their needs – but that was because they were fed reports directly with no real push to be inquisitive. Everything was also controlled, so there was a good data quality filter. Once you open up self-serve consumption, that can cause issues.
The big issues Erik has seen with allowing self-serve access without proper training / data literacy efforts are mostly around data misuse. Not unethical or inappropriate use but simply misunderstanding what the data means and which data to use to answer important questions. But he hopes that their data mesh implementation will guide people to the right information, especially by providing the right contacts to get more information.
Per Erik, many, many people in data are putting a lot of hope in where data catalogs are headed. But data catalogs should not be the only way people learn about what data is available or what that data actually means. Conversations about data are valuable – and they can be fun! A good example Erik gave: if people are asking a lot of unexpected or possibly strange questions about your data product, it might be a signal you should re-engineer it.
Erik and Scott agreed that part of where data mesh approaches things so differently is the emphasis on loose coupling between data products. Coupling in data has made it extremely difficult to make changes historically so we need to prevent that BUT still make data interoperable. Otherwise it’s just high quality data silos. But not every data product needs to interoperate with every other data product. And there also needs to be different types of data serving based on consumer needs so data products will need multiple output ports.
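One way to picture "multiple output ports per data product" is a sketch like the following, where consumers depend only on a named port contract rather than on the product's internals. All names (`DataProduct`, the port names, the sample data) are hypothetical illustrations, not H&M's actual design:

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical sketch: one data product exposing several output ports,
# each serving the same underlying data in a different shape for
# different consumer needs.
@dataclass
class DataProduct:
    name: str
    _ports: dict[str, Callable[[], object]] = field(default_factory=dict)

    def add_output_port(self, port_name: str, serve: Callable[[], object]) -> None:
        self._ports[port_name] = serve

    def read(self, port_name: str):
        # Consumers only depend on the port contract, not on the
        # product's internals -- this is what keeps coupling loose.
        return self._ports[port_name]()

# Illustrative data product with two output ports over the same data.
rows = [{"sku": "A1", "qty": 2}, {"sku": "A1", "qty": 3}]
product = DataProduct("sales")
product.add_output_port("events", lambda: rows)              # record-level serving
product.add_output_port("daily-summary", lambda: {"A1": 5})  # aggregated serving
```

Because each consumer reads through a port, the product team can change internal processing freely as long as the port contracts hold, which is the loose-coupling property discussed above.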
In wrapping up, Erik shared the specific types of patterns and practices the data/enablement platform team is working on: schemas and schema handling in general, sensible defaults, input ports, etc. The input port example was really interesting and enlightening – Scott hadn’t heard that example in 80+ interviews.
Erik’s LinkedIn: https://www.linkedin.com/in/erikherou/
H&M Career page: https://career.hm.com/
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him at community at datameshlearning.com or on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Data Mesh Radio is brought to you as a community resource by DataStax. Check out their high-scale, multi-region database offering (w/ lots of great APIs) and use code DAAP500 for a free $500 credit (apply under “add payment”): AstraDB