Sign up for Data Mesh Understanding’s free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Glovo’s meetup group: https://www.meetup.com/glovo-tech-talks/
Javo’s LinkedIn: https://www.linkedin.com/in/javiergrandag/
Javo’s Twitter: @JavierGrandaG / https://twitter.com/JavierGrandaG
Pablo’s LinkedIn: https://www.linkedin.com/in/pabloginerabad/
In this episode, Scott interviewed Pablo Giner Abad, Global Director of Data, and Javier “Javo” Granda, Senior Data Manager, both at Glovo.
From here forward in this write-up, P&J will refer to Javo and Pablo rather than trying to specifically call out who said which part.
Some key takeaways/thoughts from P&J’s point of view:
- It’s okay to not fit the exact or complete picture of data mesh in your early journey. Focus on what matters to your org and implementation and focus on learning over trying to be perfect. Iteration is possible and not too costly with data mesh. That’s sort of one of the main points of data mesh.
- When selecting your first use case, look for high value and low dependencies. The less cross-team coordination work needed to actually get to an initial end data product that has value, the better. And buy-in is much easier if the producers are one of the consumers too 🙂
- When starting out, really look at how thin of a slice you can get away with for your MVP. Be prepared to make some hard compromises. Make them with your eyes open. It’s tech debt but taken on consciously.
- Focus on solving your problems of today instead of trying to solve all your future problems. Fixing the challenges of today will set you up to fix the challenges of 6 months from now in 6 months.
- Focus on reducing cycle times for creating and iterating on data products more than you probably think you should. It’s easy to get focused on delivering new data products instead of the capabilities to deliver new data products, but that will cost you more and more as your data mesh implementation matures.
- An important quote to remember re product thinking: “If you aren’t embarrassed by the first version of your product, you shipped too late.” Your data products don’t need to be perfect when launched.
- Just using your domain mapping from your operational/DDD side as your data domain map is likely to lead to some big challenges. Look to how your data flows to figure out good data domain mapping.
- Misaligned domain maps between the operational side and data side can also cause issues because a team may need to own the data domain but they don’t necessarily own the operational system or domain.
- Glovo’s biggest data pain point pre data mesh was that data quality was often not great and people spent huge amounts of time trying to check the data. If that’s a challenge at your org, you can probably get funding to fix it.
- If data trust is a key pain point, when you make compromises early, do not compromise on quality, or at least on communicating very clearly what quality means. The only real way to gain back people’s trust is to actually provide them with high-quality, trustworthy data.
- Domain ownership is likely to be the most challenging data mesh principle in many organizations, partly because it is the one most centered on change management.
- It’s easy for a central data team to fall into prioritization by loudest escalations. When moving to data mesh, make sure you don’t fall into the same trap anywhere. Make conscious decisions based on value not on loudness.
- It will be important to define your minimum viable data product. Is that the data shaped into a format consumers can use? Does that mean if target users are not that data literate, ownership extends into the visualization tooling? Hard to say what is right for every organization.
- Only produce a data product if there is a known use case. But once a data product is created, owners should think about how they might serve additional users.
- There need to be more specific examples of what people’s early platform builds look like. It’s really tough to think about every capability and what might fit where.
P&J started out the conversation sharing about Glovo’s history with data – they have always had a lot of data and been data heavy but how they handled data was not very structured; they didn’t focus much on the data architecture. They treated the data warehouse like a data lake, dumping things in with little to no data modelling. People didn’t trust the data so they spent huge amounts of time checking data quality – and the company made some crucial decisions on data that wasn’t all that great.
According to P&J, their data architecture choices were also creating more and more bottlenecks because everything relied on the central team to build and fix. They clearly couldn’t scale to meet the company’s needs. They built out the tech to try to support needs and remove bottlenecks based on technical throughput/scale, but the central team itself was never going to scale to meet those needs.
Per P&J, in a way, their problematic setup was helpful for data mesh buy-in because they had so much tech debt, it was easier to convince everyone they needed to move to something different rather than try to incrementally improve. Escalation by who screamed loudest just wasn’t working and the central data team was falling more and more behind. There was a strong demand for understandable reliability – what was the actual quality level? – and clear ownership of data.
Rather than trying for a short-term fix, they looked to build their long-term data strategy, according to P&J: where did they want to go with data? Data mesh came at the right time and gave them the structure they needed to start really identifying their issues and setting their forward vision. They knew they needed the agnostic platform to serve the broader company instead of point solutions. They wanted to really apply product thinking to data. Federated governance and domain ownership of data were newer to them, so those have been harder to really figure out.
P&J shared their initial plans, which fit a common theme in Data Mesh Radio episodes: make a thin slice of every aspect they’d need, but make some hard compromises. For example, they knew they needed a platform that could at least read and process data, and their first few data products were under shared ownership between the domains and the central data teams. It’s okay to not fit the complete picture of data mesh when you get going.
According to P&J, the business leaders didn’t really care exactly how it got done, only that it got done. So they chose security and quality as their main focuses. Quality was crucial to regaining data consumer trust, so they created tests to show what the expectations were and that the data products were actually meeting those expectations. Security was basically to protect PII; they didn’t need anything too complicated, so they only built to what they needed.
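To make the idea of tests that state expectations and show whether a data product meets them more concrete, here is a minimal sketch in plain Python. It is not Glovo's implementation, which the episode does not describe. The `Expectation` class, the `run_expectations` helper, the sample rows, and the column names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Expectation:
    """A named, human-readable rule that a data product should satisfy (hypothetical)."""
    name: str
    check: Callable[[List[Row]], bool]


def run_expectations(rows: List[Row], expectations: List[Expectation]) -> Dict[str, bool]:
    """Evaluate every expectation and return a pass/fail report keyed by its name."""
    return {e.name: e.check(rows) for e in expectations}


# Hypothetical sample of a data product's rows.
rows = [
    {"order_id": 1, "customer_id": "c1", "amount": 12.5},
    {"order_id": 2, "customer_id": "c2", "amount": 0.0},
    {"order_id": 3, "customer_id": "c1", "amount": 7.25},
]

# The published expectations double as documentation of what "quality" means.
expectations = [
    Expectation("order_id is never null",
                lambda rs: all(r["order_id"] is not None for r in rs)),
    Expectation("order_id is unique",
                lambda rs: len({r["order_id"] for r in rs}) == len(rs)),
    Expectation("amount is non-negative",
                lambda rs: all(r["amount"] >= 0 for r in rs)),
]

report = run_expectations(rows, expectations)
for name, passed in report.items():
    print(f"{'PASS' if passed else 'FAIL'}: {name}")
```

Publishing a report like this alongside the data product is one way to give consumers a clear statement of the quality level. In practice, teams often use a dedicated data testing tool rather than hand-rolled checks.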
On picking their first use case, P&J and team looked for a use case that had high value and low dependencies. The more dependencies, the more possible complications and ways to fail. So they chose customer interactions in the app – what were people actually doing in their app? Previously, to get at the data, it was very difficult because of many complicated combinations of data. The producers for that first use case really didn’t have to do too much, the data was already being created in a format that was close to usable. And the producers were also the consumers, which obviously made it much easier to drive buy-in.
At Glovo, every data set, every data quantum must have a clear owner but they did save a lot of that change management pain until later. They didn’t force the data producers to really own their data products and had the central data team really as the owner or at least co-owner. They are now pushing that ownership on to the producers and there is a fair amount of friction.
P&J brought up the quote from Reid Hoffman, co-founder of LinkedIn: “If you aren’t embarrassed by the first version of your product, you shipped too late.” So what is a minimum viable data product at Glovo? It is a data set, a group of tables, that has some amount of enrichment and is structured to be used by the target data consumers. But they found that business leaders didn’t really care about data products as units of business value until they connected the visualization tooling. So if it’s not really usable by the target customer, is it really product-ready? How far into how data consumers use the data products does ownership extend?
As other guests have mentioned, P&J agree you should only produce a data product if there is a known need. But prior to data mesh, teams were only serving the needs of the teams closest to them in the organization. Now, data producers are thinking about who all could use the data and finding interesting new consumers, uncovering new potential use cases.
When asked what they wish they had been told when they started their data mesh journey, P&J shared a lot. One is to focus on the problems you have now and set yourself up to focus on the problems six months down the road when you are six months down the road. This was especially an issue with the platform team, which was trying to build out capabilities to support future use cases rather than serving what was needed for the current ones. Another is to focus a lot on capabilities: Glovo was focused on delivering data products, but they wish they had focused on reducing cycle times for creating new data products. Another few: building cross-functional teams is crucial but always challenging; you need to be prepared to communicate more frequently than you probably expect and repeat yourself often. And lastly, domain ownership will likely be a challenge in many cases.
P&J discussed how the data team “inherited” the existing domain map, and they are still struggling somewhat with mapping out their data domains and how they differ from their operational domains. Just following the operational domains caused a number of challenges, as it often didn’t align with how data was flowing through their systems for analytical use. Mapping out your data flows is crucial to establishing the right data domain ownership. But then shifting data ownership to other domains is also a challenge. Not an easy thing to solve, unfortunately.
Domain ownership is causing a lot of issues currently, as data ownership is typically not an expected responsibility for domains, per P&J. Changing ownership from outside the operational domain to the team, because it is in their data domain, is also very challenging. They are struggling to really define each data domain and why these data domains are needed instead of just using the operational domains. So P&J are asking for more people to create content around domain ownership.
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/