Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Himateja’s LinkedIn: https://www.linkedin.com/in/himatejam/
Himateja’s AWS re:Invent presentation (link starts at her part): https://youtu.be/y1p0BGsPxvw?t=1991
In this episode, Scott interviewed Himateja Mandala, Senior Data Engineering Manager and Head of the Data Mesh Data Platform at Disney Streaming. To be clear, she was only representing her own views on the episode.
Some key takeaways/thoughts from Himateja’s point of view:
- ?Controversial?: Your existing data platform(s) might not be able to serve data mesh well, even with reasonable augmentation – especially if your data platform has become hard to change. You might have to build from scratch.
- When the data platform’s key users aren’t part of the centralized team, you need to think about enabling automated capabilities by default, e.g. securing data the second it lands or monitoring/observability that is easy to leverage and understand.
- ?Controversial?: Data products serving different use cases often end up looking relatively different. Is your data product for dashboards and reporting/analytics; is it for serving a recommendation engine or machine learning model; or is it more for internal usage? Be okay with data products not being uniform.
- Even if your data mesh platform operates outside the traditional paradigms, many data producers – especially data engineers – will still be thinking in data pipelines. Be prepared for that; it’s an ingrained way of thinking for many.
- Data contracts are very helpful in defining and maintaining quality. If you set up good observability on your data products, owners can quickly identify when there are quality challenges.
- When building out your platform, user conversations are crucial. Go and focus on pain points. The coolest capabilities in the world won’t lead to good adoption if you aren’t addressing real pain/needs.
- Automation and blueprints are key to scalability in a data mesh data platform. Teams need to be empowered to easily do their work.
- Don’t only focus on creating the tooling to process and share data in the abstract; dig into how teams will share information with each other and how they will communicate. That isn’t only exchanging data via data products.
- Even if you have domains inside your organization that want to share data/information with each other, it is hard to get to a place where consumers can actually trust the data without a lot of explicit enablement. Enabling trust at scale is a key role of the platform.
- Enabling teams to go at the speed of their business through owning their own infrastructure really drives good buy-in. It might take slightly longer to spin up the infrastructure for the first data product, but teams quickly learn and will often strongly prefer the visibility and control they now have.
- When requests for new capabilities come to the data platform team, you need to consider how to generalize the capability to apply to more use cases if possible. And sometimes the right answer is that the platform can’t support that one-off need.
- Centralize your governance capabilities in the platform but federate the decision making. There should be standard approaches to access control to make it easy for people.
- !Controversial!: Disney Streaming has very strong RBAC (role-based access control) policies that make it very easy to delineate who should have access to what, but if you have a certain clearance level for one domain, you have that clearance level for all domains. Scott note: this is about 35min into the interview; it’s a really interesting approach. I couldn’t see it working for highly regulated industries, but it’s working very well for them.
- ?Controversial?: Preventing data from leaving the mesh under any circumstances is an effective risk control – if someone somehow gets access to something they probably shouldn’t have access to, the blast radius is quite contained.
- If you have any data sharing agreements with partners/vendors, make sure to keep their access heavily contained. Create specific spaces – or cloud accounts – with strong rules to prevent them from gaining improper access to any other parts of your data.
Himateja started the conversation with a situation at Disney Streaming that matches many organizations right now: many data platforms, but not one that would really fit data mesh, even with augmentation. So she and her team decided that, because the existing platforms were too hard to change to meet the needs of a data mesh implementation, they’d need to build their data mesh platform from the ground up.
When you have new key personas leveraging the data platform, even if those are data engineers embedded in the domains, Himateja recommends rethinking how data work is done. What do people need automated and on by default, like security? How do you create monitoring/observability that helps people easily pinpoint issues as they come up? How do you make data accessible by default at the data product and greater mesh level? Etc. In a decentralized, federated data approach, ways of working and needs will be different, so dig into what the actual pain points are instead of re-solving the pain points of previous implementations.
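To make “automated by default” a bit more concrete, here’s a minimal Python sketch of a platform hook that secures and monitors data the moment it lands. The hook and helper names (on_data_landed, register_monitor) and the tagging scheme are purely illustrative assumptions, not anything from the episode or a real platform API:

```python
# A minimal, hypothetical sketch of "secure and observable by default" --
# all names here are illustrative assumptions, not a real platform API.
from dataclasses import dataclass, field


@dataclass
class LandedDataset:
    domain: str
    name: str
    contains_pii: bool
    tags: dict = field(default_factory=dict)


def register_monitor(ds: LandedDataset, metric: str, **params) -> None:
    # Stand-in for wiring the dataset into the platform's observability stack.
    print(f"monitor[{metric}] on {ds.domain}/{ds.name}: {params}")


def on_data_landed(ds: LandedDataset) -> None:
    """Fires the second data lands, so producers get security and
    observability without having to request (or remember) them."""
    # Security by default: classify and tag before anything reads the data.
    ds.tags["encryption"] = "aes-256"
    ds.tags["sensitivity"] = "pii" if ds.contains_pii else "internal"
    # Observability by default: freshness and volume monitors, no opt-in.
    register_monitor(ds, metric="freshness", alert_after_minutes=60)
    register_monitor(ds, metric="row_count_anomaly", threshold=0.2)


on_data_landed(LandedDataset(domain="payments", name="transactions", contains_pii=True))
```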
Himateja shared that while people may think data products are all pretty similar, they end up relatively different based on use case. Audience also really mattered when trying to figure out what capabilities people required early in the journey – execs were often more focused on data privacy and security, while data scientists were focused on data quality. It’s hard to focus on business context at the platform level because many people are used to doing that via request; the data products themselves need to own business context.
Data contracts are crucial to maintaining data quality in Himateja’s view. While they are certainly helpful to data consumers, they are also very helpful to data producers because – with proper observability – data product owners can quickly identify and address quality issues as they emerge instead of waiting until consumers complain and downstream data is wrong. That proactive alerting and response helps everyone better trust the data. However, data contracts are still a work in progress because not everything is easy to define in a contract; there are definitely gray areas that are improving but not great yet. Scott note: and that’s okay, we can’t get everything perfect upfront, we have to iterate towards better 🙂
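As a rough illustration of how a contract plus observability alerts producers first, here’s a hedged Python sketch – the contract fields and thresholds are assumptions for illustration, not Disney Streaming’s actual contract format:

```python
# A minimal sketch of a data contract check, assuming a simple contract of
# required columns plus quality thresholds -- fields are illustrative only.
from dataclasses import dataclass


@dataclass
class DataContract:
    product: str
    required_columns: set
    max_null_fraction: float  # quality threshold the producer commits to
    freshness_minutes: int    # how stale the data is allowed to get


def check_contract(contract: DataContract, columns: set,
                   null_fraction: float, staleness_minutes: int) -> list:
    """Return violations so the *producer* is alerted before consumers
    ever see (or complain about) bad data."""
    violations = []
    missing = contract.required_columns - columns
    if missing:
        violations.append(f"missing columns: {sorted(missing)}")
    if null_fraction > contract.max_null_fraction:
        violations.append(f"null fraction {null_fraction:.1%} over "
                          f"limit {contract.max_null_fraction:.1%}")
    if staleness_minutes > contract.freshness_minutes:
        violations.append(f"{staleness_minutes}min stale, "
                          f"limit {contract.freshness_minutes}min")
    return violations


contract = DataContract("orders", {"order_id", "amount"}, 0.01, 60)
print(check_contract(contract, {"order_id"}, 0.05, 90))  # three violations
```

Run on every refresh, a check like this surfaces issues to the data product owner first – the proactive alerting described above.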
Himateja then shared a lot about what the data platform team she leads set out to do at the start of their data mesh journey. One aspect was creating a center of excellence approach, standardizing how data engineering work is done to create data products across the 15+ teams running on the platform now. They did that by drilling into pain points and doing lots of listening to potential users. They needed a different approach, not just yet another data platform.
Preventing the central data platform team from becoming a central data engineering team was a worry for Himateja: how do you avoid being a bottleneck and empower teams to do what they need to do, especially at the start of a journey? As many guests have pointed out, automation and blueprints have been crucial. Teams pushed back initially at the thought of managing their own infrastructure, but they realized it gave them the ability to move at their own pace – no more waiting in a prioritization queue for necessary infra. Another key milestone was developing tools to make cross-domain communication and data sharing easy. Domains at Disney Streaming actually wanted to share their data with each other, and the data mesh platform/implementation made that possible – it was previously very difficult to trust data, but now that quality metrics are clearly defined and tracked, data sharing and usage between domains have increased significantly.
Himateja shared that, prior to data mesh, data engineers in domains had no real visibility into data infrastructure – provisioning timelines or any other aspect. They’d submit a ticket and wait for things to happen. But with data mesh, since they own the infrastructure, they can go at their own pace and understand much more about any delays. So they understand and better control their own timelines, which makes them far happier. And once they’ve gone through the process of spinning up infrastructure the first time, the next time they can be that much faster. They are better able to move at the speed of their business.
Himateja shared about the process of evaluating requests for new platform capabilities. As many past guests have noted, you need to establish a process to abstract the requirements away from individual use cases to find a generalized approach. Otherwise, you end up with yet another overburdened platform that you can’t evolve. A specific example at Disney Streaming was enabling their Apache Kafka clusters to communicate across domains, which were each running in individual cloud accounts. Instead of building a data sharing solution for each technology they use in the platform, they built a system to enable sharing across accounts with proper access control and privacy. And sometimes the answer is that you can’t support a unique requirement via the platform – that’s okay and often the right call.
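To illustrate the idea of generalizing a one-off ask (“connect our Kafka clusters across cloud accounts”) into a reusable capability, here’s a hypothetical Python sketch – the grant model, resource naming, and account IDs are all assumptions, not the actual Disney Streaming system:

```python
# A hypothetical sketch of a generic cross-account sharing capability,
# rather than a bespoke integration per technology. All shapes assumed.
from dataclasses import dataclass


@dataclass(frozen=True)
class ShareGrant:
    producer_account: str
    consumer_account: str
    resource_type: str        # "kafka_topic", "s3_prefix", "table", ...
    resource_name: str
    allowed_actions: tuple    # least privilege by default, e.g. ("read",)


def grant_cross_account_access(grant: ShareGrant) -> dict:
    """One generic path for any technology: emit a resource policy the
    platform applies, instead of a per-tool sharing solution."""
    return {
        "principal": f"arn:aws:iam::{grant.consumer_account}:root",
        "resource": (f"{grant.resource_type}:{grant.producer_account}"
                     f"/{grant.resource_name}"),
        "actions": list(grant.allowed_actions),
        "effect": "Allow",
    }


# The Kafka case becomes just one instance of the generalized capability.
policy = grant_cross_account_access(ShareGrant(
    producer_account="111111111111", consumer_account="222222222222",
    resource_type="kafka_topic", resource_name="orders.events",
    allowed_actions=("read",)))
print(policy)
```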
At Disney Streaming, Himateja and team implemented a very interesting approach to access control via RBAC (role-based access control). There are a few levels of data usage clearance, but if you are cleared for PII or financial information in one domain, you have the same clearance level for all domains. This might not work for heavily regulated industries, but it’s working very well for them. The work to decide who has access to what data is done ahead of time instead of via constant requests. They think very carefully about each use case – who should have access and why – but there isn’t a need to manually grant access. And there is of course oversight of how people are using data to make potential changes. Scott note: this is a really interesting approach and I’d love to hear people’s feedback.
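Here’s a tiny Python sketch of that clearance model as described on the episode – the level names and their ordering are assumptions for illustration. The notable part is that the domain never appears in the access check:

```python
# A tiny sketch of "clearance spans all domains"; level names and their
# ordering are assumptions for illustration, not the actual policy.
CLEARANCE_ORDER = ["internal", "financial", "pii"]  # ascending sensitivity


def can_access(user_clearance: str, data_sensitivity: str) -> bool:
    """Note what's absent: the domain. Clearance for PII in one domain
    is clearance for PII in every domain, so the check never needs
    a per-domain grant."""
    return (CLEARANCE_ORDER.index(user_clearance)
            >= CLEARANCE_ORDER.index(data_sensitivity))


assert can_access("pii", "financial")     # higher clearance covers lower tiers
assert not can_access("internal", "pii")  # no per-domain escalation path
```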
In wrapping up, Himateja shared how they strongly limit their blast radius around sharing data with partners/vendors. Partners/vendors are only given access in dedicated accounts that cannot reach any of their other accounts, so there is no way for them to access data they shouldn’t be able to see. It’s a simple security pattern, but in her view one others should consider adopting.
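As a sketch of that isolation pattern, here’s what a deny-by-default boundary for a partner-facing account might look like, expressed as an AWS-IAM-style policy in Python – the bucket name and scope are hypothetical, not Disney Streaming’s actual setup:

```python
# A minimal sketch of the vendor-isolation pattern: the partner-facing
# account gets an explicit deny on everything outside its dedicated space.
# Policy shape mimics AWS IAM; bucket name and scope are hypothetical.
PARTNER_SHARE = "arn:aws:s3:::partner-share-bucket"

partner_boundary = {
    "Version": "2012-10-17",
    "Statement": [
        {   # Allow reads only inside the dedicated partner share space.
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [PARTNER_SHARE, f"{PARTNER_SHARE}/*"],
        },
        {   # Explicit deny everywhere else; in IAM a deny always wins
            # over any allow, so the partner account can never reach
            # other accounts' data even if an allow is added by mistake.
            "Effect": "Deny",
            "Action": "*",
            "NotResource": [PARTNER_SHARE, f"{PARTNER_SHARE}/*"],
        },
    ],
}
```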
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf