Sign up for Data Mesh Understanding’s free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Ghada’s LinkedIn: https://www.linkedin.com/in/ghada-richani/
In this episode, Scott interviewed Ghada Richani, Managing Director, Data, Analytics, and Technology Innovation at Bank of America. To be clear, Ghada was representing only her own views on the episode, not those of the company.
Some key takeaways/thoughts from Ghada’s point of view:
- “What I know today is going to change tomorrow.” Data mesh is a journey, don’t try to get too comfortable, we should always be trying and iterating. Be an explorer. Scott note: strong agree and so does Zhamak
- Speed is always a challenge with data mesh – some want to move too fast while others want to boil the ocean to make everything that comes after extremely fast. Work with people to take them along the journey and make them part of the decision-making process; don’t get ahead of yourself.
- Take your stakeholders lockstep along your journey, keep them very informed. Let them control prioritization. That way, they can see what changes are happening and why delivery timelines are extending – they made those calls!
- Expose the evolution of the data product to stakeholders. They can then understand tradeoffs, especially in regulatory or other governance challenges. And again, they own the prioritization.
- Don’t start with generic requirements, start with stakeholder deliverables. The requirements will emerge from that conversation – stakeholders don’t necessarily know what is required technically or as a product structure, but for sure they know what they want to achieve from a business perspective.
- In data mesh, data producers should be treated as a stakeholder as well. Make sure they are engaged and that they are getting something from the process. That can be credit/visibility for value creation, true ownership instead of demands/requirements, additional insights about their own domain, lots of information about why this matters, etc.
- When driving buy-in and/or getting approval, you have to know your audience. That might seem obvious but it’s really not. What do they care about? Have you actually talked to them about what they care about? How can you win them over? How can you make it make sense to them and excite them?
- When trying to get approval for a big project, break it down into tangible pieces. If there are 20 things that could be improved around a high-value process, look at them discretely and deliver more and more of the 20 over time. You get a budget to fix one or a few, then prove out the value and get budget for more.
- If something is a priority for your team but it isn’t for another team that you need to partner with, lean into that friction. Why is it a priority for your team and not their team? Should that be pushed up the chain to align better? You can’t make it a priority by pressing them; you need to work the right levers.
- ?Controversial?: One lever you can pull to drive data producer buy-in is the opportunity cost of not acting. There is a reason for a use case – something that will improve a process and drive value. If you ask the reluctant data producer to own the risk, the cost of not acting to improve that process, most will say no way and will participate.
- Measuring the value of data work is very hard and pretty imprecise. It must be a collaborative process with stakeholders – they are the ones who derive the direct value from the work. How much value is there in speeding up time to deliver data products by 40%? Your data/platform team can’t know.
- It’s crucial to build an environment where data failures fall on multiple stakeholders. And a data product or other data work not meeting the expected value isn’t necessarily a bad failure – maybe your hypothesis simply didn’t prove true. Limit the costs and size of those failures, but if you aren’t failing, you probably aren’t taking on enough risk.
- ?Controversial?: For high profile, high visibility projects/data product builds, checking in 2-3 times a week is normal and often helpful. You can identify and attempt to mitigate challenges and risks as they emerge.
- ?Controversial?: Create a highly visible accountability model for stakeholders and make sure they are aware. It’s not about calling people out but holding them accountable and if they aren’t doing what’s necessary, the executive sponsor should know and can address it.
- Virtualization at the query layer has allowed Ghada and team to mature the underlying data products over time while still presenting a mature and complete data product experience to users. Without virtualization, data consumers would not be nearly as happy with the data mesh implementation.
- A data virtualization layer has made interoperability easier as well. Connections between data products still need to be found but then they can be codified and offered as a view for others to use.
- Discovering and establishing domain boundaries is crucial in data mesh. Sometimes, “orphaned” data that isn’t really owned by a data domain might live in one temporarily, but you should always be looking to find/create a true owner if people are relying on that data.
- Your domain boundaries will change and that’s okay. Be ready and be vigilant about measuring whether it’s time for boundaries to change. There isn’t a silver bullet way to approach this but with experience, necessary boundary changes will start to become more obvious.
- It’s okay to have more than one data product per data domain but make sure they are truly incremental to each other. The boundaries and the governance are more important than the number of data products in a domain.
- Every data product should have a specific purpose. That doesn’t mean serving only a single use case – we need data reusability – but don’t add too much scope to a data product.
Ghada started discussing balancing speed, structure, and control in your data mesh implementation. There are those that want to build everything upfront and boil the ocean but there are also those that want to get to value as soon as possible without taking the product mindset to heart. Work with both sets of people to keep them deeply informed and show them why a balanced approach works better. If stakeholders are very close to the journey, they won’t be pushing back on timelines – they can see where prioritizations are changing and where the learning is happening as data products, or other aspects of your mesh, are being built. In fact, let them control prioritization where it makes sense; then they are the ones causing timelines to stretch because they made the tradeoff decisions.
Keeping stakeholders closely informed also has benefits around control in Ghada’s experience. They can understand the tradeoffs relative to governance challenges like regulatory compliance. Exposing the actual evolution of the data product itself helps stakeholders feel comfortable with the process and confident that compliance/regulatory concerns are addressed.
For Ghada, starting from requirements for a use case doesn’t work well – people aren’t sure what they need and get stuck in the details instead of the big picture. Instead, work with them to focus on what they are trying to achieve and what their deliverables are, then work backwards to figure out what they need to meet those deliverables. And those deliverables had better be tied to value somehow.
While driving buy-in from data producers, Ghada recommends making them a clear stakeholder in the process. She’s found that deeply informing them of how their data will be used and the value it will drive often gets them excited to participate. Of course, you need to work with them to prioritize the work, but showing them the value – or potential value – of a great use case often helps set that prioritization. You also want to highlight their work, either for them or preferably by setting the stage for them to present the value delivered from their data, giving them credit and visibility. When you do those things and give people true ownership, not just requirements, many data producers are far more willing to get involved.
In general, when trying to get approval for data work, while Ghada recognizes it can be hard, she has a few good approaches. One is to look at different aspects of what you are trying to improve. Say a process or product line drives significant value for the company. What could you do to tangibly improve the value it delivers? Not as one giant project – break it down into more tangible improvements and seek budget to tackle one or a few, so you can prove out value and get budget for additional improvements. Another is to know your audience. This might seem simple, but you really have to learn what drives your counterparts, find a way to communicate the benefits in their own language, and address something that matters to them. Make it digestible and hard to resist. It’s definitely more art than science.
One way Ghada has found to drive buy-in from reluctant data producers is to assign the cost of not doing something to them. Essentially, there is a benefit, a value to doing the proposed work – whether that is increased revenue, decreased cost, decreased risk, increased speed, etc. So there is a negative to not doing the proposed work, and you ask the reluctant data producer to officially own that cost, own that business risk. Many have become far less reluctant to participate.
For Ghada, there are two general ways to measure the value of data work – economic value and impact value. Economic value is slightly easier to conceive of, if not that easy to measure – if you make improvements to a process or, say, create a new product line, you measure the incremental revenue it drove or the amount of cost savings. For impact value, the team(s) impacted by the changes have to provide the value measurement – what is the value of speeding up a process, improving the data quality, lowering the associated risk, etc.? Neither is an exact measurement, so it’s crucial for stakeholders to understand that it’s about triangulating and assessing value, not an exact amount of return. And the stakeholders again have to be the ones that assess value – only they can say what an impact would mean for them; the data team doesn’t have the context to do that. And you need an organizational environment where the forecasts are seen as forecasts, not commits.
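To make the triangulation point concrete, here is a minimal sketch of framing economic value as a low/base/high range rather than a point estimate. This is not from the episode – the figures, names, and scenario structure are all hypothetical illustrations:

```python
# Hypothetical sketch: triangulating the economic value of a data initiative.
# All figures are made up for illustration; the point is ranges, not precision.

def economic_value(incremental_revenue: float, cost_savings: float) -> float:
    """Economic value = new revenue driven plus costs avoided."""
    return incremental_revenue + cost_savings

# Stakeholders supply low / base / high scenarios, not a single number.
scenarios = {
    "low":  economic_value(incremental_revenue=100_000, cost_savings=50_000),
    "base": economic_value(incremental_revenue=250_000, cost_savings=120_000),
    "high": economic_value(incremental_revenue=400_000, cost_savings=200_000),
}

for name, value in scenarios.items():
    print(f"{name:>4}: ~${value:,.0f}")
# Reported as a forecast range (~$150k to ~$600k), never as a commitment.
```

The output is deliberately a range: per Ghada’s point, the stakeholders supply the scenario numbers, and the organization treats the result as a forecast, not a commit.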
It’s okay to have failures in data work, according to Ghada. As many past guests have also noted, experimentation is about trying, learning, and iterating. Sometimes the learning is that this won’t work or isn’t worth the effort. Getting to that learning quickly and iterating to value – or stopping work when that’s the right call – is crucial to driving significant value from data work. Your culture must allow for failure or you just won’t take on initiatives that are higher risk but higher reward and where the reward justifies the risk. You need to see getting to value and getting something directionally right as a win so you can iterate towards more value.
In Ghada’s experience, for high profile, high visibility, high intensity projects/data product builds, it’s not unusual to check in 2-3 times every week with all the stakeholders. While it may feel like overkill, you can find miscommunications or friction early, and even more importantly, you can identify and work to address challenges and risks as they emerge, e.g. if someone is disengaging. Instead of the data team going off to do a bunch of work and delivering at the end of a months-long project, it’s tight feedback loops, iteration, and changing priorities through close collaboration. And have a highly visible accountability model – if someone isn’t delivering, that should escalate to the executive sponsor to figure out prioritization and an appropriate response.
On the platform side of things, Ghada is very happy with their use of data virtualization for their virtual query layer. As teams have learned how to build and mature their data products, data virtualization has meant they can expose what a mature data product looks like even when the underlying data product is not yet mature. The underlying data creation and curation process is not fully productized or robust in many instances, but consumers don’t have to care. The views presented to users are controlled by subject matter experts and serve as a type of interface or output port, in a sense.
More on data virtualization:
1) Sometimes, that virtualization layer can lead to query performance challenges, but usually that’s tied to someone trying to run too large a query all at once instead of breaking it down appropriately.
2) Data virtualization has made exposing connections between data products much easier – it’s just creating another virtualized view. Connections need to be discovered/surfaced manually, but beyond that, interoperability is quite easy when the data fits well together (see the sketch below).
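As a concrete illustration of the view-as-interface pattern, here is a minimal sketch. The episode doesn’t name a specific engine, so DuckDB stands in for whatever SQL-speaking virtualization layer you use, and all table/column names are hypothetical:

```python
# Minimal sketch of virtualized views as data product interfaces.
# DuckDB stands in for a real virtualization engine; all names are hypothetical.
import duckdb

con = duckdb.connect()

# Stand-ins for two underlying data products that are still maturing.
con.execute("CREATE TABLE raw_trades (trade_id INT, acct INT, amt DOUBLE)")
con.execute("CREATE TABLE raw_accounts (acct_id INT, owner VARCHAR, region VARCHAR)")

# 1) A curated view presents a stable, mature-looking interface even while
#    the underlying table evolves; consumers never query raw_trades directly.
con.execute("""
    CREATE VIEW trades_dp AS
    SELECT trade_id, acct AS account_id, amt AS amount_usd
    FROM raw_trades
""")

# 2) Interoperability: once a connection between two data products is
#    discovered, codify it as another reusable view instead of having each
#    downstream consumer wire up its own dependency on the upstream product.
con.execute("""
    CREATE VIEW trades_by_region_dp AS
    SELECT a.region, SUM(t.amount_usd) AS total_amount_usd
    FROM trades_dp t
    JOIN raw_accounts a ON t.account_id = a.acct_id
    GROUP BY a.region
""")

print(con.execute("SELECT * FROM trades_by_region_dp").fetchall())
```

The design point mirrors what Ghada described: the underlying tables can be re-sourced or re-curated without consumers noticing, and a discovered connection is codified once as a view that others can reuse.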
Discovering and mapping domain boundaries is really crucial in data mesh according to Ghada. And it will get easier as you go along. You really want to consider what you are trying to accomplish with a data product and not load too many things into one data product, or it will become overloaded and hard to evolve/improve well. Data owned by a team that isn’t the subject matter expert is a likely occurrence, but you should look to rectify it quickly. Teams building data products that consume information from upstream data products should not take unnecessary dependencies. At BofA, they create a virtual view that combines the upstream data from the source data product with the downstream data product rather than having that downstream data product take a hard dependency. Many of their domains are represented by a single data product, though some have more than one. The boundaries and the governance are far more important to get right than trying to match a certain number of data products to domains.
Every data product should have a defined purpose in Ghada’s view – that’s how you can find your data product boundaries. But a data product should also not take on additional purposes; that’s scope creep. That doesn’t mean it can only serve a single use case – reusability is crucial – but when someone tries to find the right source for accomplishing a goal with data, it’s best if they have fewer options to consider while still getting all the data they want/need. Yes, easier said than done.
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf