Data Mesh Radio Patreon – get access to interviews well before they are released
Episode list and links to all available episode transcripts (most interviews from #32 on) here
In this episode, Scott interviewed Anitha Jagadeesh, Principal Enterprise Architect at ServiceNow. To be clear, she was only representing her own views on the episode.
Some key takeaways/thoughts from Anitha’s point of view, some of which she specifically wrote:
- It is absolutely crucial to tie the data strategy to the business strategy. The business strategy must drive the data strategy which drives your data architecture.
- Architects need to lead the way in digging into use cases to get the specifics on what data producers are trying to solve for data consumers. Then, those architects can find the common patterns across use cases to tie to your organizational data strategy and also tie to your data architecture guidelines and principles. That way, instead of addressing challenges via point solutions, you can drive organization-wide choices that support many use cases via your data architecture.
- Architects also need to ask the probing questions to continuously tie work back to the business strategy and value or expected outcome for customers. If you aren’t driving the business strategy forward, if you aren’t helping the big picture, is the work worth doing?
- When it comes to data, companies shouldn’t be entirely offensive – trying to leverage data for as much value as possible – or entirely defensive – trying to minimize risk as much as possible. Organizations that have been very conservative need to push to be creative/offensive, and high-risk organizations will get themselves into trouble if they don’t start going defensive too.
- As we build the data strategy, we have to catalog our data assets/products and the contracts to access them – internal, external, and third party. As a next step, we have to enable active metadata to ensure the catalog is always current.
- Data contracts – especially SLAs and SLOs – are really crucial to driving reliable and scalable data practices forward. How can people trust what they are consuming without having to check it themselves unless there are very specific parameters and documentation of what they’re getting? The data space needs to rework the way we approach data contracts.
- We need to be careful to not head down the same paths/ways of working – just with different names – that we’ve tried and didn’t work. But we also need to focus on what we’ve learned from different approaches instead of reinventing the wheel where appropriate. Hopefully data mesh can thread that needle.
- When thinking about how you should split into domains, look at the business strategy. How does your organization tackle business challenges? That should inform how you create domain boundaries.
- One of the biggest challenges in data at the moment is centralization versus decentralization and/or federation. How far to go towards one side or the other across many, many decisions is really crucial to your data strategy. Look for places to centralize support of multiple use cases but, where possible, don’t take the decisions out of the hands of the people who know best.
- API-first is an important strategy for modernization of use cases. But it can easily lead to massive inefficiencies on the analytical side with large-scale queries. So we need to think about how we can do APIs in an analytical world and consider patterns and guidelines to support bulk data consumption with volume, performance, and limits.
- It will be difficult but worthwhile for organizations to migrate existing data assets to decoupled data products. There are many ways to approach that challenge, such as the Strangler Fig Pattern, but you need to take lots of care to do it right rather than disrupt the ongoing business.
- Trying to serve real-time use cases – measured in millisecond latency – and certain types of other analytical queries from the same data product is likely to cause big issues. If you have a very large data pull from a service, that can greatly impact performance. Let’s not go back to the days of running large-scale queries against production and causing outages; instead, look for other architectural patterns to enable performance, like replicating data to other data products. Ideally, large-consumption analytical use cases should either be managed with limits that protect real-time use cases or be run on analytical data products, which have different infrastructure that is fit for analytical use cases.
- It is crucial to have your governance team switch from defensive-only and a bottleneck to an enabling team – allowing domains to make smart decisions and providing the center of excellence and standards to let the domains focus on making the value-add and domain context specific decisions where possible.
- It’s crucial for both sides in a potential data initiative/project to share as much context as possible about what are the potential outcomes weighed against the potential costs. How can both sides collaborate to maximize the return on investment? Just seeking the highest return possible is what has doomed many data initiatives – let’s move past that way of working.
- We are heading towards hybrid-cloud, multi-vendor, multi-region, real-time data needs. That will require us to rethink our architecture so it can scale and support agility.
Anitha has seen a lot of data and engineering practices and patterns over her long career. In some ways right now, she is seeing many people heading down close to the same paths – just with new labels – that haven’t worked. Data mesh tries to address a number of these historical challenges but we should make sure to deeply understand what history has taught us so we don’t need to reinvent everything or make the same mistakes. History may not specifically repeat but it’s easy for it to rhyme.
For Anitha, a lot of the approaches people are trying in data miss the mark by not focusing on the big picture first – what is your business strategy? Your business strategy should drive your data strategy, not even just inform it, and then your data strategy should drive your data architecture. Far too many people start at the data strategy or even data architecture level.
Anitha – like many of us – is seeing major changes in the industry with most organizations transitioning to cloud, product-centric, and/or API-first approaches. Industries and organizations that have traditionally focused on defensive data strategies – those that protect the data to minimize risk such as compliance – will need to get offensive to compete and drive value. Luca Paganelli’s episode covered how HERA is transitioning from defensive to a balanced approach. But on the flip side, the companies that have focused much more on offensive data strategies – trying to derive as much value as possible from data with few controls in place – really need to step up their defensive game. Companies with a balanced approach to offensive and defensive strategy are generally the most likely to win.
Data contracts are one of the biggest unresolved, or at least not well solved, issues in data for Anitha. The way most organizations are still doing data contracts – which is often not at all… – just isn’t working. SLAs (service level agreements) and SLOs (service level objectives) are crucial to driving data trust when it comes to contracts. Emily Gorcenski’s episode covered data SLAs and SLOs in-depth. Some approaches are emerging, and many episodes of this podcast have covered data contracts, but it is still quite an immature data practice that needs further work.
When asked about how to drive good, broadly applicable choices rather than just solving for the specific use case, Anitha talked about again circling back to the business strategy and the business use cases. You need to not be simply reactive to requests but look at how those requests play into the bigger picture. Architects should play a role in digging deep into use cases and requests and then finding the common patterns that support your architecture runway for teams to develop products. They need to dig in with many people across the organization and find what you really need to solve for in general across your many use cases. That way, you can address a broader scope of challenge more easily rather than building to each use case. Much easier said than done of course.
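To make the idea of SLOs in a data contract more concrete, here is a minimal sketch in Python. It was not discussed in the episode; the `DataContract` class, its field names, and the `check_slos` helper are all hypothetical and only illustrate how a consumer could rely on stated freshness and completeness targets instead of checking the data by hand.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract for one data product; all field names are illustrative."""
    product: str
    max_staleness: timedelta   # freshness SLO: newest data may be at most this old
    min_completeness: float    # completeness SLO: minimum share of non-null required values


def check_slos(contract, last_updated, rows, required_fields, now=None):
    """Return a list of SLO violations; an empty list means every SLO is met."""
    now = now or datetime.now(timezone.utc)
    violations = []

    # Freshness: compare the age of the latest update with the agreed limit.
    staleness = now - last_updated
    if staleness > contract.max_staleness:
        violations.append(
            f"freshness: data is {staleness} old, limit is {contract.max_staleness}"
        )

    # Completeness: share of required values that are actually populated.
    total = len(rows) * len(required_fields)
    filled = sum(1 for row in rows for f in required_fields if row.get(f) is not None)
    completeness = filled / total if total else 0.0
    if completeness < contract.min_completeness:
        violations.append(
            f"completeness: {completeness:.2%} is below {contract.min_completeness:.2%}"
        )

    return violations
```

A producer could run a check like this on every publish and expose the result alongside the data product, so consumers see whether the contract currently holds. Real contracts would likely cover more (schema, ownership, support terms), which this sketch leaves out.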
Anitha made an interesting point about how your business strategy and data strategy should drive your domains. Yes, every organization or industry has a different domain map but even when thinking about what you are trying to accomplish and how your organization tackles challenges, you should look to use that as your general approach to mapping out your domains. Is that business need/capability? Is that application-first? Is this a domain specific compliance need? Etc. Piethein Strengholt talked about multiple different ways to map domains in his episode.
When asked about how organizations can think about centralization versus decentralization in data, especially regarding governance, Anitha admitted it’s very hard to create rigid rules that are actually good despite how easy that might make things. You need to ask how you can centralize the standards and the tooling so you can support multiple use cases but not have centralized decisioning when the domains know best. Hybrid governance that combines grassroots and centralized decisioning would drive better data management practices. Per Scott, it’s definitely not a black-and-white decision and this will be one of the hardest challenges for many organizations in the next few years.
A big trend for Anitha in digital modernization is more and more vendors heading towards API-first. But trying to use APIs like we have in the past will not be efficient at all in many cases in the analytics space. A very large query via API could be extremely inefficient. We need to think about how we can serve analytical needs better in an API-first world. What actually is an analytical API? How can we grab 100,000 records in a single query that isn’t a 1 by 1 pagination? Still remains to be seen.
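One possible answer to the "100,000 records" question is cursor-based bulk paging with large page sizes, sketched below in Python. This is an assumption for illustration, not something proposed in the episode: `RECORDS`, `REQUESTS`, `fetch_page`, and `fetch_all` are hypothetical names, and the "endpoint" is simulated in memory rather than being a real API.

```python
from typing import Iterator, Optional

# Hypothetical in-memory stand-in for a data product holding 100,000 records.
RECORDS = [{"id": i, "value": i * 2} for i in range(1, 100_001)]
REQUESTS = {"count": 0}  # counts simulated API calls


def fetch_page(after_id: Optional[int], limit: int) -> dict:
    """Simulated bulk endpoint: up to `limit` records whose id is greater than `after_id`."""
    REQUESTS["count"] += 1
    start = after_id or 0  # ids run 1..N in order, so id N sits at index N - 1
    items = RECORDS[start:start + limit]
    # Only hand back a cursor when more records remain after this page.
    next_cursor = items[-1]["id"] if start + limit < len(RECORDS) else None
    return {"items": items, "next_cursor": next_cursor}


def fetch_all(limit: int = 10_000) -> Iterator[dict]:
    """Stream every record by following cursors until the endpoint returns none."""
    cursor = None
    while True:
        page = fetch_page(cursor, limit)
        yield from page["items"]
        cursor = page["next_cursor"]
        if cursor is None:
            break
```

With a page size of 10,000 the full set arrives in 10 requests instead of 100,000 single-record calls. Whether a real analytical API should use cursors, file exports, or something else entirely is exactly the open question raised above.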
While many organizations would love to have a greenfield to deploy their data strategy and data initiatives, it’s just not a reality for most according to Anitha. There are existing data assets in place. Moving them to being data products is essential but the business also must go on with its current day-to-day. It’s a difficult challenge to migrate people over to new data products. And data monoliths have very unclear sets of data products all intertwined. As an example, in the past, Anitha and her team were the victims of their own success as they built a very successful data warehouse that more and more teams moved to use. As the number of use cases and load increased, the performance decreased. We need to move to decoupled and more scalable ways of working in data to prevent success from being the path to failure and/or pain.
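The Strangler Fig Pattern named in the takeaways can be sketched as a thin routing layer in front of the existing warehouse. The Python below is a minimal illustration under assumptions of my own: the `StranglerFacade` class, the reader functions, and the dataset names are hypothetical, and a real migration would also need to handle writes, backfills, and validation.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Reader = Callable[[str], List[Row]]


class StranglerFacade:
    """Routes reads per dataset: migrated datasets go to their new data product,
    everything else still goes to the legacy warehouse."""

    def __init__(self, legacy_reader: Reader) -> None:
        self._legacy_reader = legacy_reader
        self._migrated: Dict[str, Reader] = {}

    def migrate(self, dataset: str, product_reader: Reader) -> None:
        """Switch one dataset over to its data product without touching the others."""
        self._migrated[dataset] = product_reader

    def read(self, dataset: str) -> List[Row]:
        reader = self._migrated.get(dataset, self._legacy_reader)
        return reader(dataset)


def legacy_warehouse(dataset: str) -> List[Row]:
    # Hypothetical stand-in for a query against the monolithic warehouse.
    return [{"dataset": dataset, "source": "legacy_warehouse"}]


def orders_product(dataset: str) -> List[Row]:
    # Hypothetical stand-in for a decoupled "orders" data product.
    return [{"dataset": dataset, "source": "orders_data_product"}]


facade = StranglerFacade(legacy_warehouse)
facade.migrate("orders", orders_product)
```

Consumers keep calling `facade.read(...)` throughout, so datasets can move one at a time while day-to-day business continues against whatever has not been migrated yet.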
Anitha and Scott discussed how important it is to build specific solutions to be fit-for-purpose, especially around SLAs and SLOs. On the data side, if something really needs to serve something “in real time”, meaning measured in milliseconds, you wouldn’t also want to allow heavy analytical queries that could slow down what it is serving. Which circles back to why API-first is currently challenging. But you should also dig into what people mean when they say “in real time” because it is often “not on a 24 hour delay but 2 hours is fine”. Get specific, dig into details and the why.
Anitha has some specific recommendations regarding data governance as she views it as crucial to really getting data products right. She recommends creating a general center of excellence and central tooling support but with grassroots decisioning when it makes sense. So create the standards centrally and look to empower teams but also have that center of excellence to serve as a backdrop to be the experts on how to meet general governance needs like complying with GDPR, CCPA, etc. You want your domains focusing on the value-add decisions while the central governance team looks to be an enabler.
A key responsibility for all architects, in Anitha’s view, is digging into how and where people are planning to use the data. What is the target use and, more importantly, what is the target outcome? When digging in, you can really assess not just what might be the return, but what is the cost – both upfront and ongoing. It’s easy to think about how great it could be to have this massive set of data but there needs to be a balanced return on investment. And time-to-market is also crucial. So we need to encourage collaborative negotiation around data requests instead of simply handing over requirements.
In wrapping up, Anitha circled back on the concept of architects – and other roles too – really asking probing questions on specific use cases while keeping the big picture in mind at all times. Ask if this work supports the big picture, the business strategy. If it doesn’t or if there isn’t a clear tie to the business strategy, is the work worth doing?
Anitha’s LinkedIn: https://www.linkedin.com/in/anithajagadeesh/
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him at community at datameshlearning.com or on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Data Mesh Radio is brought to you as a community resource by DataStax. Check out their high-scale, multi-region database offering (w/ lots of great APIs) and use code DAAP500 for a free $500 credit (apply under “add payment”): AstraDB