Sign up for Data Mesh Understanding’s free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Vlad’s LinkedIn: https://www.linkedin.com/in/vladikk/
Vlad’s book Learning Domain-Driven Design: Aligning Software Architecture and Business Strategy: https://www.oreilly.com/library/view/learning-domain-driven-design/9781098100124/
Before we jump in, the phrase DDD is used a LOT in this episode. It stands for Domain Driven Design. We’ve had past episodes on it but I want to make that clear. It’s also important to note, there are some very specific terms used in DDD and it is easy to get overwhelmed. Look for the meaning and ignore the terms – we are designing how things work together, whether business capability, software systems, or general data flows.
There’s also the importance of the difference between published language and ubiquitous language in DDD. Essentially, the ubiquitous language is the language of the domain and the published language is the language used to share information from the domain externally to other domains. So ubiquitous is internal facing between the business and software engineers in the domain and the published language is external facing to the rest of the organization.
In this episode, Scott interviewed Vlad Khononov, Author of Learning Domain Driven Design (DDD) through O’Reilly, Senior Cloud Architect at DoIT International, and independent consultant on DDD and distributed systems.
Some key takeaways/thoughts from Vlad’s point of view:
- DDD is a big topic to learn – you don’t have to learn every aspect to take good value from it.
- Make the implicit explicit. The more explicit something is, the easier it is to manage it and to manage people’s understanding of it.
- It’s important that the published language doesn’t change as often as the underlying implementation model within the domain. But we also absolutely need to be able to evolve our published language. In data historically, we have trained consumers to believe we won’t evolve the published language and have tied that too closely to the implementation model.
- ?Controversial?: The concept of a domain in data mesh is very different from in Domain Driven Design (DDD). In DDD, you can really only identify business boundaries but in data mesh (or DDD for data), you can actually have an impact on – and potentially change – those boundaries.
- Iteration is a key aspect of DDD. It’s not about getting it perfect, it’s about getting to okay fast and improving. Learning to do that in data will be tough but important.
- It’s crucial to understand the difference between the ubiquitous language – the language of the domain used inside the domain – and the published language – the language/interfaces used to communicate across domain boundaries. In data mesh, data products should be in the published language to make them useful outside of the domain.
- DDD helps you to learn more about the business domain and provides patterns to help design software systems. But its ‘biggest benefit’ is that it provides a bridge between the business needs of the domain and the software systems.
- The reason organizations exist is not to ship software. They exist to execute on a goal – typically driving shareholder value – and software is merely a vehicle to that. Think about data the same way.
- A key aspect of domain driven design is communication across boundaries, whether machine or human to human. So finding and defining those boundaries becomes key to prevent – or at least manage – business friction.
- In DDD, it’s important to not mix up domains and business capabilities. Domains are about finding coherent sets of use cases for software systems. Business capabilities are the hierarchical structure many think of when speaking about domains.
- A model is a cornerstone concept in DDD – it’s how we look to solve a problem in software rather than try to reflect the real world in software.
- It’s crucial in DDD to get the software engineers to speak in the language of the domain – the ‘ubiquitous language’ – in order to design systems to appropriately solve the business problem. In data mesh, we need to learn to embed the business context in the data we share in a similar approach of finding a language of the domain but balance that with the published language.
- In DDD, it’s okay to have one implementation shared via multiple published languages, especially as different versions when the implementation model changes. We have to figure out how that versioning works with data, especially when the change is outside the hands of the producing domain – e.g. purchasing external data sets that then change.
- In DDD, it’s okay to have more specific integration contracts for implementations for the job at hand instead of one “jack of all trades” integration contract. In data, this could be multiple views of the same general information instead of a single data model to share all aspects.
- To get to a good published language in DDD, ask the people consuming from your domain, the counterparty in your integration contract. It makes sense in data mesh too to ask people what they want. Take that feedback and look for general purpose ways of structuring and sharing data. You have partners, so you don’t have to design everything yourself.
- The anti-corruption layer in DDD is important – it is created by someone consuming from other domains and is a protection against ineffective models – specifically something that would contradict their internal model. In data mesh, this is made all the more important if we don’t have shared taxonomies.
- Pain isn’t necessarily bad. In our bodies, it’s a signal something is wrong. In software, friction/pain signals are often ignored but shouldn’t be. In data, we need to get WAY more explicit about when pain comes up and where it’s actually coming from.
Vlad started off with a simple Domain Driven Design (DDD) definition: “Software design should be a function of the system’s business domain.” But it once took him writing a 90 page book to really unpack what that meant. Domain driven design feels simple until you start to really dive deep and you realize 5m meant five miles, not five meters.
So why do we care about DDD? Per Vlad, DDD helps you learn more about your business domain, identify its requirements and strategic needs, and identify what makes a software system you are developing valuable. DDD also provides a set of design patterns and architectural styles to design your software solutions. Putting those two together, it bridges the divide between the business requirements and the actual design of the software solutions so your software solutions are designed to explicitly meet the direct needs of the business rather than through layers of abstractions and miscommunication.
For Vlad, the term domain is very complicated because it means a very specific thing in domain driven design but is often used interchangeably to mean what DDD would potentially call a subdomain but probably more correctly a bounded context. As an example, in episode 133, Ammara Gafoor mentioned her client has 21 domains across 100K+ people but in episode 130 with JGP, he used ‘domain’ to mean a team of 3-6 people. So getting specific around what the term domain actually means is important. In DDD, it’s also important to note that subdomains will not cover all aspects of a domain because certain parts do not have software systems covering them. A key aspect of DDD is finding your boundaries between the subdomains which are essentially sets of interrelated use cases.
A business capabilities model is actually what most people are thinking of when discussing DDD subdomains according to Vlad. He noted in an email “The business capabilities model is a more precise way to map a business domain than DDD’s subdomains. It gives much more insight, though usually the subdomains model provides just enough insight needed for designing software systems.” This is where you can have a hierarchy so Marketing may be a tier 1 business capability and that might be broken down into Digital Marketing versus Brand Marketing versus etc. And then Digital Marketing might be broken into Lead Generation and Demand Generation. Or by Paid versus Organic. Or many other ways.
So, DDD is again about finding the boundaries between your use cases or a coherent set of use cases that work together. Those boundaries are used to drive the software system design decisions, how things work together, what architecture to use, etc.
Vlad believes the concept of a domain in software is very different from the domain in data mesh. Essentially, when attempting to do DDD for data, we misuse the term domain to mean line of business or business domain. In DDD, what the team can decide on is how the software components work together across boundaries; they cannot really change the business capability boundaries or the subdomain boundaries. But in data, you have the ability to actually change the business capability boundaries and how things are grouped into bounded contexts, to impact the flows instead of simply identifying them.
We can’t really reflect the real world perfectly in software – or even all that closely – according to Vlad. So instead, DDD uses the concept of a model, where the model is designed to solve a problem. The model should contain the minimum knowledge to solve that business problem – this significantly limits scope, which means preventing needless complexity.
A key aspect of DDD – which translates extremely well to DDD for data / data mesh – is finding the ubiquitous language. You must get to a level of communication where the software engineers can understand the domain well enough to speak in the language of the domain to the subject matter experts (SMEs) – typically referred to simply as ‘the business’. Scott note: In data mesh, data product developers will need to speak in the ubiquitous language of the domain to understand how to share the domain’s context with the rest of the organization. It remains to be seen how much data products should be specifically in the language of the domain versus the language of the broader organization AKA the published language in DDD.
Vlad discussed how important iteration is in DDD – it’s not about getting it perfect the first time, it’s about getting something out there and improving upon it as you learn more and as the world and domains/subdomains evolve. This applies well in data mesh because iteration and open sharing are crucial to getting to a data driven culture. However, in DDD, evolution of models creates friction. The way the systems were integrated is now changing so that is why there is a published language – essentially the integration language for how domains/subdomains communicate across boundaries. This is decoupled from the business systems so the business systems within the domain can evolve without it impacting how others interact with the domain. In data, the data warehouse has typically been quite tightly coupled so changes are extremely painful or in the case of the data lake, it is often raw data with no real published language or version controlling. This is a big friction point data mesh aims to address but exactly how companies address it will be varied and we are still figuring it out as a collective industry.
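To make the decoupling described above concrete, here is a minimal sketch in Python. It is not from the episode: the `OrderRecord` model, its field names, and the two published versions are all hypothetical. It only illustrates the idea that one internal implementation model can be served through more than one version of a published language.

```python
from dataclasses import dataclass


@dataclass
class OrderRecord:
    """Internal implementation model (hypothetical); free to change inside the domain."""
    order_id: str
    customer_ref: str
    total_cents: int
    currency: str


def to_published_v1(order: OrderRecord) -> dict:
    """Published language v1 (assumed shape): a decimal total, no currency field."""
    return {
        "schema_version": 1,
        "order_id": order.order_id,
        "customer_id": order.customer_ref,
        "total": order.total_cents / 100,
    }


def to_published_v2(order: OrderRecord) -> dict:
    """Published language v2 (assumed shape): explicit money structure, offered alongside v1."""
    return {
        "schema_version": 2,
        "order_id": order.order_id,
        "customer_id": order.customer_ref,
        "total": {"amount_cents": order.total_cents, "currency": order.currency},
    }


# Each published version is a separate translation of the same internal record.
PUBLISHERS = {1: to_published_v1, 2: to_published_v2}


def publish(order: OrderRecord, version: int) -> dict:
    """Serve the version a consumer asked for from the same internal record."""
    return PUBLISHERS[version](order)
```

In this sketch, renaming `customer_ref` inside the domain would only require touching the two translation functions. Consumers on either version would not see the change.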
In DDD, Vlad doesn’t like the concept of a single integration contract for all purposes – this is that mythical unicorn type approach where it’s good enough for every scenario. Instead, he recommends communicating with the other party in your integration contract to find what works for them. It means less work for the producer and the consumer 1) has a say so more buy-in and 2) gets what they want. Of course, this can mean more overhead so look to balance that. For data mesh, it remains to be seen if this will create too much overhead – a big issue in data is overly specifically engineered solutions leading to little reuse so we must balance usability for your key consumers with usability by all while considering overhead.
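As a rough illustration of purpose-specific contracts over the same information, the sketch below exposes two views of one internal data set. The consumers (marketing, finance), the field names, and the sample rows are invented for the example and are not from the episode.

```python
# Internal data held by the producing domain (hypothetical sample rows).
CUSTOMERS = [
    {"id": "c-1", "name": "Ada", "email": "ada@example.com", "region": "EU", "lifetime_value": 1200},
    {"id": "c-2", "name": "Lin", "email": "lin@example.com", "region": "US", "lifetime_value": 300},
]


def marketing_view(rows):
    """Contract for a hypothetical marketing consumer: contact fields only."""
    return [
        {"customer_id": r["id"], "email": r["email"], "region": r["region"]}
        for r in rows
    ]


def finance_view(rows):
    """Contract for a hypothetical finance consumer: value per region, no personal data."""
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0) + r["lifetime_value"]
    return totals
```

Each view is small and agreed with one consumer. The trade-off mentioned above still applies, since every extra view is another contract to maintain.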
Vlad then went into some of the patterns for actually sharing information in DDD if you’d like to dig deeper.
An anti-corruption layer in DDD protects a consuming domain from models that are ineffective for its use case, according to Vlad. That doesn’t mean the integration contracts themselves are ineffective, but there might be some terminology difference – we always use the example of what does customer mean – that would invalidate some of the work you are doing because it contradicts your own internal model/use case. It’s crucial to understand how to implement these protections for data consumers, but in data mesh, data consumers should also make sure producers understand when things are ineffective or counter to their internal model. It doesn’t mean things will necessarily change but it makes the friction point public/explicit, resulting in better understanding at the least.
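A minimal sketch of an anti-corruption layer on the consumer side, using the "what does customer mean" example. Everything here is an assumption made for illustration: the upstream domain is assumed to call every registered account a customer and to expose `account_id` and `paid_order_count` fields, while the consuming domain's internal model only counts someone as a customer once they have paid.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PayingCustomer:
    """Internal model (hypothetical): here 'customer' means someone who has paid."""
    customer_id: str
    paid_orders: int


def translate_upstream_customer(record: dict) -> Optional[PayingCustomer]:
    """Anti-corruption layer: map the upstream meaning of 'customer' onto ours.

    Upstream records without at least one paid order are not customers in
    our model, so they are dropped at the boundary (returned as None).
    """
    paid = record.get("paid_order_count", 0)
    if paid < 1:
        return None
    return PayingCustomer(customer_id=record["account_id"], paid_orders=paid)
```

The translation lives in one place owned by the consumer, so the upstream definition never leaks into the consumer's internal model. It also makes the terminology mismatch explicit, which is something the consumer can raise with the producer.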
For Vlad, pain in software can be good. It helps us to at least identify where there is friction and try to address it. But many teams try to ignore pain in software, even though it is a signal that your boundaries are ineffective. Boundaries can become ineffective simply because things have changed; they may have been effective before, so pain is not necessarily a sign you had it wrong. But being on the lookout for where change will be effective and value-add is crucial to doing DDD right.
According to Vlad, it’s also important in DDD to keep an eye on what we are sharing and to think about what could be more effective. As we learn more about a domain, or as the real world changes, we can share better information. So change can be a value-add. We obviously want to balance that against the cost of constant change, but if you can more effectively share information, do it. We have written language now; we didn’t 15K years ago. So we are far better at sharing information. When you develop a better way, look to implement it.
Vlad finished by saying it’s okay to dip your toe in the water relative to DDD. You don’t have to learn everything and have an all or nothing approach. You can take parts of it and slowly adapt your ways of working. That is important for data mesh too – you can take pieces and it’s not going to be perfect, just keep making progress and don’t delude yourself about the work yet to be done.
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/