Episode list and links to all available episode transcripts (most interviews from #32 on) here
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Neda’s LinkedIn: https://www.linkedin.com/in/neda-abolhassani-ph-d-61354329/
OSDU Ontology: https://github.com/Accenture/OSDU-Ontology
In this episode, Scott interviewed Neda Abolhassani PhD, R&D Manager at Accenture Labs. To be clear, she was only representing her own views in this episode.
There’s some very specific language about ontology in this episode, but I think it’s quite approachable for most people. It gives a good understanding of what an ontology is, how it differs from a taxonomy, and some specific insight into developing and applying one.
Some key takeaways/thoughts from Neda’s point of view:
- When you start developing an ontology, it’s best to begin from the business questions you want to answer. It is okay to go bottom up or top down, but business applicability is the main point.
- You can convince people ontologies and knowledge graphs aren’t scary or that hard to learn and leverage with a small demo of what they do and how to use them.
- Look for open ontologies that have already been created around your domain or area you are trying to model. They can usually be easily augmented and extended but there’s no reason to reinvent the wheel.
- Data people need to learn enough about the domain to build the right ontologies and data models, but learning a lot of domain knowledge can “discombobulate” them 🙂 Pair the data people with the subject matter experts so they learn what’s necessary.
- Try to keep your ontology as generic as possible but still encapsulate what you need; that way it is much easier to apply the ontology to other domains/departments.
- Set your ontology up to evolve as you learn more about your organization and as your organization itself evolves too.
- You don’t have to change your ontology simply because there are new use cases. If you designed it generically enough, it should be able to handle most new use cases whether as is or with a few additions.
- A good way to measure if your ontology is good enough and is still meeting your needs is to look at the business questions you want to answer. Are you able to answer them with your current setup?
- It’s okay to have a global ontology and then ontologies that are more specific inside the domain if that is necessary to extract more value from the domain.
- ?Controversial?: Knowledge graphs need an ontology as well as a data model. So if you plan to leverage a knowledge graph for data mesh, you will need an ontology.
- When it is best to actually start deploying a knowledge graph and developing your ontology for data mesh – if you go that route – is still somewhat up in the air. The earlier the better in general, but it will mean more work as things are unsettled early in your data mesh journey. Basically, it depends.
- Constant schema changes can make designing and updating your ontology more challenging. Whether you should add something immature to your ontology right away or wait until the schema settles a bit more is hard to say. Either way, it can create additional challenges.
- ?Controversial?: To really have high-value interoperability in a data mesh, you need a way to capture not just the metadata of the data products but also the semantic meanings in the domains. And that should be done via a knowledge graph.
- Ontologies are richer than taxonomies because while both have definitions, ontologies also have description logic. This gives people an ability to define how data technically fits together across different aspects.
- OSDU, or Open Subsurface Data Universe, is an open source data platform for oil and gas subsurface data. Neda developed an ontology to go along with it.
Neda started off with a definition of ontology: “So literally, an ontology is a formal explicit specification of a shared conceptualization. I know that it has lots of jargon, but I’m going to explain it to you. So it is an abstract model of concepts, properties, relationships, and it is standardized, it is machine readable. And it is not just the instance level data, it doesn’t include the instance level data, but it includes the schema and the type level information and how stuff should be connected in your domain.”
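To make the split between type-level and instance-level information concrete, here is a minimal sketch in plain Python. It is not how the OSDU ontology is implemented; the `Ontology` and `Relationship` classes and the `Well`/`Wellbore`/`hasWellbore` names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relationship:
    """A typed link between two concepts, e.g. Well --hasWellbore--> Wellbore."""
    name: str
    source: str  # concept the relationship starts from
    target: str  # concept the relationship points to


@dataclass
class Ontology:
    """Schema/type-level information only; no instance data lives here."""
    concepts: set = field(default_factory=set)
    relationships: list = field(default_factory=list)

    def allows(self, subject_type: str, relation: str, object_type: str) -> bool:
        """Check whether a link between two types fits the ontology."""
        return any(
            r.name == relation and r.source == subject_type and r.target == object_type
            for r in self.relationships
        )


# Hypothetical concepts and relationship, for illustration only.
ontology = Ontology(
    concepts={"Well", "Wellbore"},
    relationships=[Relationship("hasWellbore", "Well", "Wellbore")],
)

# Instance-level data is kept elsewhere and only checked against the ontology.
print(ontology.allows("Well", "hasWellbore", "Wellbore"))  # True
print(ontology.allows("Wellbore", "hasWellbore", "Well"))  # False
```

The ontology here says how things of a given type may be connected; the actual wells and wellbores would live in the knowledge graph that uses it.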
When asked if it’s best to start top down or bottom up when thinking about building an ontology, Neda said either is acceptable but the main advice is to start from the business questions you want to answer. After all, this isn’t an exercise for fun, there needs to be a business purpose. And look for open ontologies for your problem statement or industry. There are a number of ontologies that have already been created that you can leverage, extend, and/or use for inspiration. There is no real reason to reinvent the wheel.
When building your ontology, Neda recommends keeping it as generic as possible. That way, you can apply it to multiple domains with no conflict. But it still has to meet your needs obviously. There are ontology editors to make things easier as well but it’s important to set your ontology up to evolve as your understanding of your organization evolves and as your organization itself evolves. You can even do version control of your ontologies to make collaboration far easier as multiple parties look to improve the ontology simultaneously.
For Neda, ontologies and knowledge graphs go hand-in-hand. It’s okay to have an ontology for the global organization and another one for a specific domain if that’s of value. Ontologies are typically about communicating externally from the domain or whatever grouping you are representing. And knowledge graphs are for integrating data from different sources or domains. And for knowledge graphs, you need an ontology and a data model for it.
Ontologies are richer than taxonomies because while both capture the definitions, ontologies also have description logic. That description logic gives you a better ability to define things like unions, intersections, restrictions, and equivalences. So ontologies are broader than just concepts and terms.
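As a rough illustration of what unions, intersections, restrictions, and equivalences mean, the sketch below treats each class as the set of individuals belonging to it. This is only set arithmetic in Python, not a description logic reasoner; in a real ontology language such as OWL these constructs are declared at the schema level. All well and wellbore names are made up.

```python
# Each class is modelled as the set of individuals that belong to it.
producing_wells = {"well_a", "well_b"}
offshore_wells = {"well_b", "well_c"}

# Union: anything that is producing OR offshore.
producing_or_offshore = producing_wells | offshore_wells

# Intersection: anything that is producing AND offshore.
producing_offshore = producing_wells & offshore_wells

# Restriction: wells that have at least one wellbore
# (comparable to an existential restriction on a hasWellbore property).
has_wellbore = {"well_a": ["bore_1"], "well_b": [], "well_c": ["bore_2", "bore_3"]}
wells_with_wellbore = {well for well, bores in has_wellbore.items() if bores}

# Equivalence: two differently named classes that have exactly the same members.
active_wells = {"well_b", "well_a"}
classes_are_equivalent = active_wells == producing_wells

print(sorted(producing_or_offshore))
print(sorted(producing_offshore))
print(sorted(wells_with_wellbore))
print(classes_are_equivalent)
```

A taxonomy could name and define "producing well" and "offshore well", but it has no way to state that a third class is exactly their intersection.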
Neda then discussed OSDU, the Open Subsurface Data Universe – an open source data platform for subsurface data in the oil and gas space. She specifically saw a gap: it was pretty challenging for companies to load their own data into the OSDU format. It required a lot of subject matter expert time to match schemas to the OSDU format. So Neda and team developed a technique using a knowledge graph and AI techniques to try to automatically match and map data in a company’s own schema to the OSDU format. And as stated earlier, a knowledge graph needs an ontology 🙂
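The episode does not describe how the matching technique works, so the sketch below is not Neda’s method. It swaps in a much simpler stand-in, plain string similarity from Python’s standard library, just to show the shape of the schema matching problem. The field names, the `best_match` helper, and the 0.6 threshold are all assumptions.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase a field name and drop underscores and spaces."""
    return name.lower().replace("_", "").replace(" ", "")


def best_match(field_name, targets, threshold=0.6):
    """Return the most similar target field name, or None if nothing is close enough."""
    scored = [
        (SequenceMatcher(None, normalize(field_name), normalize(t)).ratio(), t)
        for t in targets
    ]
    score, target = max(scored)
    return target if score >= threshold else None


# Hypothetical company fields and hypothetical target fields (not real OSDU names).
company_fields = ["well_name", "SpudDt", "operator_nm", "xyz"]
target_fields = ["WellName", "SpudDate", "OperatorName"]

mapping = {f: best_match(f, target_fields) for f in company_fields}
for source_field, target_field in mapping.items():
    print(source_field, "->", target_field)
```

Name similarity alone breaks down quickly on real schemas, which is part of why the semantic information in an ontology and knowledge graph is useful for this kind of mapping.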
As Neda worked to build out the OSDU ontology, she looked at the OSDU canonical data format and reviewed the schemas to understand what embedded choices had been made so she could carry them into the ontology. She also looked at ontologies specific to related spaces, including some that covered parts of the OSDU area of interest like seismic data. However, the ontologies that existed for oil and gas were mostly outdated and didn’t cover what was actually useful and interesting. An aspect of OSDU that made developing the ontology easier was that the schemas were not changing very often, so there wasn’t a constant remapping and versioning challenge.
So, circling back to data mesh, Neda believes it’s important to leverage a knowledge graph to really ensure good interoperability between domains and data products. A data catalogue – or other mechanism for discovering data in data mesh – that only has information about the individual data products and not how they interconnect won’t have as much value.* When you should actually start to develop and deploy your knowledge graph is again something that requires more study and feedback. All else equal, the earlier the better, but of course as things are developing and changing rapidly early in your data mesh journey, trying to _also_ update your ontology will be a ton of extra work. Time will tell.
*Scott note: yup. Isn’t that just high quality data silos? Even if they interconnect, if people can’t easily understand and find the interconnections, you likely lose a LOT of the value of data mesh. Whether knowledge graph and ontologies are the best approach remains to be seen however.
Neda covered an aspect that is really important for all things data mesh: how to measure when things are good enough versus when they need updating 🙂 For her, it comes back to what she said at the start: what are the business questions you are trying to answer? If you are still able to answer those well enough, you probably don’t need to change your ontology. But if those questions have changed considerably and your current implementation is not able to answer them well, your ontology will need to be updated – maybe some new concepts will be added and some old concepts deleted. You do want to be careful to keep things backward compatible as you deploy a new version of your ontology.
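One way to picture this check is to list, for each business question, the concepts needed to answer it and compare that against what the ontology currently contains. The sketch below is a simplified assumption of how that could look; the questions, concept names, and both helper functions are hypothetical.

```python
# Hypothetical business questions mapped to the concepts needed to answer them.
business_questions = {
    "Which wells does each operator run?": {"Well", "Operator"},
    "Which seismic surveys cover a given field?": {"SeismicSurvey", "Field"},
}

# Two hypothetical versions of the ontology, reduced to their concept names.
ontology_v1 = {"Well", "Operator", "Field"}
ontology_v2 = {"Well", "Operator", "Field", "SeismicSurvey"}


def unanswerable(questions, concepts):
    """Return each question that needs concepts the ontology lacks, with the gaps."""
    return {
        question: needed - concepts
        for question, needed in questions.items()
        if needed - concepts
    }


def removed_concepts(old, new):
    """Concepts dropped between versions; an empty set means nothing was removed."""
    return old - new


print(unanswerable(business_questions, ontology_v1))  # one question has a gap
print(unanswerable(business_questions, ontology_v2))  # no gaps remain
print(removed_concepts(ontology_v1, ontology_v2))     # nothing was dropped
```

If the first check comes back empty, the current ontology is probably good enough; if the second comes back non-empty, anything that relied on the removed concepts may break with the new version.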
Evolving ontologies is a challenging thing but if you designed your ontology well enough at the start, you probably don’t need to do it all that often according to Neda. You should design your ontology in a generic enough way so it can handle new use cases without every little new aspect needing a whole new ontology version. However, that doesn’t mean your ontology should never evolve. Things change or need clarification and you should be willing to be adaptable. Scott note: this is where Zhamak sees challenges with ontologies: if they are overly centralized and overly rigid, they prevent people from expressing real meaning at the data quantum level because they are trying to fit the definitions of the data quantum into the ontology.
In wrapping up, Neda shared her views on how to really get started on building out a good ontology and knowledge graph. It will require your data people to learn enough about domains from the subject matter experts to develop the ontology. Be prepared for that to be a bit confusing, as learning a lot of domain knowledge can sometimes “discombobulate” your data people. And it won’t be a super quick exercise. But Neda believes it will pay off in the end and add a lot of value.
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him at community at datameshlearning.com or on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Data Mesh Radio is brought to you as a community resource by DataStax. Check out their high-scale, multi-region database offering (w/ lots of great APIs) and use code DAAP500 for a free $500 credit (apply under “add payment”): AstraDB