#59 Knowledge Graphs as the Engine for Collaboration Across Data – KGC Takeover Interview w/ Philippe Höij and Guest Host Ellie Young

Provided as a free resource by DataStax AstraDB

Data Mesh Radio Patreon – get access to interviews well before they are released

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here

Knowledge Graph Conference website: https://www.knowledgegraph.tech/

Free Ticket Raffle for Knowledge Graph Conference (submissions must be by April 18 at 11:59pm PST): Google Form

In this episode of the Knowledge Graph Conference takeover week, special guest host Ellie Young (Link) interviewed Philippe Höij, Founder at DFRNT.

At the wrap-up, Philippe mentioned that data architects should be able to communicate in ways other than PowerPoint. We need new and better ways to express ourselves and the way things are connected. We will always need metadata around our data, and we need text to express ambiguity, but we don’t have great ways to express things that are slightly ambiguous: not fully formed but mostly known. A good tool lets you more easily query your model of the world so you can iterate and increment on it. That is where knowledge graphs can be the most helpful. Ellie responded, “It’s not difficult, it’s just complicated.”

Philippe shared his journey towards knowledge graphs, especially thinking about the AIDITTO project he and a team built out of the “Hack the Crisis Sweden” event in 2020 around COVID-19. He needed a way to prototype, visualize and collaborate on data and the connections between data at scale. A regular data model does not convey enough information about what the data is and how it relates.

Ellie then shared some insight into the difficulties of collaborating on data across organizations and people in her climate change work at Common Action. When organizations that all have different ways of working collaborate, they need a common “language” or way of communicating about data but can’t easily develop a shared schema. Knowledge graphs provide incremental capabilities for collaboration.

Philippe talked about how, when collaborating across organizations, you still have the needs that master data management (MDM) tries, and often fails, to address, but you have zero capability to manage how the other organizations’ data flows into the shared data “pool”. Philippe was having issues with open-ended knowledge graph technologies like SPARQL or OWL: his team needed composable data structures to stay flexible in cases where they can’t fully decompose a concept, especially as the concept or their understanding of it evolves.

For Philippe, TerminusDB was a big win because it allowed for composable data structures and much easier querying across the graph. Ellie discussed how TerminusDB’s origins lie in collaboration across many entities and organizations, so it takes a much different approach to accepting data that doesn’t necessarily conform to a schema or data model. The “git for data” concept in TerminusDB also helped Philippe a great deal, as it made experimentation much easier.

Ellie shared some of the challenges in her work at Common Action around working with many different entities, many of which are small and not very data literate (or “data native”), and how they need to enable collaboration without rigidity because things are so dynamic. Philippe discussed the need to enable people to collaborate in a “messy” environment: the world is changing, and spending all your time and effort categorizing it into a single schema isn’t realistic. He believes you need to enable collaboration in a truly distributed environment; the value is driven by micro-level action via autonomy, with people making progress in their own domains, which creates global value. Too often, we’ve tried centralized collaboration and it doesn’t work. Collaboration shouldn’t be a heavy overhead on driving that value. How can we flip the script to make collaboration the enabler?

Philippe shared about how knowledge graphs can be used to manage compliance with security standards. You can map out much more easily who has responsibility for what and even identify gaps in your compliance adherence process. Being able to query that information easily makes it far easier to make sure you are identifying and mitigating risk. Ellie talked about breaking things down into paths for what is happening, what is not happening, and then what needs to happen to actually hit future goals. She mentioned it’s a new way to interact with change and the unknown.
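To make the compliance idea concrete, here is a minimal sketch in plain Python. It is not how Philippe or TerminusDB models compliance; the triples, the control names, the people, and the `responsible_for` predicate are all hypothetical, chosen only to show how a gap (a control nobody owns) falls out of a simple query.

```python
# Hypothetical (subject, predicate, object) triples. Every name here is
# illustrative and not taken from the episode or any real standard.
TRIPLES = {
    ("access-control", "is_a", "Control"),
    ("backup-policy", "is_a", "Control"),
    ("incident-response", "is_a", "Control"),
    ("alice", "responsible_for", "access-control"),
    ("bob", "responsible_for", "backup-policy"),
}


def subjects(predicate, obj):
    """Return, sorted, every subject linked to `obj` by `predicate`."""
    return sorted(s for s, p, o in TRIPLES if p == predicate and o == obj)


def owners_by_control():
    """Map each control to the people responsible for it."""
    return {c: subjects("responsible_for", c) for c in subjects("is_a", "Control")}


def unowned_controls():
    """Controls with no responsible person: the compliance gaps."""
    return [c for c, owners in owners_by_control().items() if not owners]


if __name__ == "__main__":
    print(owners_by_control())
    print(unowned_controls())
```

Adding one more `responsible_for` triple closes the gap without any schema change, which is the kind of incremental modeling the conversation points at.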

For Philippe, we need to start somewhere in breaking down the complexity and visualizing what’s going on: what are the patterns that we can see? Let’s model and share them in a way that is low-complexity enough. You can start to see the concepts and connect them in a way that we as humans can understand. It’s almost like building a hivemind: each brain has its own context, and when you share that context into the greater whole, it’s impossible to know what incremental information or knowledge will be generated, but it’s almost inevitable that some will be. Many more patterns will emerge, including patterns of patterns. But we need to be able to share that context in some way for those patterns to emerge.

Ellie shared thoughts about what complexity is: something so big and/or gnarly that it is, at best, difficult for a human to understand it all. Things are so interconnected that you can’t just adjust one piece or aspect. But it’s okay not to understand the complete picture; we can still move forward with confidence and/or identify the challenges that are most likely the best ones to address. Philippe believes it is often sufficient to understand directionality to move forward and make progress. Knowledge graphs help us deal with complexity and capture aspects of it in a way that is more understandable. How can we unravel the giant knotted ball of yarn one bit at a time instead of mapping out the entire unraveling process ahead of time? Ellie talked about knowledge graphs creating better information flow about what impact changes, or prospective changes, to the data we are sharing will have on downstream consumers.
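As a rough illustration of the downstream-impact point, the sketch below walks a small graph of "feeds into" relationships to list everything affected by a change to one dataset. The dataset names and the `FEEDS` mapping are assumptions made up for this example; a real knowledge graph would answer the same question with a path query.

```python
from collections import deque

# Hypothetical "feeds into" edges: dataset -> datasets that consume it directly.
FEEDS = {
    "sensor_readings": ["emissions_report", "regional_dashboard"],
    "emissions_report": ["funding_summary"],
    "regional_dashboard": [],
    "funding_summary": [],
}


def downstream(dataset):
    """Breadth-first walk returning every direct or indirect consumer."""
    seen = {dataset}  # guards against cycles and repeated visits
    queue = deque([dataset])
    affected = []
    while queue:
        current = queue.popleft()
        for consumer in FEEDS.get(current, []):
            if consumer not in seen:
                seen.add(consumer)
                affected.append(consumer)
                queue.append(consumer)
    return affected


if __name__ == "__main__":
    print(downstream("sensor_readings"))
```

The walk only needs the edges that are known today, so the picture of who is affected can grow as more relationships are shared.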

Philippe mentioned that knowledge graphs aren’t great for every use case, so look for the places where they really make sense. Look at the specifics. While you can use graphs to manage the interconnectivity between the data, not all relational structures benefit.

Then Philippe discussed what he is working on and when to release it – he’s focused on making data modeling much easier but doesn’t want it to be overly technical. He wants to guide and enable the changemakers. It will be about enabling the collaborative aspects of knowledge graphs so people can have much better conversations about the data. And then people can express changes in data instead of in a PowerPoint.

Ellie mentioned what Veronika discussed in her episode: in a knowledge graph you can model your data in the same way you are used to, but the model is a form of data immediately, rather than something you take and then transform into data. We have to talk to each other to discuss our data conventions and develop a new relationship between business users and data.

Ellie’s LinkedIn: https://www.linkedin.com/in/sellieyoung/

Philippe’s LinkedIn: https://www.linkedin.com/in/hoijnet/

Philippe’s Twitter: @hoijnet / https://twitter.com/hoijnet

DFRNT website: https://dfrnt.com/

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him at community at datameshlearning.com or on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here

All music used this episode created by Lesfm (intro includes slight edits by Scott Hirleman): https://pixabay.com/users/lesfm-22579021/

Data Mesh Radio is brought to you as a community resource by DataStax. Check out their high-scale, multi-region database offering (w/ lots of great APIs) and use code DAAP500 for a free $500 credit (apply under “add payment”): AstraDB
