Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here.
Samia’s LinkedIn: https://www.linkedin.com/in/samia-rahman-b7b65216/
FHIR standard cheat sheet: https://www.healthit.gov/topic/standards-technology/standards/fhir-fact-sheets
In this episode, Scott interviewed Samia Rahman, Director of Data and AI Strategy and Architecture at life sciences company Seagen. Samia is helping to lead Seagen's early data mesh implementation, having worked on two implementations at Thoughtworks since the start of 2019.
For Samia, interoperability is about taking information from two systems and combining it to get higher value. A simple definition but a good one.
Two potential key takeaways:
1) don't try to plan too far ahead when developing interoperability standards, but definitely keep an eye out for places where you could start to develop them. And your standards really, really should evolve – you don't have to nail them right out of the gate.
2) your interoperability will also evolve – you don't need to make every data product interoperable with every other data product, and you can start with basic interoperability first. The more you can standardize around unique identifiers, the better, but it's okay not to get it right immediately (see the sketch just after this list).
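As a concrete (and entirely hypothetical) illustration of "basic interoperability first": if two data products agree on nothing more than a standardized unique identifier, they can already be combined for higher value. This is a minimal sketch assuming pandas; all product and column names are made up.

```python
# A minimal sketch of basic interoperability via a shared identifier.
# Two data products that agree only on a standardized patient_id can
# already be combined. Column and product names are hypothetical.
import pandas as pd

# Data product A: clinical trial enrollments, keyed by a standard patient ID
enrollments = pd.DataFrame({
    "patient_id": ["P001", "P002", "P003"],
    "trial_arm": ["treatment", "control", "treatment"],
})

# Data product B: lab results from a different system, same identifier standard
lab_results = pd.DataFrame({
    "patient_id": ["P001", "P002", "P004"],
    "hemoglobin_g_dl": [13.2, 11.8, 14.1],
})

# Because both products standardized on patient_id, combining them is trivial
combined = enrollments.merge(lab_results, on="patient_id", how="inner")
print(combined)  # P001 and P002 appear in both products
```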
Samia started her career – and even before that, in school – focused on software, especially end-to-end development. A repeating pattern for her has been how crucial contract testing is to getting things into a trustable and scalable state. We've had contract tests in hardware and software for a long time, and systems without easy testing often get replaced pretty quickly. Those tests are the safety net that allows for fast and reliable evolution. And that evolution is a key theme for this conversation – set yourself up to iterate and evolve as you learn. Work to not paint yourself into a corner. A sketch of what such a contract test could look like follows.
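As a rough illustration of the kind of contract test Samia points to – not her or Seagen's actual implementation – here is a minimal schema check using the jsonschema package (pip install jsonschema). The contract, field names, and records are all hypothetical.

```python
# A minimal consumer-facing contract test: the data product promises a
# schema, and a breaking change trips the test before it reaches consumers.
from jsonschema import validate, ValidationError

# The "contract" the data product promises its consumers (hypothetical)
CONTRACT = {
    "type": "object",
    "required": ["patient_id", "visit_date", "hemoglobin_g_dl"],
    "properties": {
        "patient_id": {"type": "string", "pattern": "^P[0-9]+$"},
        "visit_date": {"type": "string"},
        "hemoglobin_g_dl": {"type": "number", "minimum": 0},
    },
}

def test_record_honors_contract():
    # In a real pipeline this record would come from the product's output
    record = {"patient_id": "P001", "visit_date": "2022-06-01",
              "hemoglobin_g_dl": 13.2}
    validate(instance=record, schema=CONTRACT)  # raises on a violation

def test_breaking_change_is_caught():
    # Simulating a producer renaming hemoglobin_g_dl without a new version
    broken = {"patient_id": "P001", "visit_date": "2022-06-01", "hb": 13.2}
    try:
        validate(instance=broken, schema=CONTRACT)
        assert False, "contract should have rejected the renamed field"
    except ValidationError:
        pass  # the safety net caught the breaking change
```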
Data standards, including specifically for interoperability, are everywhere in the life sciences space – FHIR, the FDA's many standards, etc. – but they still aren't great for truly sharing the meaning of the data. FAIR is trying to get there, but the interoperability and domain knowledge aspects aren't really standardized yet.
Samia strongly recommends not getting ahead of yourself on interoperability and standards. It's perfectly okay to start small – iterate and build on your standards for interoperability. To start, have some key identifying "linkers" in place. Get things out in front of consumers so they can explore and give feedback, and use that to power your iterations. Incrementally building towards a standard is crucial.
If you are going to build a standard, reusability should be your first goal. If it is only for a single use case, that isn't a standard, it's just an implementation detail. Samia again recommends contract testing / a schema checker. And definitely leverage existing standards – see the reuse-first sketch below. It's also not a huge deal if you have more than one standard internally; you don't need one standard to rule them all.
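One lightweight way to nudge producers toward reuse – purely a hypothetical sketch, not a real tool or Seagen's approach – is a registry lookup that checks for an existing standard term before a new field is minted.

```python
# A toy "reuse before reinvent" check: consult an internal registry of
# already-standardized fields before registering a new one. The registry
# contents and helper function are hypothetical.
STANDARD_FIELDS = {
    "patient_id": "Internal patient identifier, aligned to FHIR Patient.id",
    "visit_date": "ISO 8601 date of the clinical visit",
}

def propose_field(name: str, description: str) -> str:
    """Return the existing standard if one matches, else register the field."""
    if name in STANDARD_FIELDS:
        return f"Reuse existing standard: {name} -> {STANDARD_FIELDS[name]}"
    STANDARD_FIELDS[name] = description
    return f"Registered new field: {name}"

print(propose_field("patient_id", "unique patient key"))    # reuse
print(propose_field("dose_mg", "administered dose in mg"))  # new
```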
Per Samia, if you implement versioning, data consumers are usually very willing to work with data producers as they evolve data products. But without versioning, you are just pulling the rug out from underneath them. Right now there isn't a lot of good information – or tooling – on versioning data. The need to evolve data products is why absolute self-service is probably never possible; a human-in-the-middle is important to help consumers evolve their thinking as the business model evolves. A simple versioning sketch follows.
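A toy sketch of what such versioning could look like – the version scheme, field names, and pinning convention are all made up: publish schema versions side by side and let consumers pin a major version, so a breaking change lands as a new major rather than a rug-pull.

```python
# Hypothetical side-by-side published schema versions for one data product.
# v2.0.0 renames a field, a breaking change, so it gets a new major version.
PUBLISHED_VERSIONS = {
    "1.0.0": ["patient_id", "visit_date", "hb"],
    "2.0.0": ["patient_id", "visit_date", "hemoglobin_g_dl"],
}

def resolve(pinned_major: int) -> str:
    """Return the latest published version within the consumer's pinned major."""
    candidates = [v for v in PUBLISHED_VERSIONS
                  if int(v.split(".")[0]) == pinned_major]
    if not candidates:
        raise LookupError(f"No published version for major {pinned_major}")
    return max(candidates, key=lambda v: tuple(map(int, v.split("."))))

# A consumer pinned to major 1 keeps getting v1 even after v2 ships
print(resolve(1))  # -> "1.0.0"
print(resolve(2))  # -> "2.0.0"
```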
Samia mentioned data consumers' responsibility to inform data producers – about changing needs, issues with their data products, etc. We can't have data consumers each going off and creating their own fixes to data quality issues; data producers need to know so they can fix them at the source.
You need to be on the lookout for interoperability opportunities and then validate that there is a real need for interoperability. An important point: not all data needs to be interoperable.
Samia finished with her interoperability vendor wish list: tooling that can detect when someone should use an existing standard and put those standards in front of data product producers much more easily. How can we make it very easy for data product producers to build in interoperability and leverage existing standards from the start?
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used in this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf