In this episode, Scott and Jesse Paquette, Chief Science Officer and Co-founder at Tag.bio (a data platform vendor in the life sciences space), dive a bit deeper into data quality in general, with a particular focus on data testing and versioning.
You can see the LinkedIn post that sparked this discussion here
Jesse recommends a number of practices to ensure data quality, especially around data testing and versioning. On the versioning side, this includes versioning of 1) the code used to create the data (generally the ETL code), 2) the schema, and 3) the business logic layer, as well as 4) timestamp- or temporality-based versioning.
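As a rough illustration of the four versioning dimensions above, here is a minimal Python sketch of a version stamp that could travel with a dataset. The class name `DatasetVersion`, its field names, and the label format are all hypothetical and were not discussed in the episode; they only show how the four pieces might be recorded side by side.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DatasetVersion:
    """Hypothetical version stamp; every field name here is illustrative."""

    etl_code_version: str        # e.g. a commit hash of the ETL code
    schema_version: str          # version of the table/column layout
    business_logic_version: str  # version of the derived-metric definitions
    as_of: datetime              # point in time the data reflects

    def label(self) -> str:
        # Join all four dimensions into a single identifier, so a change
        # in any one of them yields a different label.
        return "|".join([
            f"etl={self.etl_code_version}",
            f"schema={self.schema_version}",
            f"logic={self.business_logic_version}",
            f"as_of={self.as_of.isoformat()}",
        ])


stamp = DatasetVersion(
    etl_code_version="a1b2c3d",
    schema_version="2.0",
    business_logic_version="1.4",
    as_of=datetime(2022, 1, 15, tzinfo=timezone.utc),
)
print(stamp.label())
```

Keeping the four values in one immutable record means a consumer can tell at a glance whether two results were produced from the same code, schema, business logic, and point in time.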
Jesse’s general calls to action are 1) build data testing frameworks so that testing is much less tedious and time-consuming; 2) work with stakeholders to gain trust in the data, then continue the dialogue to keep that trust; and 3) create schema/domain model blueprints so that domains have a starting point. Whether a domain ends up using the blueprint is irrelevant; what matters is shortening the path to a working domain model.
Jesse’s contact info:
Email: jesse at tag.bio
Twitter: @bzdyelnik / https://twitter.com/bzdyelnik
Tag.bio vendor interview for Data Mesh Learning: https://www.youtube.com/watch?v=acQADu7ttqQ