Ensuring Data Quality via Data Testing and Versioning – Interview w/ Jesse Paquette

Provided as a free resource by DataStax AstraDB

In this episode, Jesse Paquette, Chief Science Officer and Co-founder at Tag.bio (a data platform vendor in the life sciences space), and Scott dive a bit deeper into data quality in general, with a focus on data testing and versioning.

You can see the LinkedIn post that sparked this discussion here

Jesse recommends a number of practices to ensure data quality, especially data testing and versioning. Versioning here covers 1) the code used to create the data (generally the ETL code), 2) the schema, 3) the business logic layer, and 4) timestamp-based (temporal) versioning.
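As a rough illustration of those four versioning dimensions, here is a minimal Python sketch of a version stamp that could travel with each published dataset. It is not taken from the episode or from Tag.bio's platform; the names `DataVersion`, `stamp`, and `fingerprint`, and the choice of a truncated SHA-256 hash, are assumptions made purely for illustration.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import hashlib
import json


@dataclass(frozen=True)
class DataVersion:
    """Hypothetical version stamp attached to one published dataset snapshot."""
    etl_code_version: str        # e.g. a git commit or tag of the ETL code
    schema_version: str          # version of the table/domain schema
    business_logic_version: str  # version of the business logic layer
    as_of: str                   # ISO-8601 UTC timestamp the data reflects

    def fingerprint(self) -> str:
        """Short deterministic ID: identical stamps give identical fingerprints."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


def stamp(etl: str, schema: str, logic: str, as_of: datetime) -> DataVersion:
    """Build a stamp, normalizing a timezone-aware timestamp to UTC."""
    if as_of.tzinfo is None:
        # Assumption: refuse naive timestamps rather than guess their timezone.
        raise ValueError("as_of must be timezone-aware")
    return DataVersion(etl, schema, logic, as_of.astimezone(timezone.utc).isoformat())
```

A consumer could then compare fingerprints to tell whether two results came from the same code, schema, business logic, and point in time, for example `stamp("abc123", "1.2.0", "0.9.1", datetime.now(timezone.utc)).fingerprint()`.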

Jesse’s general calls to action are 1) build data testing frameworks so that testing is much less tedious and time-consuming; 2) work with stakeholders to gain trust in the data, then continue the dialogue to keep that trust; and 3) create schema/domain model blueprints so that domains have a starting point. Whether a domain actually uses the blueprint is irrelevant, but shortening the path to a working domain model is crucial.

Jesse’s contact info:

Email: jesse at tag.bio

LinkedIn: https://www.linkedin.com/in/jessepaquette/

Twitter: @bzdyelnik / https://twitter.com/bzdyelnik

Website: https://tag.bio/

Tag.bio vendor interview for Data Mesh Learning: https://www.youtube.com/watch?v=acQADu7ttqQ

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him at community at datameshlearning.com or on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here

All music used in this episode was created by Lesfm (intro includes slight edits by Scott Hirleman): https://pixabay.com/users/lesfm-22579021/

Data Mesh Radio is brought to you as a community resource by DataStax. Check out their high-scale, multi-region database offering (w/ lots of great APIs) and use code DAAP500 for a free $500 credit (apply under “add payment”): AstraDB
