Sign up for Data Mesh Understanding’s free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies (info gated) here.
Elena’s LinkedIn: https://www.linkedin.com/in/elenasamuylova/
Evidently AI on GitHub: https://github.com/evidentlyai/evidently
Evidently AI Blog: https://evidentlyai.com/blog
In this episode, Scott interviewed Elena Samuylova, Co-Founder and CEO at the ML model monitoring company – and open source project – Evidently AI.
This write-up is quite a bit different from other recent episode write-ups. Scott has added a lot of color on not just what was said but how it could apply to data and analytics work, especially for data mesh.
Some key takeaways/thoughts this time specifically from Scott’s point of view:
- A good rule of software that applies to ML and data, especially mesh data products: “If you build it, it will break.” Set yourself up to react to that.
- Maintenance may not be “sexy” but it’s probably the most crucial aspect of ML and data in general. It’s very easy to create a data asset and move on. But doing the work to maintain it is what really treating things like a product means.
- ML models are inherently expected to degrade. When they degrade – for a number of reasons – they must be retrained or replaced. Similarly, on the mesh data product side, we need to think about monitoring for degradation to figure out if they are still valuable or how to increase value.
- Data drift – changes in the information input into your model, e.g. a new prospect base – can cause a model to perform poorly, especially against this new segment of prospects. Detecting that data drift could itself be a very useful insight to pass on – has something changed with our demographics? If so, what? When? Do we know why?
- Concept drift – the real world has changed so your model is not performing as expected – is a crucial concept in data and analytics too. Are we still sharing information about the things that matter? In a way that is understandable? Are we encapsulating what’s happening in the real world in our mesh data products?
- Concept drift feels similar to semantic drift in the analytics world. So we can look to potentially take deeper learnings from how people approach and combat concept drift from ML and apply it to data mesh.
- How can we monitor degradation in mesh data products and prevent that degradation in our data and analytics work? Historically, reports drifted further and further from reality with no intervention because the pain of change was so high. Are we fully reliant on the domain to know? Can we use software to help us detect semantic drift? Very early days on that one.
- ML models are designed to do one thing very well. Unfortunately, we don’t have a good framework for reuse at the model level in ML. Maybe at the ML feature level?
- ML models have expected performance metrics. Those expectations need to be set through conversations between the business team and the ML team. Measure using KPIs. Can we use a similar approach to expectations – at least for some specific use cases – for a mesh data product?
- When building an ML model, you need to consider scope, business purpose, expectations, measurement against expectations, etc. Similarly, when doing any data work, you should consider the same. It is somewhat hard to measure the impact of most mesh data products but it doesn’t mean you shouldn’t try. What are you trying to achieve with the data product and is it meeting those expectations? Is the business need still relevant or has it changed?
- Regarding graceful evolution and preventing breakages due to changes in sources or downstream breakages from changing the ML model and/or its outputs, ML unfortunately does not have any answers that we aren’t already using on the data and analytics side. Good communication, contracts, monitoring/observability, etc. No silver bullet or MLMFD – ML Magic Fairy Dust.
- The concept of a feature in ML – a smaller component of the model that might be reusable across multiple models – could be interesting to consider in data mesh. It would likely break with Zhamak’s view of each data product owning its own transformation logic but could create almost proto-transformed data. Almost like a service bus to easily serve data products. Probably has lots of drawbacks but interesting to consider.
- Guardrails on ML models help to keep the models from doing things like reacting to data that is out of the norm. As Elena said, if an ML-based recommendation on a website is a bit off, the conversion rate falls but that’s not the end of the world. But what if you are dealing with big dollar decisions? Should we look to proactively put in guardrails into our data products? Probably yes, if they are driving crucial decisions – consider failure modes and what to do in those cases.
- Getting to fast incremental value is crucial when developing ML models. There needs to be very good trust and communication so people understand the initial quality level might not be great as you iterate towards a better model – or mesh data product. This is becoming a common theme – how can you release a v0.1 or v0.0.1 of your mesh data product and still drive value now while getting it to v1.0?
Elena started by sharing a basic definition of the concept of drift in ML. Drift causes model degradation – so the model is not as effective as expected – and can be generally split into data drift or concept drift. Data drift typically means that something about the data feeding your model has changed. That doesn’t mean using a new source, more like you are interacting with a different set of prospects or customers than you were previously so your predictions as to their behavior are going to be wrong – you built a model to react to a different set of people. Concept drift is more aligned – at a very high level – with semantic drift in data and analytics – it is that some aspect of the real world has changed. If you look at spending habits, especially in ecommerce, between February 2020 and April 2020, as the global pandemic started to take off, the real world changed a LOT. That was an extreme example but the real world is ever changing – how can we make sure we are still measuring and sharing the most meaningful information in our mesh data products?
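To make data drift a bit more concrete, here is a minimal sketch of one common way to compare the inputs a model was built on against what it is seeing now: the Population Stability Index (PSI). This is not something Elena described and it is not the Evidently API; the function name, the bin count, and the 0.2 alert level (a widely used rule of thumb) are all assumptions for illustration.

```python
import math
from typing import Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
) -> float:
    """Compare two samples of one numeric input; larger values mean more drift."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the column is constant

    def bin_shares(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor keeps the logarithm defined when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_shares = bin_shares(reference)
    cur_shares = bin_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_shares, cur_shares))


# Hypothetical example: customer ages at training time vs. a new, older prospect base.
training_ages = [float(a) for a in range(100)]
recent_ages = [a + 50 for a in training_ages]

DRIFT_ALERT_LEVEL = 0.2  # assumed rule-of-thumb threshold, tune per use case
if population_stability_index(training_ages, recent_ages) > DRIFT_ALERT_LEVEL:
    print("Input distribution has shifted - has something changed with our demographics?")
```

A check like this only covers data drift on a single numeric input. Concept drift, where the real world itself has changed, generally needs the model’s outcomes measured against expectations rather than an input comparison.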
A very important aspect of ML model drift, per Elena, is that it is entirely expected. Drift, and its resultant model degradation, is part of ML model reality. There is a cost of dealing with drift but when an ML model is negatively impacted, it is no longer making optimal decisions. So when you detect said drift to a certain degree, you would retrain the model, or shut it down, or replace it with a new model. It’s also hard to say how long a model will be in production before it seriously degrades, or what the degradation threshold is at which you should retrain or replace. Similarly in data mesh we need to think about how we evolve our data products to prevent degradation. ML models are purpose-built to do one thing but start to degrade over time. Often in data and analytics, we’ve used data assets the same way – we keep using the same reports as they degrade but don’t replace or evolve them. We need to do better in data mesh.
According to Elena, good ML practice means each model is designed to do one thing very well rather than many things. There is sometimes misuse of ML models in organizations as people try to make use of the same model for multiple use cases. This is similar to the way a number of people use data assets – created to answer one question but leveraged to try to answer another. If there isn’t a good understanding of exactly what the data asset addresses and how, it often leads to bad/incorrect conclusions when answering other questions.
So, how do we measure if an ML model has degraded? And how do we fix it if it has? Per Elena, you should measure your model against a certain set of expectations, typically via KPIs. If the model is no longer hitting expectations, it has likely degraded. Then you would look to retrain it – use the same steps as before to train your model but with the most recent data – or replace it.
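The “measure against expectations” idea can be sketched in a few lines. This is a simplified illustration rather than Elena’s method: the `KpiExpectation` class, the conversion-rate numbers, and the choice of averaging the most recent observations are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class KpiExpectation:
    name: str
    minimum: float  # level agreed between the business team and the ML team
    window: int     # how many of the most recent observations to average


def has_degraded(observed: list[float], expectation: KpiExpectation) -> bool:
    """Return True when the recent average of the KPI falls below the agreed minimum."""
    recent = observed[-expectation.window:]
    if len(recent) < expectation.window:
        return False  # not enough observations yet to make a call
    return mean(recent) < expectation.minimum


# Hypothetical weekly conversion rates for a recommendation model.
expectation = KpiExpectation(name="conversion_rate", minimum=0.025, window=3)
weekly_conversion = [0.031, 0.029, 0.024, 0.022, 0.021]

if has_degraded(weekly_conversion, expectation):
    print(f"{expectation.name} is below expectations - consider retraining or replacing the model")
```

Averaging over a window rather than reacting to a single bad week is one way to avoid retraining on noise; the right window and minimum come out of the conversation with the business, not the code.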
Elena believes the most important aspect of building an ML model is communication first. What are you trying to actually do? What is the business reason for creating a model? When the model is created, what are reasonable – and how can we stay away from unreasonable – expectations? What are the business metrics to create the KPIs around? How will you track performance against expectations/KPIs? These same types of questions can be applied to a mesh data product. Why are you creating the data product? What is the target use case and what is the expectation for the use case? Is the use case meeting expectations? If not, is that because of the data product or the use case itself?
Scott asked Elena about graceful evolution of ML models – how can we set ourselves up to deal with upstream changes more easily and how can we manage to not break things for downstream consumers? Her answer was unsurprisingly familiar: lots of good communication, using (data) contracts, using monitoring/observability tooling, setting guardrails, etc. Similar to the data mesh concept, Elena believes you should really think of each model like a product.
When asked how the ML and analytics sides of the house can better collaborate, Elena said she hopes that in many organizations, they aren’t overly separate. Embedded ML engineers are similar to the embedded data and analytics capabilities/teams model many are using with data mesh. And she hopes that once the super-fast pace of evolution of data stacks slows down, both sides can start to consider using the same tooling. But the biggest driver will be good communication – it pretty much always comes down to communication…
As mentioned earlier, Elena strongly believes you should not try to use the same model for multiple purposes. But in ML, there is the concept of a feature. Essentially, it is a part of the model that might be used for multiple different models, producing a subset of the model’s data input. So a feature could be reused across many ML models. This feature concept might be interesting to explore in a kind of proto mesh data product way. That way, we prevent multiple data products from doing the same work. An ideal way to prevent this duplicated work is communication – as Omar Khawaja discussed in his episode – but a company-wide source data catalog or repository could be a way to ensure everyone knows what data is being transformed and who owns it. That matters especially when new data products are in development, so there is a much smaller chance of teams doing the same work.
We can learn a lot from ML monitoring/observability, per Elena. In ML, you need to monitor the overall ongoing quality of data ingested into the model, the quality of the output of the model, and also the quality at the point of ingestion. Often, that last part – quality at the point of ingestion – is managed by guardrails. If data is not within a certain specification, it is not passed into the model. Or the model doesn’t react outside of certain bounds. Or if some metrics about the model relative to historical norms are very off, the model essentially gets paused and there is a failover to a less rigorous solution. Sometimes, passing bad data into the model is not the worst outcome – your Amazon recommendation is for buying another toilet seat… how many of these do you think I want, Amazon? But ML models can power very big-dollar decisions. And these guardrails could be very useful in data mesh if you are driving decisions with a fast turnaround. Alert that something is unusual and see whether there is a new normal or whether there was something funky upstream.
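The guardrail pattern described above – check the input against a specification, bound the output, and fail over to something simpler – might look roughly like the sketch below. All names here (`guarded_predict`, the bounds, the toy model and fallback) are hypothetical and only illustrate the shape of the idea, not any particular tool.

```python
from typing import Callable, Mapping

Features = Mapping[str, float]
Predictor = Callable[[Features], float]


def guarded_predict(
    features: Features,
    model: Predictor,
    fallback: Predictor,
    input_bounds: Mapping[str, tuple[float, float]],
    output_bounds: tuple[float, float],
) -> tuple[float, str]:
    """Return (prediction, source), failing over to `fallback` when a guardrail trips."""
    # Input guardrail: data outside the specification is never passed into the model.
    for name, (low, high) in input_bounds.items():
        value = features.get(name)
        if value is None or not low <= value <= high:
            return fallback(features), f"fallback: input '{name}' out of spec"

    # Output guardrail: the model is not allowed to act outside agreed bounds.
    prediction = model(features)
    out_low, out_high = output_bounds
    if not out_low <= prediction <= out_high:
        return fallback(features), "fallback: prediction out of bounds"

    return prediction, "model"


# Hypothetical credit-limit model with a conservative flat fallback.
def toy_model(f: Features) -> float:
    return f["income"] * 0.2


def flat_fallback(f: Features) -> float:
    return 1_000.0


bounds = {"income": (0.0, 500_000.0)}
print(guarded_predict({"income": 60_000.0}, toy_model, flat_fallback, bounds, (0.0, 50_000.0)))
print(guarded_predict({"income": -5.0}, toy_model, flat_fallback, bounds, (0.0, 50_000.0)))
```

Returning the source alongside the prediction makes it easy to count and alert on how often the fallback fires, which is itself a signal that something upstream may have changed.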
Elena mentioned ML development sometimes has a perfectionist issue: people try to get to a perfect model before deploying something instead of getting to fast value – putting something into production that incrementally increases value quickly – while you improve your model through quick iteration and tight feedback loops. This is becoming a very common theme in many interviews: how do we get to incremental value very quickly while we improve the long-term mesh data product through fast iteration?
Elena wrapped up on two thoughts: 1) Maintenance of your ML models isn’t “sexy” but it’s probably the most important aspect. Maintenance means proactive upkeep, monitoring, setting up good feedback channels, communication in general, etc. It’s not just the model in a vacuum – is it having the impact you expect? And be prepared to pay for the maintenance. Which plays into 2) “if you build it, it will break.” Set yourself up to detect the issues and make sure you budget people’s time to keep things running and fix them when they break. And don’t be surprised when it breaks.
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/