#278 Data Contracts for the Rest of Us – Approaching Contracts in Evolving Companies – Interview w/ Ryan Collingwood

Please Rate and Review us on your podcast app of choice!

Get involved with Data Mesh Understanding’s free community roundtables and introductions: https://landing.datameshunderstanding.com/

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here

Episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding. Get in touch with Scott on LinkedIn.

Transcript for this episode (link) provided by Starburst. You can download their Data Products for Dummies e-book (info-gated) here and their Data Mesh for Dummies e-book (info-gated) here.

Ryan’s LinkedIn: https://www.linkedin.com/in/ryancollingwood/

In this episode, Scott interviewed Ryan Collingwood, Head of Data and Analytics at OrotonGroup. To be clear, he was only representing his own views on the episode.

Some key takeaways/thoughts from Ryan’s point of view:

  1. Have empathy for yourselves and others in all things you do around data. You won’t always get it right the first time. Build the relationships, build the trust to continually drive and iterate towards better.
  2. In tech, far too often we hear what people need and provide a poor solution to actually solving their needs. It’s focusing on the tech instead of the people.
  3. Far too many technical solutions/approaches – e.g. data mesh, data contracts, etc. – are really presented for tech-heavy/forward companies, e.g. startups. Most companies, large or small, are not able to leverage the approaches as presented so they must be adapted for ‘the rest of us’ companies. Scott note: data mesh is like this
  4. Far too often, these tech approaches focus purely on the tech instead of the people. That’s partially because every org has a different culture so you can’t cover them all; but if you only follow the approach as presented instead of focusing on the people/ways of working in your org, it’s far less likely to go well. You’ve implemented a great technical solution that no one wants to or can use.
  5. ?Controversial?: “What are the trade-offs that I can make, while still being true to the value and the benefits that I want to get out of this?” Scott note: SO important to consider when looking at any technical pattern/approach. What is true to the value of the approach?
  6. Data contracts really rely on 3 things: at least two parties, an agreement of some kind that is recorded, and access to data that conforms to that agreement. You can add value building beyond those 3 but you have to start somewhere and you can deliver value with something that only satisfies those 3.
  7. ?Controversial?: It’s hard not to have a sense of imposter syndrome when you actually strip a concept down and implement something that doesn’t look like the public examples. That’s okay and to be expected. If you’re delivering value reliably, you’re probably doing something right 😅
  8. The world changes all the time. Your systems will change. Your data sources will change. Your understanding of the world will change. Your processes will change. Create/use approaches that can handle change or you’re just going to create more headaches down the road.
  9. With a centralized data team, the data team is often considered the data producer, at least to the consumer in a data contract. So with strong testing, the data team can be far more sure about meeting their contracts with consumers.
  10. Express data quality “in a way that engages people who are not you.” Make it understandable. It doesn’t have to be – and shouldn’t be – rocket science.
  11. Data contracts should be the culmination of multiple conversations. Because then the producer isn’t just posting data, they understand the needs of the consumer(s) and can best serve them.
  12. It can be incredibly helpful to just go talk to your business colleagues about data in their applications or that they use in the warehouse. They can explain the context of what is going on in the real world but you can show them how that is represented in data. Both people gain a lot of understanding.
  13. Similar to data mesh, most people in your organization won’t care you are specifically doing data contracts. Talk to them about what changes and why you are doing it and how it will deliver value. Speak in the language of the business, meet them where they are.
  14. Also similar to data mesh, you don’t need to convert the whole company upfront. Find an ally to test at a small scale and prove value. Use that to learn, get a champion, and show value to gain more converts.
  15. Some data isn’t worth cleaning if it fails your contracts testing. Really consider what is of value and negotiate that with your consumers. They may want 6 years of data but really only 6 months or weeks is of considerable value.
  16. What is the risk, what is the fear of a piece of data being wrong? Really assess that. If it’s relatively close, especially for now, will that be good enough? People need to consider that data is never 100% accurate so what is good enough and how comfortable are they with uncertainty.
  17. ?Controversial?: Relatedly, signal is what matters, not (usually) exact measurement. Get people used to finding the signal but also get them used to understanding how reliable that signal is and then acting on it to an appropriate degree.
  18. If something you are measuring, some data point, isn’t going to cause action no matter the result, why measure it?

Ryan started off with some framing of how he looks at tech approaches in general but especially how he started looking at data contracts. Most paradigms are presented as if every organization is very tech-y, like a tech startup. With data contracts, in much of the content, “…there was this assumption that you had multiple teams of people that had a fairly high degree of technical sophistication, … or maybe even data was their primary focus.” So when a less tech-y company wants to leverage the paradigm, there are always some adjustments necessary 😅 and when it comes to those types of companies, it’s so much more about the people than the way most paradigms are presented. It makes some sense because every org’s ways of working and culture are different but it still can feel very removed from reality for less tech-heavy companies.

When focusing specifically on data contracts, Ryan’s company is far more batch than streaming. So even when trying to leverage the best advice (Scott note: I highly recommend Andrew Jones for that), he had to adjust some aspects to a world where things were a bit messier and where teams aren’t as data mature. When approaching how to tweak data contracts to still work, he asked the rhetorical but crucial question: “What are the trade-offs that I can make, while still being true to the value and the benefits that I want to get out of this?”

Ryan moved into what he sees as the minimum viable value aspects of data contracts. You need two parties, you need an agreement of some kind that is recorded, and you need access to data that conforms to the agreement*. As to the parts of the agreement, Ryan focused on two factors at the start: semantics and data quality. If people can’t understand the data, can they use it? If they don’t understand the quality, can they really trust it enough to rely on it? So they worked to create a data dictionary and also to give people a better understanding of the different angles on data quality.

* Scott note: this could somewhat disagree with the idea many have around data contracts of merely publishing data with SLAs because while there is a consuming party, they aren’t really part of the agreement, they only choose to use the data based on the existing SLAs/contract around it. There’s lots of nuance but I HIGHLY believe in the communication-heavy aspect Ryan and Andrew Jones both present.
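To make the three minimum pieces concrete, here is a rough sketch in Python. It is not from the episode and not how Ryan's team built theirs; the class names (`FieldRule`, `DataContract`), the example fields, and the checks are all hypothetical. It only shows two named parties, a recorded agreement covering semantics (the `meaning` text, acting as a tiny data dictionary) and quality (the `check`), and a way to test whether data conforms.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class FieldRule:
    """One agreed field: what it means (semantics) and what counts as good (quality)."""
    name: str
    meaning: str                      # plain-language data dictionary entry
    check: Callable[[Any], bool]      # hypothetical quality rule agreed by both parties


@dataclass
class DataContract:
    """A recorded agreement between two parties (illustrative only)."""
    producer: str
    consumer: str
    fields: list[FieldRule] = field(default_factory=list)

    def violations(self, rows: list[dict[str, Any]]) -> list[str]:
        """Return a readable list of the places where the data breaks the agreement."""
        problems = []
        for i, row in enumerate(rows):
            for rule in self.fields:
                if rule.name not in row:
                    problems.append(f"row {i}: missing '{rule.name}'")
                elif not rule.check(row[rule.name]):
                    problems.append(f"row {i}: '{rule.name}' failed its quality check")
        return problems


# Hypothetical contract between a central data team and a marketing team.
contract = DataContract(
    producer="data team",
    consumer="marketing",
    fields=[
        FieldRule("order_id", "Unique identifier for an order",
                  lambda v: isinstance(v, str) and v != ""),
        FieldRule("amount", "Order total in dollars, never negative",
                  lambda v: isinstance(v, (int, float)) and v >= 0),
    ],
)

rows = [
    {"order_id": "A1", "amount": 120.0},
    {"order_id": "", "amount": 35.5},   # empty identifier
    {"order_id": "A3"},                 # amount missing
]
problems = contract.violations(rows)
for problem in problems:
    print(problem)
```

Anything beyond this (versioning, SLAs, automated enforcement) can be layered on later; the point is that something this small already satisfies the three requirements.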

According to Ryan, comparing what was presented for a tech-heavy company to what is possible at a more typical organization can often be disheartening. The idea that the end picture at your organization should look like the one presented is pervasive. So it’s not only hard to adapt the approach but then you wonder if you even captured the value 😅 Can you even call it ‘data contracts’ or whatever you are working on?! Imposter syndrome is very common here. Scott note: you could definitely call what Ryan and team are doing data contracts 🙂

Ryan also talked about how in data contracts, you must build for change. Change is the only constant after all. So creating systems that don’t handle change well is a great way to manufacture more headaches down the road. Much like in software testing, testing against your contracts lets you more easily tell when something no longer works and needs to be changed. And when the data team is the actual data producer (with a centralized data team, that’s often the case because they are the ones transforming the data, or at least they are the only group of people consumers talk to), strong testing lets them be much more sure that what they are doing is correct.

Another key learning Ryan had along the journey was that when displaying data quality, you should make the metrics easier for the layperson to understand. Historically, data quality has been measured with complex statistics. Most people can’t easily read the resulting charts to understand what’s going on. Make the data quality metrics understandable so people can see progress but also get a sense of how well they can rely on data. It is a sad truth that you can deliver value but if you can’t get others to see that value, it isn’t valued. Showing that value gets people to lean in.
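As one small, hypothetical illustration of a metric a layperson can read (the function name, column, and sample rows are made up for this sketch, not taken from the episode), a completeness check can be reported as a plain sentence rather than a statistical chart:

```python
def describe_completeness(rows, column):
    """Summarize how many rows have a usable value for `column`, in plain language."""
    total = len(rows)
    if total == 0:
        return f"No rows to check for '{column}'."
    # Treat None and empty strings as "not usable"; other rules would be negotiated.
    filled = sum(1 for row in rows if row.get(column) not in (None, ""))
    return f"{filled} of {total} rows have a usable '{column}' ({filled / total:.0%})."


# Made-up sample data: one customer has no email on file.
customers = [
    {"name": "Ana", "email": "ana@example.com"},
    {"name": "Ben", "email": ""},
    {"name": "Cy", "email": "cy@example.com"},
    {"name": "Di", "email": "di@example.com"},
]
summary = describe_completeness(customers, "email")
print(summary)
```

A sentence like this can be tracked week over week, so people can see progress without needing to interpret anything more complex.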

Ryan dug a bit deeper into creating systems that act with empathy. If you approach data contracts as consumers only get what the producer shares, that doesn’t end up serving the end needs that well. But if you are treating the contracts as the culmination of multiple conversations, the producer can start to really understand the impact of bad data. How much work do data consumers have to do to actually use the data? This is where empathy and product thinking come in.

“…data, as we know, it is merely a side effect of activity, of stuff happening.” Ryan believes we need to move past the 1s and 0s thinking in data and focus on what it reflects and how that impacts the people in the organization. Conversations can be hard but they give you the context necessary to maximize the impact of your deep systems work. Talking with people can help both parties bridge the gap between understanding what is happening in the real world versus the data 😅

Internally in Ryan’s org, they wanted to review their general processes. Part of that was the uncomfortable truth that change, especially to processes, impacts the data. So that review created a great opportunity to start to implement data contracts. It wasn’t about telling people they were doing data contracts, it was about getting people bought in to what value could be delivered if they did data quality and trust better. It just happened to be via data contracts.

When actually starting out, Ryan looked for one ally that was willing to take on some of the complexity of dealing with data contracts and saw the potential benefits. Instead of trying to convert the whole organization, it was contained and let Ryan learn how to implement data contracts well in his specific organization. That initial success gave him the confidence to move further and the success story to entice additional partners/allies.

Ryan discussed the push and pull of data quality and value. While it might be valuable to have a long history of data, is the cleanup worth it? Really have conversations and make hard choices that align to return on investment instead of merely whether consumers want it. Similarly, people need to confront the idea of data being right or wrong. They need to consider what the cost is of some data being wrong, especially slightly off. If that’s for a regulator, potentially high. But if it’s your weekly marketing leads report and it’s off by 0.2%, how big of a deal is that? And how much trust is lost if it’s wrong? Can we get people to understand data is never 100% clean/right? Getting people to act on signals will likely be somewhat challenging but it’s a better way to navigate than trying to wait for exact measurement in many – most? – cases.
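The "good enough" conversation can be written down as an agreed tolerance per use case. The sketch below is only an illustration under assumptions: the tolerance values, the report names, and the lead counts are invented, and the real numbers would come out of negotiation with consumers.

```python
def within_tolerance(reported, reference, tolerance):
    """True if `reported` is within `tolerance` (a fraction) of `reference`."""
    if reference == 0:
        return reported == 0
    return abs(reported - reference) / abs(reference) <= tolerance


# Hypothetical tolerances: zero for a regulator, 1% for a weekly marketing report.
TOLERANCES = {
    "regulatory_filing": 0.0,
    "weekly_marketing_leads": 0.01,
}

# A made-up report that is off by 0.2% (998 leads reported, 1000 in the source).
reported_leads, source_leads = 998, 1000

ok_for_marketing = within_tolerance(
    reported_leads, source_leads, TOLERANCES["weekly_marketing_leads"]
)
ok_for_regulator = within_tolerance(
    reported_leads, source_leads, TOLERANCES["regulatory_filing"]
)
print(ok_for_marketing, ok_for_regulator)
```

The same 0.2% gap passes for the marketing report and fails for the regulatory one, which is the kind of risk-based distinction the paragraph above describes.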

Ryan wrapped up back on dealing with yourself and others with empathy. You might not get it right at first but if there’s trust, you can iterate towards better together. That goes for your data, your processes, and your relationships.

Learn more about Data Mesh Understanding: https://datameshunderstanding.com/about

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
