Sign up for Data Mesh Understanding’s free roundtable and introduction programs here: https://landing.datameshunderstanding.com/
Please Rate and Review us on your podcast app of choice!
If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here
Episode list and links to all available episode transcripts here.
Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.
Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.
Mozhgan’s LinkedIn: https://www.linkedin.com/in/tavakolifard/
In this episode, Scott interviewed Mozhgan Tavakolifard, Data and AI Lead for the Nordics at Accenture. To be clear, she was only representing her own views on the episode.
Before we jump in: most of the conversation was about external data marketplaces rather than internal data marketplaces within an organization. It’s also important to note that data marketplace technology and implementations are still at a relatively early stage and are evolving and maturing quickly.
Some key takeaways/thoughts from Mozhgan’s point of view:
- Data marketplaces, both internal and external, significantly lower the barrier to data consumption because of standard metadata and user experiences. You should be able to easily see quality metrics, see who owns a data product, access documentation, etc.
- Data marketplaces, when done right, significantly shorten the time to value for both data producers and consumers/purchasers. Standard quality measurements and metadata make it easy for consumers to understand how much they can trust data, which makes purchasing decisions easier.
- Practices and tools are emerging for tracking data quality all the way back to the source, which increases the trust data consumers/purchasers can place in data, especially in data marketplaces.
- For external data marketplaces, trust and security are still major pain points. How can data producers trust that consumers will protect the data they acquire and use it legally and ethically? What is the producers’ risk if consumers behave improperly?
- ?Controversial?: Mozhgan believes smart contracts and blockchain/distributed ledgers can help ensure that those purchasing data use it compliantly. Some marketplaces are already doing this.
- Data producers also want better ways to ensure data consumers/purchasers only use data in the agreed way, so they can charge for any additional use cases. That gives them a strong incentive to work with marketplaces that have tracking mechanisms in place.
- “Data ethics is a nightmare,” even before we think about data marketplaces. And that’s not just data bias.
- A number of the techniques and guardrails used to address bias in AI can also be applied to bias-related ethics in data marketplaces.
- In a data marketplace, ethics falls much more on data producers than most people realize/expect. You should not sell data that can be misused! One way to prevent misuse is to sell insights instead of data itself.
- Look to focus much more on the business returns of data work. Far too much attention goes to the value generated without looking at the costs.
- It’s crucial to see organizations as living, breathing ecosystems. Design your organization and ways of working to be able to adapt.
For Mozhgan, data mesh is a perfect fit with data marketplaces: a data marketplace makes it simple for producers to share data in a standardized way and for consumers to find and consume data with standardized metadata and access. Simply put, data marketplaces are the most sensible place and mechanism for sharing data in her view. They significantly lower the barrier to getting access to data and to understanding it, including how much consumers can trust it.
So data marketplaces are good for internal data sharing but even better for monetizing your data externally, according to Mozhgan. Again, the standardization and clear rules about what is allowable use mean a faster time from discovery to value for both data producers and consumers/purchasers. When data has clear and concise SLAs, consumers can quickly go from discovery to trusting the data, meaning they can quickly leverage it for their own use.
However, major pain points for external data marketplaces are trust and security. Data producers must create trust in their data for others to use it, but there is also a big risk in how data consumers/purchasers actually use data producers’ data. Is it compliant/legal use? Is it ethical use? Will those data consumers properly protect the data they consume? If not, what is the risk to the data producer? How can we ensure proper behavior – whatever that may mean to the data producer – by the data consumer/purchaser?
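To make the idea of standardized metadata and SLAs concrete, here is a minimal Python sketch of what a marketplace listing and a consumer-side trust check might look like. The class name, field names, metric names, and example URL are all hypothetical; the episode does not describe a specific schema or marketplace product.

```python
from dataclasses import dataclass


@dataclass
class DataProductListing:
    """Hypothetical standardized listing; every field name is illustrative."""
    name: str
    owner: str
    documentation_url: str
    quality_metrics: dict  # metric name -> score between 0.0 and 1.0
    sla_freshness_hours: int  # maximum promised age of the data


def meets_requirements(listing, min_quality, max_freshness_hours):
    """Return True if the listing satisfies the consumer's thresholds.

    A metric the listing does not publish is treated as a score of 0.0,
    so unknown quality never passes a threshold.
    """
    if listing.sla_freshness_hours > max_freshness_hours:
        return False
    return all(
        listing.quality_metrics.get(metric, 0.0) >= threshold
        for metric, threshold in min_quality.items()
    )


# Example listing with made-up values.
listing = DataProductListing(
    name="tree-growth-observations",
    owner="vegetation-management-team",
    documentation_url="https://example.com/docs/tree-growth",
    quality_metrics={"completeness": 0.98, "accuracy": 0.95},
    sla_freshness_hours=24,
)
print(meets_requirements(listing, {"completeness": 0.9}, max_freshness_hours=48))
```

Because every listing exposes the same fields, a consumer can apply one check across many products instead of reading each producer's documentation from scratch.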
Mozhgan believes blockchain/distributed ledgers might provide a good way to track compliant usage – are consumers meeting their contractual terms? Smart contracts are supposedly able to track this. However, smart contracts still do not address ethical concerns, at least not in a simple and repeatable way. The ways of doing this are still evolving. And she believes we can’t really get to large scale data marketplaces without something like blockchain. Note: Scott is much more skeptical given there are few examples he is aware of where blockchain is really working for trust and security – can you really track usage in someone else’s systems? What about their security capabilities to prevent a data breach? Can we actually track ethical use of data?
Another aspect Mozhgan mentioned is that data consumers can only use data they purchase in ways allowed by the contract. Sarita Bakst mentioned this when talking about externally purchased data – data producers want to maximize monetization so data purchasers have to pay for each individual use case. So data producers want to track that consumers/purchasers are actually adhering to that part of the contract. There are a number of recent examples where data sellers will have wildly different prices for the data in PDF form versus an API. The API probably actually costs less to maintain but there’s a strong correlation between consuming via API and getting a lot of value from the data consumed.
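The per-use-case licensing idea can be sketched in a few lines of Python. This is only an illustration and assumes the marketplace can actually observe usage events, which is exactly the point Scott is skeptical about. The `UsageEvent` type, the `find_unlicensed_usage` helper, and the shape of the license mapping are hypothetical and do not reflect any real marketplace or smart contract API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    """Hypothetical record of one observed use of a purchased data product."""
    consumer: str
    data_product: str
    use_case: str


def find_unlicensed_usage(licensed_use_cases, events):
    """Return the events whose use case is not covered by the contract.

    licensed_use_cases maps (consumer, data_product) to the set of
    use cases that the consumer has paid for.
    """
    violations = []
    for event in events:
        allowed = licensed_use_cases.get(
            (event.consumer, event.data_product), set()
        )
        if event.use_case not in allowed:
            violations.append(event)
    return violations


# Made-up contract: one consumer licensed one use case.
licenses = {("acme", "credit-scores"): {"fraud-detection"}}
events = [
    UsageEvent("acme", "credit-scores", "fraud-detection"),
    UsageEvent("acme", "credit-scores", "marketing"),
]
for violation in find_unlicensed_usage(licenses, events):
    print(violation)
```

In practice the hard part is producing trustworthy usage events in the first place; the comparison against the contract is the easy part.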
When it comes to data consumer trust – can they actually trust the data? – Mozhgan believes we are seeing better ways of tracking data quality all the way back to the source. That independent verification is crucial. If data consumers/purchasers understand the exact quality dimensions, that typically makes the data immensely more valuable. For example, stolen credit card numbers on the dark web go for pennies because you can’t really trust the source.
Mozhgan gave a really interesting example of where data marketplaces can take us. Utilities need to monitor trees and proactively trim them where possible so they don’t disrupt power lines or phone lines. But each utility typically does not have a great information set internally, often because it lacks enough data to be good at proactive tree trimming. So utilities are trying to get to a place where they can share information with each other to improve their predictions for where to trim. However, the lack of a standard way to share data is making it quite difficult to achieve the desired results. So how can we learn to quickly share information across organizations without a long and complicated process to do things like design a standard data model? Could a marketplace help?
“Data ethics is a nightmare,” even setting data marketplaces aside, according to Mozhgan. This is not just AI model ethics with bias and the like; there are often unethical ways of presenting the data. Then of course, there are many companies collecting and using data unethically. And we don’t necessarily always want to remove all bias, since it may have predictive power. But we need to focus more on the impact of our decisions with data, on both the input side and the output side. And she believes we can use a lot of the guardrails we use around AI to ensure ethics in data marketplaces.
Mozhgan recognized that ethics will always be a bit messy when sharing data outside the organization. One suggestion to prevent ethics issues is to share only the insights instead of the actual data used to generate the insights. Or you can share pseudonymized data instead. But at the end of the day, ethics falls much more on data producers than most expect. You have a duty to not sell data that can be misused!
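Both suggestions can be sketched in Python. The example below is a minimal illustration, not a method described in the episode: `pseudonymize` replaces an identifier with a keyed hash, and `aggregate_insights` shares only group averages while suppressing small groups. The function names, the record layout, and the minimum group size of 5 are assumptions made for illustration. Note that pseudonymized records can still be re-identifiable when combined with other data, so this does not by itself guarantee ethical or compliant sharing.

```python
import hashlib
import hmac
from collections import defaultdict


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 hash (HMAC).

    The same identifier and key always give the same pseudonym, so
    records can still be joined without exposing the raw identifier.
    """
    return hmac.new(
        secret_key, identifier.encode("utf-8"), hashlib.sha256
    ).hexdigest()


def aggregate_insights(records, group_key, value_key, min_group_size=5):
    """Return the average value per group, dropping groups that are too small.

    Small groups are suppressed because an average over a handful of
    records can reveal information about individual records.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[value_key])
    return {
        group: sum(values) / len(values)
        for group, values in groups.items()
        if len(values) >= min_group_size
    }


# Made-up records: five in "north", two in "south".
records = [{"region": "north", "outages": n} for n in (1, 2, 3, 4, 5)]
records += [{"region": "south", "outages": 9}, {"region": "south", "outages": 7}]
print(aggregate_insights(records, "region", "outages"))
```

Here the "south" group is left out of the shared result because it has fewer than five records.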
For Mozhgan, there is too much of a focus on the value generated from data work instead of the actual return on investment. This happened in AI with massive hype and it’s happening more in analytics recently – everyone needs to be data driven, right?! You need to create a business case and look at what the expected costs will be for data work. We don’t have really easy paths to predicting exact value but we can get better at that and be realistic about expected costs.
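As a small worked example of looking at return rather than value alone, the Python sketch below computes a simple return on investment from an expected value and an expected cost, plus a low/high range to reflect that value is hard to predict exactly. The function names and the figures are made up for illustration; the episode does not prescribe a specific formula.

```python
def return_on_investment(expected_value, expected_cost):
    """Simple ROI: net gain divided by cost, as a fraction (0.5 means 50%)."""
    if expected_cost <= 0:
        raise ValueError("expected_cost must be positive")
    return (expected_value - expected_cost) / expected_cost


def roi_range(low_value, high_value, expected_cost):
    """ROI for a pessimistic and an optimistic value estimate."""
    return (
        return_on_investment(low_value, expected_cost),
        return_on_investment(high_value, expected_cost),
    )


# Made-up business case: cost of 100,000 with value estimated
# somewhere between 80,000 and 200,000.
print(roi_range(80_000, 200_000, 100_000))
```

A project with a large headline value can still have a negative return at the low end of the range, which is the kind of result a value-only view hides.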
Quick tidbits:
Knowledge graphs will be crucial to sharing data with other organizations and internally for data mesh.
It’s crucial to see organizations as living, breathing ecosystems. Design your organization and ways of working to be able to adapt.
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/
All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf