#115 Understanding the Data Value Chain – Your Key to Deriving Value from Data – Interview w/ Marisa Fish

Data Mesh Radio Patreon – get access to interviews well before they are released

Episode list and links to all available episode transcripts (most interviews from #32 on) here

Provided as a free resource by DataStax AstraDB; George Trujillo’s contact info: email (george.trujillo@datastax.com) and LinkedIn

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here

In this episode, Scott interviewed Marisa Fish, Director of Information Management at American National Bank. To be clear, Marisa was only representing her own views on the episode.

Some key takeaways/thoughts from Marisa’s point of view:

  1. Understanding your data value supply chain – the way you derive and deliver value from your data – should be the crux of data and analytics work. The data value supply chain breaks down into sharing the data itself, sharing analytical insights about the data, and managing the data. All three are crucial to creating value from your data.
  2. Intentionality is crucial – instead of being reactive, stop and ask what are we trying to accomplish and what value will it drive. Then you will focus much more on high value-impact work.
  3. Similarly, think about system engineering work as “mission engineering” – what is your mission in doing your work? Does the work you are prioritizing serve the mission?
  4. When sharing information, start from: what is the point, what am I trying to drive with this information exchange? Are you trying to share one person’s way of thinking or insights or give others the capability to derive their own insights from the new information? Both are very valid and useful but it’s easy to talk past each other if you’re not on the same page.
  5. So much of the way most organizations work with data is about the known knowns – the data consumer knows what data they want and what questions they want to answer with the data. We need to enable people with questions to find the right data to address them, and also enable people to do data spelunking – exploring data when they aren’t sure what it might tell them. Look to the Library and Information Sciences space for how to approach that.
  6. We need data librarians, not data publishers. Data publishers are about putting data on the shelf and serving only the known knowns. Data Librarians are there to help people find the information they need to address more of the unknowns – the value of curiosity in driving incremental valuable insights.
  7. There is a major mismatch in most organizations between what insights the business units are producing and the key questions the C-level execs care about. Consider creating a Chief Data Analyst type role to pair with execs to make sure insights are produced to support their initiatives, not just answer their questions as they come up. Think ahead, build ahead.
  8. Data teams need to take far more practices from general engineering – not just software engineering – so we learn how to better understand requirements.
  9. When gathering requirements, expecting data consumers to know all of their requirements upfront can lead to them asking for the world and a bad mismatch between asks and needs. Look to new ways to exchange information about requirements, including the Japanese Obeya technique.
  10. Spend the time to ensure you understand how data consumers will derive value from the information you will share with them. That will give you a better understanding of how best to serve them and what stated requirements might not be quite so required.
  11. Anytime you are sharing data, it’s easy to get bogged down in the 1s and 0s. Ask yourself: what information am I trying to share and why? How is this driving value?
  12. It’s very important for data producers to really dig into use cases to prioritize the work but also to make sure you aren’t over-optimizing or under-delivering on the value. What is the point of the work? And what do the data consumers want: analysis/insights or data?
  13. Data producers/owners are often not willing to openly share all of their data. A big reason is compliance – with internal policies, regulations, etc. So a high context exchange of how a data consumer will actually use the data can lead to more data openly shared – the producer can be assured there won’t be non-compliant use.

Marisa started the conversation sharing that she is hard of hearing and how that has strongly impacted the way she interacts with the world. Because she often misses certain words in conversations, she – and her brain’s linguistic processing – has to work in a far different way to fully comprehend the meaning of what is being discussed. We should think about applying a similar approach to data – we won’t always have all the context, so how can we take neuro-linguistic approaches and human information processing frameworks and apply them to data to expand our understanding of it?

For Marisa, any time you are sharing information in the form of data, it’s important to understand that it isn’t just machines communicating with each other. In all forms of sharing information and knowledge, you should ask: am I trying to mimic and share one person’s way of thinking or am I trying to augment the way of thinking of the audience? In other words, am I trying to share one person’s understanding or am I trying to give someone else the information to create insights and deepen their own understanding? Both are very valid and helpful but really focus on: what is the actual goal of the information exchange?

Marisa recently moved from a 25+ year career in the US Department of Defense into the financial sector. So she is learning a completely new “language” – actually several. The terminology, the business terms, the ways of exchanging information, the way information and requests flow, etc. From her years of working with very high-impact information exchanges in the DoD – intel drives foreign policy and can put many lives at risk – she understands the cost of data producers and consumers not aligning. So she recommends starting from a conversation about what the point of a data request or piece of work actually is. And, as Jean-Michel Couer said, ask not in a combative way – but it’s crucial to ask.

It’s really crucial to dig into the use case and business need for a few reasons. That context exchange is crucial to driving any data initiative forward including prioritization. The data producers asking the consumers what they are trying to achieve means they have the context to better serve their needs instead of the data consumers having to know every requirement upfront – they have far too many known unknowns about the data. And it also ensures there is a business reason for the ask – how are you, the data consumer, going to derive value for the business from what we plan to share? If it isn’t going to drive value, is there a benefit to doing the work?

Marisa and Scott discussed how difficult it can be to openly share data internally without really knowing what the downstream use cases are. As past guests have noted, without understanding the exact use, domains will not share as much of the data in most cases. Why? Because it’s very easy to get into non-compliant use. And it’s also easy for people to misinterpret and misuse data if you don’t give them the context to truly understand what it means. So to have a more open sharing environment, especially in an industry or organization where data policies are stringent, sharing the context with each other is crucial. Data consumers must share about target use and how they will prevent misuse.

The data value supply chain – which, in Marisa’s view, is the core way to think about how you derive and deliver value from data – breaks down into three parts: the data itself, the data insights/analytics, and the management of the data. If we look at data mesh through this lens, we are asking domains to at least take on sharing the data and the management of the data – with a lot of help from the platform. But as many guests have noted in the past, data mesh practitioners are somewhat split on how far the mesh data product extends – do you want to prepackage the insights for consumers or package the information up so they can derive their own insights? I think an early pattern is emerging: do both, where possible and valuable. Look to Xavier Gumara Rigol’s episode for an in-depth dive into this.

Marisa shared about her work in the Library and Information Sciences world and how it applies to data – when it comes to exchanging data, so much of what we do currently is about the “known knowns”. The consumer knows what data they want and what questions they want to try to answer. But that is only one of the four quadrants of information. While it can be truly difficult to grasp the unknown unknowns – not knowing what information sources or data you want or what questions you want to answer – the known unknowns and unknown knowns are crucial to expanding our understanding of what is occurring in our organization. Known unknowns are “I have questions but I’m not sure what data can help me answer them”; unknown knowns are “I know I want to analyze and leverage this data, but what can it help me answer, or what is it telling me?”

There is a big difference between a publisher of data and a data librarian in Marisa’s view. And you should look for the data librarians. A publisher of data is someone who just provides the 1s and 0s, not the real information. A data librarian is someone who facilitates finding information. So how can we do that at the organizational level? It’s difficult enough to get people to be that data librarian for their own domain – how can we do it across domains? Does that need to happen at the platform level or the human level? Or do we want both, overlapping to serve as much as we can?

Far too many organizations treat the data team as a service-based model, per Marisa. This has been a common theme across many articles over the last few years. A service-based model makes you likely to be run as a cost center instead of a profit center. So how do you switch that perception? Part of that is doing work that is directly tied to the data value supply chain. For every bit of work, ask: how will value be derived?

But, on moving away from a service-based model, Marisa and Scott agree that there still needs to be someone pairing with C-level executives to make sure information is gathered and collected to support their key initiatives. A kind of Chief Data Analyst. Rather than fielding ad hoc questions and saying “I’ll get the information for you”, it’s someone aware of the key initiatives of the company who ensures insights are being manufactured to support those initiatives. There is a major mismatch in most organizations between what insights the business units are producing and the key questions the C-level execs care about.

For Marisa, to identify what to do to satisfy a data consumer’s needs, you need to really understand their way of working. Do the operational business process mining to figure out how to best serve them as you work to build out the use case.

Marisa shared her feelings about getting overly focused on the small picture – the micro – and how that plays into the big picture – the macro. If you spend too much time focusing on individual use cases, especially gathering requirements, will you be able to scale up sharing data internally? Possibly look to the Japanese requirements gathering technique called Obeya – or open room. A data producer can lay out a considerable number of possible requirements and the data consumer can opt in to those requirements. That helps the data consumer get past some of what they don’t know about the data. Then, there can be an open exchange about those requirements between data producer and consumer.

In data, so much of the work most organizations do is a direct reaction to a request or question instead of focusing on “how are we going to derive value from this work?”, per Marisa. We need to take practices from engineering – not just software engineering – on requirements gathering and understanding. How many people in the data space have really trained in proper requirements gathering? Is this mismatch between gathered asks and what a project actually needs to succeed the reason why so many data initiatives fail? Stop simply reacting to requests; ask why this matters and what value it will drive. Scott’s favorite data mesh word: intentionality. Think of it as “mission engineering” – why are you doing this? What is our mission, and does this support our mission?

Per Marisa, when we as humans collect information ourselves – not in our systems – we mostly do that through hearing and vision. So, how do we think about our electronic information collection systems? Can we get our systems to better mimic the way humans collect and process information? Should we try to mimic them exactly since humans have inherent bias? What can we learn from the way humans collect and process information and then improve upon those? What is the science of intelligence and how can that impact the way we build systems? Our brains execute a series of programs for information processing. How can we leverage multiple frameworks to do the same?

Marisa’s LinkedIn: https://www.linkedin.com/in/marisafish/

Obeya method: https://obeya-association.com/what-is-an-obeya/

MIT course on “The Science of Intelligence”: https://cbmm.mit.edu/education/courses/science-intelligence

John Duncan paper on brains executing series of programs: https://web.mit.edu/9.s915/www/classes/duncan.pdf

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him at community at datameshlearning.com or on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, and/or nevesf

Data Mesh Radio is brought to you as a community resource by DataStax. Check out their high-scale, multi-region database offering (w/ lots of great APIs) and use code DAAP500 for a free $500 credit (apply under “add payment”): AstraDB
