#212 Reflections on Building a Data Mesh Platform from Scratch – Interview w/ Jyotshna Karki

Please Rate and Review us on your podcast app of choice!

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here

Episode list and links to all available episode transcripts (most interviews from #32 on) here

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here. You can download their Data Mesh for Dummies e-book (info gated) here.

Jyotshna’s LinkedIn: https://www.linkedin.com/in/jyotshna-karki-81a24038/

In this episode, Scott interviewed Jyotshna Karki, Data Engineer at Novo Nordisk. To be clear, she was only representing her own views on the episode.

Some key takeaways/thoughts from Jyotshna’s point of view:

  1. In data, especially in data engineering, people need to be curious. There are so many new innovations that could be majorly beneficial. Look to try out more approaches and technologies.
  2. You can have happy data producers and consumers with a centralized data lake setup and still have data mesh be the right evolution. In the long run, at scale, it isn’t efficient to have a centralized data team coordinating all data use cases.
  3. For many domain teams, the centralized data processing and storage can be a black box. Data goes in, it gets transformed and stored by the central team and then served out. This can create a high dependency on experts and technology.
  4. ?Controversial?: If your domain teams have their own data engineers and data scientists, along with domain knowledge experts, to manage their own data products, it’s okay to work with multiple teams at the start of a mesh journey. Scott note: if you don’t need to drive buy-in and your org can do this, I don’t see it as a major risk. But probably at most a few hundred (maybe even only tens of) organizations worldwide are like this.
  5. Don’t try to enable every tool as part of your platform. You should focus and create a good experience on the most widely used tools rather than trying to support every tool available out there.
  6. ?Controversial?: Probably don’t try to automate processes at the proof-of-concept stage. Wait until the need and impact are greater. But non-automated processes are typically tech debt you should look to pay down when it makes sense; don’t ignore that or it will hurt more at a later stage.
  7. Around best practices or things like reusable components and data pipeline blueprints, look to create centralized community sharing mechanisms but with decentralized ownership and contribution. Try to enable that sense of community knowledge sharing and trust.
  8. Create a process to assess if you need to make a change to your platform. What are the business needs and are you meeting them? Constantly look to evolve and improve your platform.
  9. KPIs for your platform are important – it’s a product – but it’s okay to start out pretty simple and use low-tech monitoring signals like the number of support tickets and customer feedback.
  10. Similarly, try to be more data-driven around building your platform. Even if that data is pretty raw and unsophisticated to start. Scott note: try to stay away from vanity metrics but it’s okay to / you will probably start with vanity metrics until you understand what drives business value from the platform.
  11. Democratizing data, especially doing data catalog well, has led to less data duplication. Because people know where to find data and can reliably get access again, they don’t copy it or build something similar just to ensure they have the data they need.
  12. “…make sure that it is reliable enough for people to depend on this data.” Trust is crucial. Give people visibility into how data is handled so they have enough trust to really depend on it, not just use it.
  13. Look to community events like hackathons to drive additional experimentation and value. If you make innovation a part of your culture, good things will likely come.

Jyotshna started off the conversation with a bit about her background, especially in data engineering and the need to be and stay curious. There are so many new approaches and technologies to consider that could provide significant benefit. Think in that product mindset and look to evolve your approaches and tech stack to create more value.

Specific to Novo Nordisk’s data mesh journey, Jyotshna and team saw the writing on the wall for their data lake setup. While their centralized data lake was doing well and people were happy with it, there were increasing consumer and producer demands and the central data team was still required to help teams create their data products. Having a centralized team in the middle of every use case just wouldn’t be efficient. Then, they hit some cloud service limits which caused some major headaches as well. All this led to looking to decentralize via data mesh.

At Novo Nordisk, many domains already had significant data capabilities and there were people building data products anyway, according to Jyotshna. What they really needed was a way to empower and enable teams to more easily create and manage those data products in an interoperable way, and to lower the bar to doing so. So the central data team focused on the platform; there wasn’t a huge need to upskill all the domains. There is still another centralized team of data experts to help domains that aren’t as data fluent. Scott note: while this is not super uncommon, most organizations are not this lucky 😀

Specific to the pharma industry, Jyotshna shared some of the pre-data-mesh compliance/regulatory issues that were better addressed with data mesh. Domains needed to work with regulators, but because the data was managed by the central team, it was hard for them to see exactly how the data was stored – and that visibility is part of compliance. It was all part of the central data lake AWS account, and those teams didn’t have the ownership or visibility they needed. With data mesh, the teams now have visibility into their own data storage along with access to audit logs and data governance.

Jyotshna shared that at Novo Nordisk, there was so much demand to participate in their data mesh that the data platform team and the centralized data capabilities – there to assist the domains that didn’t have high data fluency – worked with multiple teams to start. This helped them define the requirements for a data mesh platform that supports multiple data domains. While this is a data mesh anti-pattern, it went well for them because, again, many of the domains were quite capable with data engineering and data analysis. Many domains also wanted to contribute aspects to the platform, so there were good feedback loops between the platform team and the domains. Scott note: Don’t go this route unless your domains are already highly data fluent/capable. Working with many domains at the start can create a high-risk scenario instead of thin slicing.

Jyotshna and team are focused on enabling proofs of concept more than trying to automate everything right at the start. She noted they are focusing on understanding the problem deeply, moving fast to get proofs of concept into people’s hands, and then circling back to automate when there is more need and things are slightly more stable. Basically, they are being agile. It has also led to more modular components and reusability – they can get things out in a prototype phase and then think bigger picture about how to deal with similar problems instead of building point solutions.

To prevent tight coupling and keep modularity, Jyotshna and team actually started to remove things like data pipeline blueprints, reusable components, and account bootstrapping from the data mesh platform. While that might feel counterintuitive, they wanted to create a community specifically around things like blueprints so the central team wasn’t managing them – community members were. Look to create central sharing mechanisms but with a decentralized ownership and contribution model. Community-led innovation is more scalable than centralized knowledge ownership.

When thinking about platform maturity and whether they need to pay down any tech debt, especially around certain features, Jyotshna and team benchmark quality levels and compare those to the actual business needs. Being in a heavily regulated industry, some aspects of compliance are simply non-negotiable; you must meet them. But there are places where ‘good enough for now’ is a completely acceptable and correct answer. Some signals they use are support tickets and direct feedback on different aspects of the platform. They are also starting to build KPIs, but it’s a work in progress.

One interesting aspect of doing data mesh has been less duplication of work, per Jyotshna. This is a target goal of data mesh, of course, but it came about naturally: now that people can reliably find and access data, they don’t feel a need to build it themselves.

Jyotshna said to “make sure that it is reliable enough for people to depend on this data”. Part of your platform and your overall mesh is making it easy for consumers – but also producers – to trust the data. If you have a black box process, can producers really trust it? The evolution of your data products plays a part in trust too – a consumer can trust that the way data is presented is still relevant to the business.

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/


All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
