Provided as a free resource by DataStax AstraDB
In this episode, Scott interviewed Jesse Anderson, Managing Director at consulting company Big Data Institute, host of the Data Dream Team podcast, and author of 3 books, most recently Data Teams.
To start, a few takeaways from Jesse’s perspective on choosing technology:
- You should make sure you have the right team in place to make good technology decisions – the team needs to be in place first
- Before selecting any technology, it’s crucial to understand what you are trying to accomplish, and to understand that the technology will help address the challenge but won’t solve anything by itself
- Focus on: is this the right tool or solution for us now and in the future? What is the roadmap and vibrancy of the solution?
- “Technology must earn its keep”, meaning you should understand the total cost of ownership and your expected return on investment
- Data tooling cycles are probably going to be 10 years at the most – prepare for obsolescence so you aren’t overly reliant on any one technology
And some takeaways from Jesse’s point of view on decentralizing data teams:
- Currently, software engineers aren’t ready to be data product developers so you’d need embedded data engineers to handle creating and maintaining data products in data mesh
- But many data engineers are not willing to be embedded into domains
- Managing the dotted line versus solid line of reporting between a functional team and the domain is very difficult
- There are a number of cracks that crucial data can fall into and fail to find a good owner in a decentralized structure, especially aggregate data products
Jesse started the conversation with how important people are to getting things right with data, especially when making technology decisions. The chicken-and-egg question is whether you need the right people in place first or whether you should make technology decisions that will attract people. In Jesse’s view, you need the right people in place first, as they will be the ones to make the right decisions on technology selection.
The most important question for Jesse when selecting technology is what you are trying to accomplish with it. If you don’t focus on the target outcome, the selection is unlikely to work out well. You should also know, in general, what most of your use cases for the technology will be, and use that to assess which technology is the right one to choose.
Also, for Jesse, “technology must earn its keep”. Having chosen a technology at one point does not justify keeping it; it must continue to deliver more value than it costs. And you want to strongly factor in your long-term total costs, as best as you can estimate them, when looking at adding a technology. This matters for questions like build versus buy, whether you can keep something running, and whether the long-term roadmap matches your goals and vision.
Jesse also pointed to how different data is from the operational side when it comes to technology cycles. Considering Hadoop, where Jesse focused in his time at Cloudera, 10 years – or even less – is realistic for how long data technologies might be around. Thinking in those cycles, you should consider where a technology is and where it is headed when choosing: what is the chance of obsolescence? How healthy is the project? You must have a longer-term vision, more than just whether it solves today’s problems.
You should consider how aggressive you will be in tech adoption, per Jesse. Will you be comfortable with making early bets? How can you set yourself up to be able to migrate away once a technology is no longer a great fit for you? Data mesh can make it easier to wean off a technology, as what you expose to data producers and data consumers is usually an interface rather than the underlying tech.
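To make the interface point concrete, here is a minimal Python sketch, not something from the episode. Every name in it (OrdersDataProduct, WarehouseBackedOrders, LakeBackedOrders, total_revenue) is hypothetical, and the two backends are in-memory stand-ins for real storage technologies. The idea it illustrates: if consumers only call the interface, the technology behind it can be swapped without touching consumer code.

```python
from __future__ import annotations

from typing import Iterable, Protocol


class OrdersDataProduct(Protocol):
    """Hypothetical interface that consumers code against."""

    def read_orders(self, since: str) -> Iterable[dict]:
        ...


class WarehouseBackedOrders:
    """Hypothetical implementation standing in for today's technology."""

    def __init__(self, rows: list[dict]) -> None:
        self._rows = rows

    def read_orders(self, since: str) -> Iterable[dict]:
        # ISO dates compare correctly as strings.
        return [r for r in self._rows if r["order_date"] >= since]


class LakeBackedOrders:
    """Hypothetical replacement standing in for whatever comes next."""

    def __init__(self, partitions: dict[str, list[dict]]) -> None:
        self._partitions = partitions  # keyed by ISO date

    def read_orders(self, since: str) -> Iterable[dict]:
        for day in sorted(self._partitions):
            if day >= since:
                yield from self._partitions[day]


def total_revenue(product: OrdersDataProduct, since: str) -> float:
    """Consumer code: depends only on the interface, not the backend."""
    return sum(row["amount"] for row in product.read_orders(since))
```

Under these assumptions, migrating from one backend to the other means changing which object is passed to total_revenue, while the consumer function itself stays the same.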
Jesse talked about how right now, general software engineers / application developers are not ready or able to create good data products. One big issue is a lack of understanding about schema changes – on the one hand, you can’t tell software engineers they can’t make schema changes because that blocks application development but on the other, most software engineers do not understand the downstream impact of those schema changes. They are also, per Jesse, not well versed enough in how to store and share data about the domain to 1) maximize reuse and 2) create datasets that will be useful for analytics.
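As an illustration of the downstream impact of schema changes, here is a small Python sketch, again not from the episode. The function name breaking_changes and the schema-as-dictionary representation are assumptions made for the example, as is the rule that removed columns and changed types break consumers while added columns do not. Real compatibility rules depend on the serialization format and tooling in use.

```python
from __future__ import annotations


def breaking_changes(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Compare two hypothetical schemas given as {column: type}.

    Assumption for this sketch: a removed column or a changed type can
    break downstream consumers; a newly added column does not.
    """
    problems = []
    for column, old_type in old.items():
        if column not in new:
            problems.append(f"removed column: {column}")
        elif new[column] != old_type:
            problems.append(
                f"type changed: {column} {old_type} -> {new[column]}"
            )
    return problems


if __name__ == "__main__":
    old_schema = {"id": "int", "email": "string"}
    new_schema = {"id": "string", "signup_date": "date"}
    for problem in breaking_changes(old_schema, new_schema):
        print(problem)
```

A check along these lines could run before an application team ships a schema change, so the team can still make the change but sees who it affects first.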
Aggregated domain ownership is one issue Jesse pointed to regarding decentralization of data teams – who owns these products? Do they need to be products? Another aspect is something that’s run through many conversations on the podcast – if we give domains the authority to do whatever they want, won’t that cause chaos? Probably. So establishing best practices and giving people a common platform to use and reusable frameworks is necessary to make something like data mesh work.
Another issue Jesse has with team decentralization is how to manage the career growth and happiness of data engineers. Many data engineers may not want to be embedded in domains. And whom do they listen to: do they follow the organization’s best practices, or the domain owner who says to do something quickly without adhering to those practices?
Jesse finished by saying all your data work should have a purpose. Every organization should ask if the data mesh is truly worth it for them, both now and in the future. It’s okay to say not now. It’s okay to say not ever.
Jesse’s Data Teams Book: https://www.amazon.com/Data-Teams-Management-Successful-Data-Focused-ebook/dp/B08JLFTPBV
Big Data Institute website: https://www.bigdatainstitute.io/
Data Dream Team podcast: https://sodapodcast.libsyn.com/site
Jesse’s LinkedIn: https://www.linkedin.com/in/jessetanderson/
Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him at community at datameshlearning.com or on LinkedIn: https://www.linkedin.com/in/scotthirleman/
If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/