#88 Data Engineering and Data Engineers’ Future in Data Mesh – Interview w/ Joe Reis

Data Mesh Radio Patreon – get access to interviews well before they are released

Episode list and links to all available episode transcripts (most interviews from #32 on) here

Provided as a free resource by DataStax AstraDB

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here

In this episode, Scott interviewed Joe Reis, CEO/Co-Founder of data consultancy Ternary Data, Co-Host of the Monday Morning Data Chat, and author of the upcoming book Fundamentals of Data Engineering.

Some key points or takeaways specifically from Joe’s point of view (not necessarily those of the podcast):

  • Find quick, high-value wins. Too often people focus on the big wins, which become overly complicated and end in failure.
  • Most software engineers don’t understand data well enough to be data product developers in data mesh, at least not yet.
  • Data mesh is a polarizing topic. And that makes sense as it is pushing boundaries. Many hope it can come to fruition but it is a bit of a utopian view.
  • The future of data engineering is to move past managing pipelines to much higher-value work.
  • Speed to achieving wins with data – with a clear return on investment and trust – is the first thing you should focus on. Get this right and you can have the “luxury” of building great data products.

Joe started by discussing the somewhat nebulous space between software engineering and data that data engineering has always occupied: it sits between the source systems and the data output, converting the data in the source systems into something consumable for data users. Previously, that was mostly about making sure reports got pushed through and hoping people derived insights. Now it’s more about pipelines. But the way information is stored in source systems is not the format or shape needed for analytical purposes, so there needs to be a go-between.

A big trend in data engineering currently for Joe is the abstraction of tooling. That can be good, because it makes people more productive, or bad, because it becomes harder to understand what is actually happening under the covers. But for Joe, the abstractions are probably worth using, as they do the heavy lifting so data engineers can focus on higher-value work. We might be coming to the end of the “pipeline monkey” era of data engineering, so we can shift more focus to the data output, DataOps, orchestration, security, etc.

For Joe, the biggest value-add the data engineering team can have is getting wins quickly. When asked about speed to returns versus repeatability, Joe said that speed is more important, especially when you are trying to prove out the value of your data team. Trust is crucial, so you have to be careful not to move too fast, but trying to do big-bang projects is often a recipe for failure in his view.

When asked what could be the signs an organization is ready to implement data mesh, Joe mentioned that if an organization is already seeing “wins” with data across a number of teams/domains, that’s a very good sign. But you can’t only have a few teams getting those wins as that means the overall organization data maturity is still probably low.

Joe made a good point about how polarizing data mesh can be. When he speaks with some organizations, there are a few leaders who simply reject the idea outright. But many also simply don’t see data mesh as ever being possible specifically in their organization. And that is probably true: low sharing / low empathy organizations need cultural change BEFORE trying to implement data mesh or the implementation will likely fail. Others, including Joe, see data mesh as a bit of a utopian vision (“imagine a world where…”), and that’s pretty common. But Joe made two good points there: 1) if it were a safe concept, it would already be obvious, so it makes sense that these large change concepts cause concern; and 2) even if we can’t necessarily achieve the ideal, we can still strive for it. Take the goal of “being a good person”: being a better person is still a win even if you don’t become the best person possible, right?

A recurring theme throughout the conversation was the need for speed relative to data. Identifying and then executing on quick wins is crucial for data teams in Joe’s view. So first, the data teams need to learn how to identify those opportunities to build momentum around the data organization as a profit and innovation center instead of a cost center. As Joe said, “you have to see problems before you can fix them”.

When working with software engineers to teach them data engineering skillsets, Joe has seen that software engineers can often easily pick up a lot of the mechanisms used in data engineering, e.g. managing the pipelines themselves. But they have little understanding of what the data consumers want. So it might be too early for most organizations to have their software engineers as the main data product developers for data mesh just yet. And Joe regularly sees that software engineers not only don’t understand data, as stated above, but often don’t care to either. It can be easier to teach data analysts and data scientists data engineering because they understand what data consumers really need.

Wrapping up, Joe again circled back to the need to find high-value wins quickly in data. He recommends not getting too complicated and looking for the small wins, and looking at what you want to produce and working somewhat backward from there.


Joe’s LinkedIn: https://www.linkedin.com/in/josephreis/

Ternary Data Website: https://www.ternarydata.com/

Monday Morning Data Chat: https://anchor.fm/ternary-data

Joe and Matthew Housley’s interview with Zhamak Dehghani: https://www.linkedin.com/video/event/urn:li:ugcPost:6915063013410582528/

Joe’s upcoming book, “Fundamentals of Data Engineering”: https://www.oreilly.com/library/view/fundamentals-of-data/9781098108298/

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him at community at datameshlearning.com or on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, and/or nevesf

Data Mesh Radio is brought to you as a community resource by DataStax. Check out their high-scale, multi-region database offering (w/ lots of great APIs) and use code DAAP500 for a free $500 credit (apply under “add payment”): AstraDB
