#43 Applying Resilience Engineering Practices to Scale Data Sharing – Interview w/ Tim Tischler

Sign up for Data Mesh Understanding’s free roundtable and introduction programs here: https://landing.datameshunderstanding.com/

Please Rate and Review us on your podcast app of choice!

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here

Episode list and links to all available episode transcripts here.

Provided as a free resource by Data Mesh Understanding / Scott Hirleman. Get in touch with Scott on LinkedIn if you want to chat data mesh.

Books/posts/papers mentioned:

Blameless PostMortems and a Just Culture by John Allspaw

The Theory of Graceful Extensibility: Basic rules that govern adaptive systems by David D Woods

The Field Guide to Understanding ‘Human Error’ by Sidney Dekker

In this episode, Scott interviewed Tim Tischler, Principal Engineer at Wayfair. Prior to Wayfair, Tim worked as a Site Reliability Champion at New Relic and is well known in the “human factors” and resilience engineering space.

Per Tim, our current work culture is overly action-item driven – every meeting must have a set of action items generated from it. This prevents people from having learning-focused meetings designed exclusively for context sharing. Human brains work differently in learning mode versus fixing mode, and we ask totally different questions in each. To scale our knowledge sharing, we need space for learning-focused meetings.

A good way to center learning-focused meetings, be they “show and tell” or event storming sessions, is to share stories – human communication has been founded on story sharing for millennia. Tim’s “show and tell” and event storming sessions at Wayfair have had extremely positive reviews so far.

Tim sees ticket-based interactions – just throwing requirements on someone’s JIRA backlog or similar – as fundamentally flawed. If Team A gives Team B requirements through a ticket, Team B just looks to close the ticket, rather than both sides getting in a room to exchange context and negotiate. Tim prefers two modes of interaction over ticket systems: #1 – no-human-touch, automated interactions, e.g. an API; and #2 – high-touch, high-context-sharing interactions.

For resilience engineering specifically, you should apply learnings to each data product AND to the mesh as a whole. Part of that is a broad acceptance that you are in a highly dynamic, constantly changing org – there will be changes! A few resilience engineering anti-patterns that apply to data mesh are: 1) a hub and spoke relationship model where one person is the key glue – this is bad at a human level and even worse at a technical level :); 2) business leaders pushing for metrics without sharing the specific context, so the results end up as completely empty and useless things you are tracking; and 3) not embedding the people building platforms into the teams they are building the platform for – they must really understand the workflows.

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

All music used in this episode was found on Pixabay and was created by (with slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, ItsWatR, Lexin_Music, and/or nevesf
