#116 A Startup’s Early Journey Towards Decentralizing Data – Iterable’s Analytics Evolution – Interview w/ Riya Singh

Data Mesh Radio Patreon – get access to interviews well before they are released

Episode list and links to all available episode transcripts (most interviews from #32 on) here

Provided as a free resource by DataStax AstraDB; George Trujillo’s contact info: email (george.trujillo@datastax.com) and LinkedIn

Transcript for this episode (link) provided by Starburst. See their Data Mesh Summit recordings here and their great data mesh resource center here

In this episode, Scott interviewed Riya Singh, Business Insights Manager at Iterable.

Some key takeaways/thoughts from Riya’s point of view:

  1. ~4 years ago, Iterable was in essentially “spreadsheet hell” with lots of manual data work and no standard way of storing or sharing data across domains. While domains had good data capabilities, the integration and coordination between domains was very difficult at best.
  2. Most exec questions can’t be answered by the data from a single domain so cross domain data integration became a key factor in Iterable continuing to grow. How could they make crucial decisions informed by data if there was so much manual work to try to integrate ad hoc? Could they really trust something done manually each time?
  3. Fast time to market for simple, base level capabilities of their data platform was much more valuable than trying to nail every feature upfront. Data consumers understood it wasn’t perfect data at the start but it led to much faster exploratory data initiatives which led to valuable insights sooner.
  4. You might have a much higher ROI buying tools than trying to really get by on low-cost but not feature-rich tools. If you build a very cost-efficient data platform that no one wants to use, is that actually valuable? How much time will you spend managing the tools or is it worth it to outsource that to a vendor?
  5. Combining data across sales, marketing, and product meant Iterable could tailor marketing messages and find better prospects, measure marketing return on investment (ROI), and cost optimize their operations and product among many other new insights.
  6. As teams that previously weren’t directly interacting start to have more conversations, gaps in your data – whether in data created/collected or data shared – will emerge. Filling those gaps will mean you can answer more high-value questions to drive the business forward.
  7. At Iterable, when there is a specific use-case identified for cross-domain data integration, the central data team takes over ownership of what would be considered a consumer-aligned data set in data mesh terms. With only 4-5 domains, Iterable doesn’t need to decentralize the data team yet. The cost of decentralizing is far greater than the benefit right now.
  8. Iterable found the most value by doing exploratory data analysis then quickly moving to a minimum viable consumable form. Then, they work to continue to improve the data set. This approach means a fast time to value by grabbing the low hanging fruit while continually driving to better data and incremental value. But to do this, consumers must be very aware of what they are getting and when 🙂
  9. A key way to keep stakeholders informed and bought in is by constantly keeping them updated on progress – Jen Tedrow talked about this in her episode. Keep people informed of progress or ongoing investigations so you can stay coordinated and all parties understand decisions along the way.
  10. At Iterable, conversations between domains are happening weekly. That way, there is always space for people to keep each other updated on upcoming changes or new information. It makes it easy to keep each other informed. Far harder for organizations with many domains but still useful general advice – much like in any good relationship, schedule the time to exchange context!

While Iterable is too early in its life to fully look to implement data mesh, they are taking some major inspirations in how to enable domains to appropriately share data with each other, per Riya. When Riya joined, they were under 100 people and had a very fractured, one-off analytics approach built on a lot of disparate Excel spreadsheets. It was very hard to make data-driven decisions even within many of the domains but especially on cross-domain questions – which are often the most crucial to companies. Each domain had a relatively high data maturity, enough so that the centralized BI/data team thought the domains could and should still own at least some of the responsibilities for their data.

But, per Riya, to move from a fractured environment, they needed to at least consolidate sharing information into a single place or tool and to start to standardize how information might be shared and combined/integrated across domains. An example Riya gave was how the Product Team data was stored in such a different way than the sales and marketing information but Sales and Marketing both wanted to consume metrics from how customers and prospects were using the service. But again, integration across domains was extremely hard to do. They also needed a standardized ownership model across domains – what would the domains own and what would the BI/data team own?

So, what was the driving factor to push Iterable out of essentially the dreaded “spreadsheet hell”? Riya mentioned that the executives were not able to easily and repeatably/reliably get answers to their questions in a timely manner. Many crucial, exec-level questions are not domain specific and require data from multiple domains. And most exec questions aren’t a single point-in-time question so you need to set up reliable processes to support answering those questions now and into the future. So exec pain at not being able to quickly make decisions backed by data – especially data they could trust – meant there needed to be a change.

Riya gave a few specific examples of big questions they couldn’t answer in the fractured setup. 1) What makes for a good target prospect? You need to combine sales and marketing data about company type, size, etc. with purchase history, then combine that with the product usage data to see what types of companies were actually using which features, and then add in the renewal rate and how much each customer expanded. That could also give them the ability to target pitches based on which features were used by which types of companies. 2) What is the actual ROI on marketing spend? Scott, coming from FP&A, agreed this is a notoriously difficult question to answer without really good, clean data. 3) What features or business programs should we kill or reduce investment in? To cost optimize, you need clean data on what is actually happening with the business and product down to quite granular levels so you aren’t making crucial decisions based on gut.
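To make the first question a bit more concrete, here is a minimal sketch of what combining sales, product usage, and renewal data per customer might look like. Everything in it is an assumption for illustration only: the shared `account_id` key, the field names, the sample rows, and the `profile_by_segment` helper are hypothetical and are not Iterable’s actual data model or tooling.

```python
from collections import defaultdict

# Hypothetical per-domain extracts, all keyed by a shared account_id.
sales_accounts = [
    {"account_id": 1, "segment": "retail", "size": "mid-market"},
    {"account_id": 2, "segment": "retail", "size": "enterprise"},
    {"account_id": 3, "segment": "fintech", "size": "mid-market"},
]
product_usage = [
    {"account_id": 1, "feature": "sms"},
    {"account_id": 1, "feature": "email"},
    {"account_id": 2, "feature": "email"},
    {"account_id": 3, "feature": "push"},
]
renewals = [
    {"account_id": 1, "renewed": True},
    {"account_id": 2, "renewed": False},
    {"account_id": 3, "renewed": True},
]


def profile_by_segment(accounts, usage, renewal_rows):
    """Join the three domain extracts on account_id and summarize per segment."""
    # Product domain: which features each account actually uses.
    features = defaultdict(set)
    for row in usage:
        features[row["account_id"]].add(row["feature"])

    # Customer success domain: did the account renew?
    renewed = {row["account_id"]: row["renewed"] for row in renewal_rows}

    summary = defaultdict(
        lambda: {"accounts": 0, "renewed": 0, "features": defaultdict(int)}
    )
    for acct in accounts:
        seg = summary[acct["segment"]]
        seg["accounts"] += 1
        # Accounts with no renewal record are counted as not renewed.
        seg["renewed"] += int(renewed.get(acct["account_id"], False))
        for feature in features.get(acct["account_id"], ()):
            seg["features"][feature] += 1

    return {
        name: {
            "accounts": seg["accounts"],
            "renewal_rate": seg["renewed"] / seg["accounts"],
            "feature_counts": dict(seg["features"]),
        }
        for name, seg in summary.items()
    }


profiles = profile_by_segment(sales_accounts, product_usage, renewals)
```

The point of the sketch is that none of the three extracts can answer the question alone: the segment comes from sales, the features from product, and the renewal outcome from a third source. Without an agreed join key across domains, this work has to be redone by hand every time someone asks.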

Per Riya, one very important outcome of the work to combine data across domains: identifying additional gaps in their data and analytics. Were they collecting the information to answer these new questions? If the domain had the information, could they share it? Once basic questions were more or less answered, they could see where they could do better on what information they collected and shared to drive deeper, high-value insights.

When starting to build out the cohesive, company-wide data platform, Riya and team looked at a number of tools. She said they made a few good choices and a few that she would make differently now. They focused a bit too much on trying to provide a really simple UI/UX instead of just getting data sharing and analysis capabilities into people’s hands and then working to improve from there. They eventually saw really big value in getting people doing initial exploratory work quickly – reducing the time to market so people could use even base-level features was very valuable.

They also went with something that made ETL far harder to manage than it should have been in Riya’s view. Sometimes more expensive offerings like Fivetran will have a much better return on investment – ask yourself how much of your time will be spent managing a system instead of doing value-add work and whether that money would be better spent on buying a tool. As Doron Porat mentioned in her episode, it’s rarely a super easy choice but too often, people opt to try to roll their own when it’s more valuable to focus on higher value-add work.

They are now using Snowflake, Fivetran, Looker, and dbt.

Iterable again saw a lot of value from the initial exploratory work between teams driving insights. Once they found a good use case, the teams came to Riya and team to build out data models that could easily combine data between the domains, producing high quality, trustable data in a format that was easy to use. Essentially, once there was a clear use case, the central data team took over ownership to ensure and maintain quality. As seen in many previous interviews, this central ownership model scales until it doesn’t. Right now for Iterable, centralized ownership of what might be considered consumer-aligned datasets in data mesh makes sense. Data mesh, as Zhamak has envisioned it, is not for all companies and a hybrid ownership model like the one Iterable is using can scale quite well for organizations with not that many domains.

So v0.1-1.0 of Iterable’s data platform was Snowflake + Stitch + Looker, then v2.0 was Snowflake + Fivetran + Looker + dbt. Riya and team are starting to work on v3.0 of their data platform to support some initial data science / ML use cases. What they’ve found is far too much of the central data team’s time is spent on manual tasks so they will be focusing on these new ML use cases as well as building in more automation and optimization.

A few interesting things from Iterable’s approach: 1) by starting with exploratory analysis first, they could discover low hanging fruit insights while working to elevate the data set to production quality. Some things were just obvious in the data even before it was high-quality. Getting to an early initial consumable form and iterating towards higher quality drove value sooner; 2) the domains are constantly in communication with weekly check-ins. This gives them scheduled time to keep each other informed. Easier to do in a company with 4-5 domains but it means fewer surprises and more high-value collaboration; and 3) something that has worked well for Riya’s team is constantly keeping stakeholders updated as work around data progresses. By keeping people in the loop, there is a tighter feedback cycle if expectations aren’t aligned or diverge, meaning far less chance of wasted work.

Riya wrapped up by mentioning how crucial high context conversations really are to making your data strategy work no matter your data management approach. If the domains were just trying to drive their own data, the overall company would be flying blind. So find good ways to keep each other informed and exchanging context!

Riya’s LinkedIn: https://www.linkedin.com/in/riyasingh1/

Data Mesh Radio is hosted by Scott Hirleman. If you want to connect with Scott, reach out to him at community at datameshlearning.com or on LinkedIn: https://www.linkedin.com/in/scotthirleman/

If you want to learn more and/or join the Data Mesh Learning Community, see here: https://datameshlearning.com/community/

If you want to be a guest or give feedback (suggestions for topics, comments, etc.), please see here

All music used this episode was found on PixaBay and was created by (including slight edits by Scott Hirleman): Lesfm, MondayHopes, SergeQuadrado, and/or nevesf

Data Mesh Radio is brought to you as a community resource by DataStax. Check out their high-scale, multi-region database offering (w/ lots of great APIs) and use code DAAP500 for a free $500 credit (apply under “add payment”): AstraDB
