This is the story of how Whatnot uses dbt and Hex to unite data teams and speed up data development.
The aim: Speed up how data is analyzed and pushed to the product by making SQL the common language across data teams, from analysts to ML engineers.
The challenge: A complex, engineering-heavy data stack discouraged cross-team collaboration and made hiring in a tight market even harder.
“Our previous stack of AWS Glue, EMR, Kinesis, Athena and Airflow required knowledge of complex frameworks and languages,” said Emmanuel Fuentes, Head of Machine Learning & Data Platforms. “In order to scale, we needed a consistent end-to-end approach.”
Results, by the numbers:
4-8x increase in speed from idea to production
1 week for new analytics engineer hires to start shipping
10x decrease in maintenance costs
20x growth in sales last year alone
Data was a foundation at Whatnot from day one. And it had to be. The product is a combination of Twitch and eBay. It’s a modern, mobile-first, influencer-friendly QVC. Data is necessary for the operation of the product and, being both a marketplace and a live video platform at the same time, Whatnot has a lot of data. A lot of very diverse data.
Data is also essential for platform safety (ensuring a healthy experience for everyone during live shows), business metric reporting, and machine learning (to serve the right products, to the right users, at the right time).
In China, livestream shopping has exploded in popularity and is expected to reach $423 billion in sales by 2022. In the US, it’s gaining steam fast. Last year, Whatnot’s revenue grew 20x, and the company expanded into more product categories and ways of buying. The product was changing fast, data volume was growing exponentially, and the team needed more and more qualified data professionals to keep up.
Their data stack, however, was built wholly on top of AWS services like Glue, EMR, Kinesis, Athena, and Airflow, and it wasn’t giving them the flexibility and speed they needed. It was an engineering-heavy workflow that required knowledge of complex frameworks and languages.
This had multiple consequences. One of them was that data engineers were spending too much time on non-scalable activities like building custom crawlers and maintaining brittle CI/CD pipelines for Airflow DAGs. And, because this was a complex workflow, the engineers were responsible for the bulk of the work and analysts weren’t delivering value up to their full potential.
For a data-led startup that was scaling, moving, changing and hiring quickly, their data workflow risked hampering their growth.
In order to build the leading live shopping platform in the US, Whatnot needed to build their data-led foundations on a system that enabled them to scale. They required new data tools and processes to increase headcount effectively, maintain the quality of the live shows, expand offerings, optimize algorithms, inform the growing number of stakeholders, and remain agile in the face of fast-paced pivots.
Emmanuel Fuentes, Head of Machine Learning & Data Platforms, was tasked with improving the efficiency of data teams at Whatnot. He had a clear vision: to make analyst-friendly SQL the lingua franca across the whole data organization.
This change would:
Increase the output from analysts and rely less on custom data products built by engineering.
Enable greater collaboration between the different teams within the data organization because everyone would understand each other.
Facilitate the growth of the data team, since SQL is a cornerstone language known by millions of people.
Emmanuel searched the market for SQL-first tools that would allow analysts, machine learning engineers, and data engineers to all work together.
“I looked for tools that would allow us to act as if we were 10x size,” said Emmanuel. “I wanted us to all use common features to be aligned on how calculations are made.”
In June 2021, they started to migrate their data warehouse to Snowflake. At the same time, they signed with dbt and Hex. In a quarter and a half, they had shifted everything over to this modern data stack.
Today at Whatnot, dbt is used for modeling and documenting data. Hex is where exploratory analysis happens.
“This workflow is really useful for business reporting, performance marketing, and machine learning,” said Emmanuel. “We could have the same setup we have now on Airflow DAGs. But I’d need a bigger, more specialized team and our maintenance costs would be 10 or 20 times higher.”
With analysts, machine learning engineers, and data engineers all using SQL and the same tools, collaboration between data teams improved, as did efficiency.
Reusing models
At Whatnot, SQL is used and repurposed across the whole journey from research to presentation (BI) or production.
Analysts and data engineers use Hex as an exploratory internal development environment. They explore models, prepare forecasts, and share findings with each other. Once they confirm a model is solid, they push the logic to dbt. Since dbt and Hex share a native integration, dbt docs and metrics can be accessed in Hex, so users don’t need to switch between the two tools.
In dbt, models are cataloged and exposed to the whole data organization. This exposure allows data and machine learning engineers to build existing models into other data products, algorithms, or business reporting, increasing speed from insight to action by up to 8x.
“Analysts, engineers and machine learning are all working on similar problems. When teams use common data models, that enables a lot of efficiency,” said Emmanuel. “Someone might have built something for an analytics report that gave an idea to a machine learning person. Instead of having to rebuild it from scratch, they can reference their dbt model and then move on.”
“This new workflow decreased the delta between an idea and publishing to production by a fourth or eighth,” said Emmanuel. “We’ve gained a lot of efficiencies because we use SQL across our models, transformation layer, analysis, and dashboards.”
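To make the reuse pattern in this section concrete, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in and does not use dbt, Hex, or Snowflake. All table, view, and function names (orders, seller_sales, report_by_category, seller_features) are hypothetical and are not taken from Whatnot’s project. A SQL view plays the role of a shared model that is defined once and then referenced by both a reporting query and a machine learning feature query.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with toy data and one shared 'model'."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Hypothetical raw data; not Whatnot's real schema.
        CREATE TABLE orders (
            order_id INTEGER, seller_id INTEGER, category TEXT, amount REAL
        );
        INSERT INTO orders VALUES
            (1, 10, 'sneakers', 120.0),
            (2, 10, 'sneakers', 80.0),
            (3, 11, 'cards', 50.0);

        -- The shared model: aggregation logic written once, as a view.
        CREATE VIEW seller_sales AS
            SELECT seller_id,
                   category,
                   COUNT(*)    AS order_count,
                   SUM(amount) AS total_sales
            FROM orders
            GROUP BY seller_id, category;
        """
    )
    return conn


def report_by_category(conn):
    """Analytics use case: total sales per category, read from the shared view."""
    return conn.execute(
        "SELECT category, SUM(total_sales) FROM seller_sales "
        "GROUP BY category ORDER BY category"
    ).fetchall()


def seller_features(conn):
    """ML use case: average order value per seller, read from the same view."""
    return conn.execute(
        "SELECT seller_id, SUM(total_sales) / SUM(order_count) "
        "FROM seller_sales GROUP BY seller_id ORDER BY seller_id"
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo()
    print(report_by_category(conn))
    print(seller_features(conn))
```

Because both queries read from the same view, a change to how sales are aggregated is made in one place and reaches the report and the feature query together.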
A bigger pool for hiring
Another benefit of SQL as a common language is that it has allowed Whatnot to hire from a bigger, more diverse pool of candidates, like bootcamp and non-traditional college graduates.
“When you're growing in a tight hiring market, you need really smart people that can scale very quickly,” said Emmanuel. “A lot more people know SQL than languages like Scala. A SQL-first approach to data pipelines, transformations, QA, and analysis is essential.”
The communities around dbt and Hex further boost the collaborative nature of the SQL-first strategy: the team draws on community expertise to answer questions and keep learning, and has even sourced two new hires from the dbt Community Slack.
By using dbt models and documentation as the base of their data workflow, Whatnot can now bring new external data sources online and make them usable within a few days, compared with weeks or months before.
“We can skip redoing this layer that people get stuck in – like dimensional modeling, Kimball models or snowflake schemas – and go straight to business value,” said Emmanuel.
Whatnot’s upfront investment in dbt and Hex, made to bring engineering workflows to the data team, pays compounding returns to the organization. As the company scales and hires faster and faster, the speed at which new team members onboard has a big impact.
“With our new stack, it takes new analytics engineering hires one week to learn our stack. They read the docs, open up the dbt DAG, explore our data in Hex, and are ready to push code to production.”
In the near future, Whatnot is planning to launch new product options, formats, and ways of buying.
However, each product category has its own data requirements, which adds complexity.
“Because each business is so different – sneakers, vintage clothing, Pokemon cards, food and beverage – they all have different data assets being generated,” said Emmanuel. “That results in thousands of fields that we need to be constantly computing and articulating to stakeholders. As you go wide, that just continues to grow.”
Hex and dbt will enable Whatnot’s team to tackle that complexity, from bringing new data sources online in less than a week to expanding their library of reusable data assets for future work.
“dbt and Hex make the data development environment so much easier to work with than any other combination of tools. Since it’s all native, we don’t need to wait for or build a custom adapter,” said Emmanuel. “I can instead focus on scaling my team, and building the best live shopping platform.”
Emmanuel Fuentes, Head of Machine Learning & Data Platforms
Whatnot