Datacast

Episode 120: Next-Generation Experimentation, Statistics Engineering, and The Modern Growth Stack with Chetan Sharma

Episode Summary

Chetan Sharma is the Founder & CEO of Eppo, a next-gen A/B experimentation platform that is designed to spur entrepreneurial culture. As the 4th data scientist at Airbnb and an early data scientist at companies like Webflow, Chetan has been focused on the maturity curve of growth-stage companies and how to establish data as a central stakeholder in decision-making. He previously led the team that developed Airbnb's knowledge repo and has led data teams focused on production machine learning and instrumentation integrity.

Episode Notes

Show Notes

Chetan's Contact Info

Eppo's Resources

Mentioned Content

Articles

People

  1. Mike Kaminsky
  2. Sean Taylor
  3. Jeremy Howard

Book

Notes

My conversation with Chetan was recorded back in August 2022. Since then, Eppo has launched feature flagging, and now offers the first "flags on top of your warehouse" experimentation platform. They also have Miro, Twitch, DraftKings, and Zapier as customers.

About the show

Datacast features long-form, in-depth conversations with practitioners and researchers in the data community to walk through their professional journeys and unpack the lessons learned along the way. I invite guests coming from a wide range of career paths — from scientists and analysts to founders and investors — to analyze the case for using data in the real world and extract their mental models (“the WHY and the HOW”) behind their pursuits. Hopefully, these conversations can serve as valuable tools for early-stage data professionals as they navigate their own careers in the exciting data universe.

Datacast is produced and edited by James Le. For inquiries about sponsoring the podcast, email khanhle.1013@gmail.com.

Subscribe by searching for Datacast wherever you get podcasts, or click one of the links below:

If you’re new, see the podcast homepage for the most recent episodes to listen to, or browse the full guest list.

Episode Transcription

Key Takeaways

Here are the highlights from my conversation with Chetan:

On His College Experience at Stanford

I grew up on the east coast in Connecticut and then went out west to attend Stanford for my undergraduate degree in electrical engineering.

Like many undergraduates, I had a general interest in the field, but my focus became more refined over time. I knew I wanted to build things, and I found electronics and coding fascinating, so I went in that direction. However, I became increasingly drawn to the signal-processing side of electrical engineering, which deals with transmitting and understanding information such as images and audio.

As I delved into these topics, I quickly realized that the more interesting problem was identifying what the information actually represented. Is this an image of a cat or a TV? That realization led me to the AI and machine learning track, where I took a lot of classes and discovered that so many of these algorithms were based on statistics.

I enjoyed the statistics track, which taught me how to understand the world probabilistically and reject null hypotheses. This knowledge proved very relevant to the data science industry that emerged shortly after that and to my current work in experimentation.

My favorite classes included most of the signal processing track. One class taught me how to encode information into a signal, then transmit and receive it using assembly code. I found it fascinating to understand how our cell phones and other devices transmit information. I also enjoyed a class that covered the interface between hardware and software, which showed how circuits exist in an analog sense and are transformed into something binary and logical.

In the statistics world, I really enjoyed the Applied Statistics Ph.D. track taught by well-known figures such as Trevor Hastie and Rob Tibshirani. At the time, the machine learning world was obsessed with support vector machines and other methods, but this class covered the methods I used extensively in my data science career. These included principal component analysis, decompositions, regularization of linear regression, and the Lasso.

On Gaining Industry and Research Experience

I had an enjoyable experience working for IBM in Japan, specifically in Tokyo. Living abroad and working there was a great experience. I think this is true for many people in data and statistics: one of the great things about it is that you can explore a wide range of problem domains. By analyzing data and observing user behavior, you can deeply understand the subject matter.

During my college career and early work experience, I tried various fields to see what data work was like in each one. For instance, I interned at Quantcast but quickly realized that I didn't want to do ad targeting for various reasons.

I also found Stanford's Minds, Computer, and Brain lab fascinating. It was a neuroscience version of statistics that explored a different way of studying the brain. Instead of just measuring blood flow to certain areas, they explored the idea that changes in the rhythm of neurons firing could be an indicator of brain function. This required statistical tests that were more focused on signal analysis than traditional quantitative measures like mean and variance.

Overall, I found it interesting to learn about a wide range of fields, including neuroscience, that I had never worked in before.

On His First Analyst Role at Acumen

After switching from the machine learning track to statistics, I found that learning about various metrics and statistical methods made many deep problems more interesting.

In my case, the problem was healthcare policy: specifically, how Medicare handles care, payments, and enrollment fraud. The product I worked on was exciting because it was developed soon after the Affordable Care Act (Obamacare) was introduced. While many features of the Act were well-known, there were still many questions about pilot programs focused on pay-for-performance initiatives.

The basic idea was to pay better doctors more, but there were many unresolved questions. For example, what exactly makes a doctor "better"? How much more should they be paid? How do you account for confounding factors, like doctors who treat sicker patients?

Luckily, Acumen had access to the complete universe of Medicare claims: every claim that had ever been made. This allowed for rich analysis, and the end result of our recommendations was actually implemented in Medicare. It's now part of national healthcare policy, which is really exciting and impactful work.

In the end, however, policy work can be very slow. I worked on some projects in 2010 that didn't go into effect until 2014. As an early-career professional, I needed a faster feedback cycle to better understand what was and wasn't working, both technologically and organizationally. So I eventually moved on to Airbnb.

On Joining Airbnb As Their 4th Data Scientist

Joining Airbnb was a special experience. In 2012, it wasn't yet a name brand, and people thought it was strange to stay in strangers' homes or to have strangers stay in their own homes. But the company itself had a special culture. The office was vibrant, with a cohesive value system and an exciting atmosphere. I remember meeting a friend who worked there at a bar in SF, and he arrived in a big Beatles-themed school bus called The Magical Mystery Bus. The whole Airbnb team rolled out of the bus into the bar, surprising me with their fun and unexpected behavior.

The company was empowering, allowing people early in their careers to do really cool things. The first project I worked on was building model training and hosting environments for Airbnb's fraud models. At the time, hackers were attacking Airbnb, and we had to develop ways of blocking them and detecting fraudulent activity. We built everything from scratch, including a Java service that did online feature calculation and a system that hosted PMML (Predictive Model Markup Language) files, an open standard for representing machine learning models. This infrastructure allowed us to build models for chargebacks, account takeovers, fake listings, and more.

While I found machine learning fascinating, I realized that my real interest lay in decision quality and insights generation. Working on these aspects of experimentation was more impactful and interesting to me, so I went in that direction instead of pursuing a career in machine learning at Airbnb.

On The Early State of Data Science at Airbnb

The data story at Airbnb was quite interesting. The company hired a data leader as one of its first 10 employees and worked with just him for a long time, eventually bringing on a few more people once it began scaling. Initially, that early data hire focused on basic business reporting, such as the number of weekly bookings in Germany. Once it became clear that there were areas of the business that needed data support, the team scaled from there.

The first area of focus for Airbnb was a land-grab growth phase, where the company needed to expand internationally and establish a solid presence as the leading travel booking agency in Germany, Thailand, and beyond. That early data hire performed a lot of supply-demand liquidity analysis to determine where to allocate resources to get more listings or increase demand. Airbnb then started experiencing fraud, which required the work of data scientists.

I was brought in to help with that, and we also had a search ranking model that needed a data scientist's help. We tackled these challenges one by one, investing in each area as needed. At the time, Airbnb lacked infrastructure: we had just a bunch of production MySQL tables to run queries against. We had to set up a distributed analytics environment, eventually using Pig and Hive for distributed computing; this was before Redshift, Snowflake, or BigQuery existed.

We initially used a basic scheduler (crontab) to orchestrate the jobs, which proved too brittle for data ops. We then developed Airflow, whose creation was a seminal moment in Airbnb's data journey. Many other tools and experiments followed, but it was pretty bare early on.

On Building The Knowledge Repo

The problem we faced when we started doing data science work was that we were a small group of four or five people, and it was easy to disseminate knowledge among us. However, as the company grew and we had 50 data scientists and a thousand headcount, it became challenging to reproduce historical work because the information was scattered across Google Drive or chat conversations. This led to wasted effort and stifled strategic analysis investment because only a limited audience could benefit from it.

To solve this issue, we built a lightweight system to make analytics work trustworthy and communicable across the organization and across time. We used Jupyter notebooks and R Markdown documents with linked visualizations, added packages to connect to our data warehouse, and built post-processing capabilities to upload the notebooks to GitHub for peer review. We also created a web app with social collaboration features and a blog-style reader to keep up with the latest findings. This system has since become widespread in the industry and is essential for making informed investments based on underlying trends.

On The Impact Of An Experimentation Program

I believe you are referring to a blog post I wrote in which I stated that experimentation is the most impactful thing a data team can do. Based on my observations of many data teams, the modern data stack makes it easy to set up basic reporting and solve simple problems like tracking weekly purchases over the past year.

However, the reason for investing in data goes beyond having a data warehouse or dashboards. It is about making better decisions by understanding the impact of past decisions and informing future ones. To achieve this, data work needs to be closely tied to the point of decision-making. Experimentation is the key to deploying data into decision-making, as it moves beyond just providing interesting charts to becoming a framework that supports core consequential decisions.

Implementing experimentation programs leads to metrics becoming a central stakeholder in decision-making and people proactively reaching out to the data team to improve metrics. This increased intimacy with metrics leads to investments in the data foundation layer, better instrumentation, and curation of data artifacts. Overall, once data is no longer ignored but put through a framework for making core consequential decisions, everything starts flowing much more smoothly.

On The Evolution Of Airbnb's Experimentation Platform

Regarding the evolution of experimentation at Airbnb, it's interesting to note that unlike Mark Zuckerberg or Jeff Bezos, the founders of Airbnb were designers. They weren't naturally inclined to make quantitative decisions, so experimentation at Airbnb didn't initially come from a top-down approach. Instead, it started with the search ranking team, which needed to run experiments to determine which model performed better.

Running experiments was also necessary to justify resourcing time for search-ranking engineers and to prove that the team could drive bookings and other business metrics. While all analysis was initially done manually by data scientists, they eventually developed a self-serve dashboard specifically for search ranking experiments.

This is a familiar evolution for many companies, but what really accelerated experimentation at Airbnb was when a former Booking.com executive visited and suggested investing in an experimentation platform. This inspired Airbnb to invest in experimentation more heavily, and the tipping point occurred when one marketplace team had a very successful year due to the specific product decisions they made through experimentation.

This success led to the team absorbing other teams and experimentation becoming a central part of the company's culture and processes.

Experimentation platforms share a common architecture that makes sense. There are two basic inputs to the system. The first is the ability to randomize users into groups in an idempotent way so that if you return after a few days, you'll end up in the same group. Engineers use a bunch of SDKs and clients to randomize people into groups.
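To make that idempotence concrete, here is a minimal sketch of deterministic, hash-based assignment, a common pattern behind such SDKs (illustrative only, not any particular vendor's implementation; the function and experiment names are made up):

```python
import hashlib

def assign_variant(user_id: str, experiment: str, variants: list[str]) -> str:
    """Deterministically assign a user to a variant.

    Hashing (experiment, user_id) makes assignment idempotent: the same
    user returning days later lands in the same group, with no
    assignment state to store or look up.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]

# The same inputs always yield the same group.
groups = ["control", "treatment"]
assert assign_variant("user-42", "new-onboarding", groups) == \
       assign_variant("user-42", "new-onboarding", groups)
```

Seeding the hash with the experiment name also means a user's bucket in one experiment tells you nothing about their bucket in another.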

The other big input is all the metrics. A company uses a lot of metrics. Take search ranking: that team probably looks at the number of searches, search conversions, the number of listings viewed, bookings made, etc. The payments team, on the other hand, tracks how many payments were fulfilled, the chargeback rate, associated fees, etc. Every team has its own pool of metrics. The second piece of this architecture is to centralize all these metrics into one source of truth so that whoever runs an experiment uses the same metrics.

From there, a bunch of data pipelines are created to handle the complexity of experimentation. The overall results and diagnostics are calculated, and basic investigation capabilities are provided. Finally, a web app serves all this information in a self-serve, interactive, and shareable format.

On Taking A Work Sabbatical

It was a wonderful year. After nearly five years at Airbnb, you can expect to build up quite the travel bug. It was great to start going through and visiting all the places on my list. One of the things I loved about that travel year was that my wife and I had a lot of family and friends living in different countries. Traveling for a long time can get lonely if you don't have that social support. So it was really nice to see them and experience different cultures.

There were lots of different memories, like biking down Taiwan's east coast, where they have great bike infrastructure. The food is amazing, and the scenery is beautiful; it reminded me of California. We also loved our time in Ethiopia, an incredibly underrated destination. From the distinct food to the coffee, which was first produced in Ethiopia's Kaffa region, it was all amazing. The music, history, and nature, including the mountains and the Rift Valley, were also incredible. Senegal was another amazing destination, but that's a story for another time.

I noticed that I was much more in shape because I was so active all the time. But the best thing was that it was like an intellectual reset. When you're traveling and you're in a different hemisphere from your communities, you start to realize how much those communities are setting your intellectual agenda. Living in San Francisco, I was surrounded by people talking about how self-driving cars would change everything, or whatever topic was popular at the time. It was always the same conversation and ideas.

When social networks go dark because everyone is asleep and you're in a different country, you start thinking about what you want to read and learn about. I found myself getting into industrial policy and economic development questions. I was in countries like Singapore and Korea, which developed so quickly into advanced societies after once being peers of their neighbors. My wife is Sri Lankan, and they always say that Singapore and Sri Lanka were once at a very similar stage of development, but they've gone in very different directions. I wanted to learn more about the people involved, the policies, and what leads to considerable economic growth.

On Transitioning Back To The Work Life

After a year of travel, my wife and I moved to Atlanta, realizing we had no jobs, no apartments, and nothing holding us back from relocating. We considered a few different cities, but Atlanta was the most intriguing to us. We were drawn to its unique qualities, which felt like a departure from what we had done before.

At the time, I didn't know what I wanted to do after leaving Airbnb. I wanted to take some time to explore, especially after a year of traveling. I began looking into the world of government technology, which had become more involved with the tech sector during the Obama era. I found it interesting that the data and technologist movement was also beginning to affect local governments, which I was more passionate about than the national government. However, I soon discovered that the impact of local government technology work largely depends on the executive leader, such as the mayor or the president, and I realized that I would need to find the right circumstances to make a difference.

I tried out a few different companies, but it was difficult to find a good fit. At the time, it was nearly impossible to get hired if you didn't live in the city, so I ended up starting my own company. I co-founded Saltbox, an industrial real estate company, which was a cool exercise in fundraising and marketing. Though it was my first time doing these things, I found the e-commerce entrepreneurial world very exciting and felt that it was a business anyone with great business skills could own.

Eventually, I withdrew from Saltbox because it was more of a real estate play than a technology play, but the journey was still very interesting. It made me realize that starting a company was something I wanted to do again, and I began to look into the data tool space.

On Working As A Data Scientist at Webflow

What I loved about working at Webflow was the opportunity to work closely with the CTO or CEO, who was close to strategy and the founding story. I knew that I wanted to pursue something entrepreneurial down the line, and Webflow was an opportunity to do that. I reported to the CTO, Bryant Chou, who was an awesome guy. We worked together on Eppo.

Working at Webflow was great because it allowed me to see what data looks like in 2020 and how the landscape has changed since Airbnb. I had done some consulting and contracting work before, but this was a full-time job, so I could see it over a much longer timescale. It was during this time that a lot of the underlying principles behind Eppo really started to crystallize.

One of those principles was the realization that at all of these companies I encountered, whether in Atlanta or across the world, everyone now has a cloud data warehouse and tools like dbt or Airflow. You can just assume everyone has these infinitely elastic, publicly addressable databases that are the center of the universe for everything analytics. And there's now an ecosystem of tools to get data in and out and work with it. This was when a lot of the reverse ETL tools were just coming up. Stitch and Fivetran were around, but Fivetran was new. All these problems that were such a pain at Airbnb were now solved.

It was interesting to think: okay, if that's all completely solved, what's next? And when I went through it, I realized that even with the centralization of data and the ability to mold it in your hands, how you drive change with it was still pretty elusive. That was the question behind my contracting work: how can you actually drive outcomes with data?

At Webflow, like most venture-backed companies with product-market fit, the name of the game was growth. How do you grow? I was supporting marketing and product growth teams, and there were many things we could buy to help here. But the huge, obvious omission was experimentation. At that point, I didn't want to rebuild the experimentation infrastructure again, so I went to the commercial market to see if I could buy the thing. I talked to Optimizely and others, but these tools looked nothing like what I built at Airbnb or what existed at these companies. So I ended up building it again, along with the culture and everything around it.

It stuck in my mind how every time you install an experimentation infrastructure, the conversations change. The way you think about data fundamentally changes. That was really interesting for me, and it eventually led to the creation of Eppo.

On Starting Eppo

I have always been drawn to experimentation because of the high-ROI data work it enables. What struck me was that while everyone wanted better tools for experimentation, very few had the internal capacity to build them.

Companies like Airbnb and Netflix are not normal companies. They can employ a large staff of PhDs and veteran engineers. They probably have an overabundance of software engineers compared to what they need to do. On the other hand, companies like Webflow have only a handful of data scientists and few people who have ever run an experiment before.

At Webflow, I realized that even if you build the Airbnb system, it's not sufficient without the staff that Airbnb has. I wanted to build a system that could deliver Airbnb experimentation results to growth-stage companies that may not have this level of technical staff. Can we build something that doesn't require so much expertise and education but still conveys the statistical results in plain language for clear decision-making?

That was the vision behind the experimentation system I wanted to build. I shopped around a few other ideas, but this one stuck with me. I wanted to be able to sell it five times before writing a line of code. Experimentation is such a hot topic that every growth team and every company past Series B realizes they need to do it. However, there was no way to purchase the necessary infrastructure for it, and that's the gap we're bridging.

Fortunately, I had a wide network of data leaders at that point, so I could pitch my idea to many people. The early VC ecosystem was also helpful in validating my ideas. Once I started seeing that I could convince people to sign a piece of paper saying they would purchase the system for a certain amount of money and I started getting word-of-mouth recommendations, that was enough signal for me to know that there was something here.

On The Key Capabilities of Eppo

One easy way to think about it is that we wanted to build an experimentation platform that was native to the modern data stack and made a fair level of experimentation available to any sort of company.

Concretely, this means we built a metrics-first experimentation platform, unlike other commercial tools that focus on traffic splitting and randomization for experiment setup but often neglect metrics, analytics, and understanding which variant outperformed which.

It struck us how companies would purchase Optimizely for $500k or more per year and then not even use the analytics at all, not even trusting it and having their data team redo it instead. So we decided to turn that on its head and say that the analysis and metrics side of experimentation is actually the most difficult part, and that randomizing users is a pretty easy problem.

Therefore, we built our platform to separate the two sides. We have a randomization SDK so you can randomize people into groups if you don't have one already, but if you have one in-house, feel free to use it. We also built a way to create a centralized metrics library and serve that into visually engaging, informative, and highly interactive experiment reports. This way, people can dig deeper, understand root causes, and do some diagnostics.

There are many more pieces to this, but we would say that it's experimentation as a data team would want it, with a lot of control over metrics, and built to serve all the different types of experiments, from product to machine learning to marketing, and more.

On CUPED

CUPED is a big deal. For context, CUPED (Controlled-experiment Using Pre-Experiment Data) is a statistical technique that is now completely mainstream. If you work at a place like Airbnb or Netflix, 100% of experiments use CUPED. If you have run an experiment at Airbnb, you have used CUPED, but the technique was not available on the commercial market until we brought it out of the woodwork.

This means concretely that an experiment that would take, let's say, two months might only take a month and a half. This can save weeks of product time, which can be really impactful. Think of how many more ideas you can test and how much quicker you can learn when you have that extra time.

The way it works is like noise-canceling headphones. Just as noise-canceling headphones read in ambient sound and subtract it from the signal to make it seem like there is no sound, CUPED reads in ambient signal from all the data before the experiment and subtracts it from the experiment data. This leaves only the treatment versus control effect.
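As a rough illustration of that subtraction, here is a minimal sketch of the textbook CUPED adjustment (illustrative only, not Eppo's implementation; the function name and toy data are made up). The pre-experiment metric serves as a covariate whose predictable component is removed:

```python
import numpy as np

def cuped_adjust(y: np.ndarray, x_pre: np.ndarray) -> np.ndarray:
    """Textbook CUPED adjustment: remove the part of the in-experiment
    metric y that is predictable from the pre-experiment metric x_pre.

    theta = Cov(x, y) / Var(x) is the regression slope; the adjusted
    metric keeps the same mean, but its variance shrinks by a factor
    of (1 - corr(x, y)**2), so experiments reach significance sooner.
    """
    theta = np.cov(x_pre, y)[0, 1] / np.var(x_pre, ddof=1)
    return y - theta * (x_pre - x_pre.mean())

# Toy data: pre-period behavior strongly predicts in-experiment behavior.
rng = np.random.default_rng(0)
x = rng.normal(100, 20, size=10_000)           # pre-experiment metric
y = 0.8 * x + rng.normal(0, 10, size=10_000)   # in-experiment metric
print(np.var(y), np.var(cuped_adjust(y, x)))   # adjusted variance is far smaller
```

Applied to both treatment and control, the lower-variance adjusted metric is what lets the same experiment reach a conclusion in less time.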

CUPED is a really powerful feature for Eppo because it shows that if you decide to do your own analysis in a Tableau dashboard, you literally cost your product team weeks of time. Your experiments will go slower. But in general, this touches on a broader conversation: there is so much impact available in modern statistical methods that people just aren't aware of because it's pretty niche expertise. These companies often only use high school-level statistical methods like t-tests and z-tests as the workhorse.

The realm of research and statistics has produced some powerful stuff. Today, we use CUPED and sequential analysis methods, which have their own advantages in making results more robust for an organization. We have much more coming down the pipeline in terms of faster, more accurate experiments purely through math.

Though I admittedly did more literature review on the statistical field before I started a company and had kids, we have amazing statistical advisors and a staff of statistics engineers who are statisticians who know how to code. So we have many people who are keeping up to date with stuff. Plus, I will always be interested in this field from my days in grad school, so I like to keep up to date as well.

On Hiring Statistics Engineers

When data science was first adopted, it was at companies like Facebook and LinkedIn, and then Airbnb and others joined in. Initially, data scientists were these unicorns who were both amazing software engineers and knew stats and data. They could contribute in the same way as engineers but were also familiar with the language of data. However, as the data ecosystem evolved, this changed. Now, even at Airbnb, and especially afterward, you bring on a lot of computational social scientists who are very good at doing analysis but would not contribute to production code. They lack the skillsets and familiarity with the workflows or design patterns needed to read a codebase.

In the modern data stack with dbt and Snowflake, most data teams consist of analytics engineers who work purely in SQL. We want to hire people who are in the old mold of data scientists who can commit to a production codebase, understand it and all the design patterns within it, and have a strong understanding of statistics to implement solutions. It is difficult to find these people, but as a company, we have a special draw for those with that background.

The difference between an analytics engineer and a data scientist is that the latter has more of a statistical toolkit. On the other hand, a statistics engineer can commit production-grade code; they are like engineers in the truest sense, on-call and responsible for delivering via an API instead of a Jupyter notebook. It is hard to find these people, but we are fortunate that if you are that type of person, we are a special place for you.

On The Modern Data Stack and The Modern Growth Stack

The modern data stack consists of a specific set of tools and beliefs. The tools are an elastic, centralized environment that brings all data from every source into one place. This enables you to work with it and eventually serve it out. The big three tools are Snowflake, Redshift, and BigQuery. Databricks now has a SQL interface, but we'll see if that gains as much market share. Transformation and orchestration tools like dbt and Airflow are used to work with the data.

The modern data stack's underlying belief is that you need a centralized place to determine what revenue, purchase, and user mean. Once you have clean, curated versions of these things, you can serve them everywhere else. Eppo fits into this by reinforcing that set of beliefs. Eppo helps you avoid the pattern of every other tool, which basically wants to create its own data warehouse by having you send data out to it and then redefine what revenue and purchase mean. That pattern makes it difficult for data teams to reconcile the numbers, which is a waste of their time.

Regarding the modern growth stack, data managers and product scientists are more important than data scientists and product managers. The data team focuses on standing up the modern data stack and the pure canonical versions of things. It is up to the product teams to engage with science by testing hypotheses. Reforge and other institutions like it have a whole wing of product leadership focused on science. However, these growth teams lack the necessary tools to do their job. Eppo fills this gap by providing an experimentation framework that helps you run experiments.

Growth teams and machine learning teams are usually the early adopters of Eppo. However, once people want to run experiments, the whole company tends to use it. The target persona for Eppo is the product-minded data leader and the data-minded product leader. Eppo enables collaboration between data and product teams to run experiments and improve growth.

On The Modern Experimentation Stack

Part of my goal was to be helpful and explain what it takes to build an experimentation infrastructure. Whether you're doing it manually or using a platform, there are certain pieces you must have to ensure success.

For example, you must randomize users, calculate metrics, and check the diagnostics of the test. However, I wanted to convey that building an experimentation platform is not just one deep technical problem. It's a large number of medium-sized problems that all need to work together.

To give you an idea, in an experiment, you need an in-app randomization SDK that engineers must use properly. Improperly setting up an experiment will lead to inaccurate results. You also need well-defined metrics that are centralized, such as revenue and purchase. These metrics must not have any holes, like missing data from specific devices or platforms. Data pipelines for experiments tend to be complex, which can account for a significant portion of an organization's computation costs.

Statistical tests, such as a t-test, require expertise to interpret the results correctly. Reading reports and explaining concepts like p-values to others can also be challenging.
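For a sense of what that interpretation involves, here is a minimal sketch of the workhorse two-sample t-test using SciPy, on made-up conversion data (the rates and sample sizes are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical per-user conversion outcomes (0/1) for each group.
control = rng.binomial(1, 0.10, size=5_000)
treatment = rng.binomial(1, 0.11, size=5_000)

# Welch's t-test: is the difference in means larger than chance would explain?
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
print(f"lift = {treatment.mean() - control.mean():.4f}, p = {p_value:.3f}")

# Interpretation is the hard part: the p-value is the probability of seeing
# a difference at least this large if the treatment truly had no effect.
# It is not the probability that the treatment "works".
```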

To make experimentation successful, it's essential to cover the whole company and drive a large velocity of experiments. Platformizing experimentation is necessary to ensure that all the pieces work together in an orchestrated manner. Once you realize how many things you have to platformize and coordinate, it becomes a different type of problem.

On The Experiment Overhead

Sometimes companies, especially those with alumni from companies like Facebook or Airbnb, believe they need to run everything through an experiment. However, this approach is only effective if the company has good infrastructure in place. If the experiment involves four or five people working for a week or two and waiting six months for results, the return on investment will be low.

The success rate of product launches is only 20-30%, so to truly understand what's working and what's not, companies need to run more experiments: not just four a year, but closer to 50 or more. This requires scalable infrastructure investment. So, if a company wants to prioritize experimentation, it should invest in infrastructure at the outset.

On The Designer Gap In Experimentation Tools

I believe that experimentation is a problem where design has unique leverage, because product teams are essentially taking a trust fall. They are putting their careers through a value system that feels like a black box and involves a bunch of stats and numbers. Even if they don't fully understand it, it will matter for promotions and other career opportunities.

It's a highly consequential and complicated thing. Having seen hundreds of experiment reports at different companies, I've noticed that most of them are unreadable. Unless you are a statistician or one of the people running the experiment, there's no way you'd read the report and understand whether it was good or bad, or whether you should launch it or not.

One of the things I was passionate about with Eppo was making it easier to understand experiment reports. At a place like Webflow, they needed much more guidance on how to make decisions based on the data. What is a p-value? What did I learn from the experiment? To accomplish these things, you need a design mindset that prioritizes making the report easy to read for anyone.

To do that, you need ideas around visual hierarchy. What should be the star of the show on the interface? What context should someone come in with? What do they already know? The problem is that most experimentation tools are built in-house and don't receive design resources, so they don't even confront these problems.

I call it the designer gap in experiment tooling, an area where we've invested heavily.

On Metric Strategy

I believe that the intersection of technological and organizational challenges in driving growth is quite complex.

According to the metric strategy article, around the time of Series B or Series C, companies experience a shift away from product development centered around the founders. Until this point, most product OKRs have been focused on building specific features (e.g., "Build X, Y, Z"). However, once a company reaches a growth phase and finds product-market fit, shifting towards a more metric-driven approach is important. Instead of dictating what to build, the focus should be on driving key metrics like weekly active users.

To do this, the product strategy needs to be pushed down to the teams, and the question should be asked: "What do you think will drive weekly active users?" This shift from a "shipping strategy" to a "metric strategy" is essential for companies experiencing hyper-growth. With a top-down approach, a company will eventually hit limits. However, companies can continue to grow and thrive by embracing a more data-driven and scientific approach, building out the data team, and running experiments.

This is the central theme of the series on metric strategy.

On Hiring

As a hiring manager, there are many basic lessons you learn. One of these is that you benefit greatly when you can clearly articulate the role. What will the person in this role do? Who will they work with? How will they be measured? Crystallizing these details helps you both source and sell, allowing you to paint a picture of what life at the company will look like. It also helps you avoid mistakes by forcing you to perform diligence on why you are bringing this person in and what they will do. Having a clear idea of why you are hiring someone and what their role will be is invaluable.

Another important lesson is that great people attract other great people. As a founder, it can be challenging to hire early on, but taking the time to find a high-quality hire who has both an incredible skillset and fits well with the company culture will save you time in the long run.

There are several cultural principles that I care about and have instituted at Eppo. I believe in a culture of entrepreneurialism that values finding areas to drive impact and being comfortable with end-to-end ownership. In addition, great communication is important to me, as many productivity and interpersonal issues stem from communication failures. Starting with people who have strong communication skills can make all the difference.

Even if you struggle with public speaking, there are other ways to improve your communication skills, such as writing or preparing your remarks. At Eppo, we place a high value on communication skills.

On Finding Design Partners

If you're considering starting a company, it's important to choose an idea where it's not too hard to find early design partners. You can improve your execution and hiring skills, but if people don't want what you're selling, it's going to be tough. That's why I shopped around for three or four ideas and chose the one with easy-to-find design partners. Basic sales skills are important, but if it's challenging to sell your product at the outset, it's not going to get any easier. Consider a different idea if that's the case.

It's important to be responsive to your early design partners and act quickly on their feedback. Their feedback is more important than their money. Show them how responsive you are to their feedback to encourage them to give more.

Before you even start the company, do market validation and have a clear picture of your ideal customer. Some founders take on partners who are out of bounds for what they built the company for, which can be a big distraction. In our case, we knew we wanted to serve growth-stage consumer and PLG companies selling to their data teams. It's important to have a clear picture of your ideal customer and not get distracted by others.

On Fundraising

When it comes to raising money, the focus should actually be on the fundamentals of the business. Proper market validation is crucial. If you can convince yourself that there's real market demand here, and you have proven it through design partners and other means, then the investment will take care of itself.

The key is to demonstrate appropriate milestones for the amount of money you want to raise. Early on, you need to show that there's a real market there, even without code. You can do this in a variety of ways, but if you demonstrate true market demand, you can build from there and ship out a few times.

That will get you started in terms of fundraising, and once you get to Series A and B, there are different appropriate milestones in terms of growth and revenue or other metrics. But fundamentally, if you have a strong business, you will get what you want out of the fundraising process. So it's best to focus on the fundamentals of the business.

Suppose you have great fundamentals for your business and are now deciding between investors, which is a very fortunate position to be in. Every founder has their own lens on this stuff. I focus very much on the partner rather than the firm. There are all sorts of firms with different levels of brands and everything, but the partner is the one you'll talk to every week. Who will go to bat for you at the next fundraising round? Who will be forever attached to you? Just make sure that partner is truly a partner of the business. You should be excited to have them along for the ride.