Datacast

Episode 78: Open-Source Investing and Data Product Management with Julia Schottenstein

Episode Summary

Julia Schottenstein is a Product Manager at dbt Labs, the maker of the popular open-source project dbt. Prior to joining dbt Labs, Julia was an investor at New Enterprise Associates, where she spent her time investing in infrastructure, developer tools, open-source, and data startups. She currently sits on the boards of Sentry and Metabase while being an active angel investor. Julia graduated from Stanford University with degrees in Computer Science and Management Science & Engineering.

Episode Notes

Timestamps

(01:40) Julia shared the differences growing up in New York and moving to San Francisco.
(03:05) Julia discussed her overall undergraduate experience at Stanford — getting dual degrees in Computer Science and Management Science & Engineering_._
(05:40) Julia went over her time as an Investment Banker at Qatalyst Partners — notably working on Microsoft’s acquisition of LinkedIn.
(09:11) Julia talked about her career transition to venture capital — working as an associate investor at New Enterprise Associates.
(10:46) Julia emphasized the importance of getting up-to-speed and forming an investment thesis as a new investor.
(15:05) Julia discussed her Series A investment in Metabase, an open-source business intelligence software project.
(18:36) Julia unpacked her investment(s) in Sentry, an application monitoring platform that helps developers monitor apps in real-time to catch bugs early.
(20:14) Julia explained her investment in the Series B round for Anyscale, an end-to-end computing platform that makes building and managing a scaled application across clouds as easy as developing an app on a single computer.
(23:03) Julia contextualized her investments in the seed round for Datafold, a data observability platform that equips analytics engineers with the tools to address data quality issues.
(24:24) Julia shared typical hiring and go-to-market decisions that companies need to make (depending upon their growth stages and product strategies).
(27:05) Julia mentioned her Metabase application to help investors pick winning open-source startups.
(29:05) Julia rationalized her switch to becoming a product manager at dbt Labs.
(30:34) Julia peeked into the roadmap of dbt Cloud, a hosted service that helps data analysts and engineers productionize dbt deployments.
(33:34) Julia went over an under-invested area and the role of interoperability within the broader data tooling ecosystem.
(37:56) Julia reflected on the difference between being a venture investor and a product manager.
(41:05) Closing segment.

Julia’s Contact Info

dbt’s Resources

Mentioned Content

People

Tristan Handy (Founder and CEO of dbt Labs)
Ali Ghodsi (Co-Creator of Apache Spark, Co-Founder and CEO of Databricks)
Dan Levine (General Partner at Accel Partners)

Book

“Working Backwards: Insights, Stories, and Secrets from Inside Amazon” (by Bill Carr and Colin Bryar)

Notes

My conversation with Julia was recorded back in May 2021. Since the podcast was recorded, a lot has happened at dbt Labs! I’d recommend:

Reading Julia’s recent blog posts on adopting CI/CD and introducing Environment Variables in dbt Cloud.
Watching the talk replays from Coalesce, dbt’s 2nd annual analytics engineering conference
Listening to Season 1 of the Analytics Engineering Podcast, where Julia co-hosts with Tristan Handy to go deep into the hopes, dreams, motivations, and failures of leading data and analytics practitioners.

About the show

Datacast features long-form, in-depth conversations with practitioners and researchers in the data community to walk through their professional journeys and unpack the lessons learned along the way. I invite guests coming from a wide range of career paths — from scientists and analysts to founders and investors — to analyze the case for using data in the real world and extract their mental models (“the WHY and the HOW”) behind their pursuits. Hopefully, these conversations can serve as valuable tools for early-stage data professionals as they navigate their own careers in the exciting data universe.

Datacast is produced and edited by James Le. Get in touch with feedback or guest suggestions by emailing khanhle.1013@gmail.com.

Subscribe by searching for Datacast wherever you get podcasts or click one of the links below:

If you’re new, see the podcast homepage for the most recent episodes to listen to, or browse the full guest list.

Episode Transcription

Key Takeaways

Here are highlights from my conversation with Julia:

On New York vs. San Francisco

I am a city kid who grew up in Manhattan and moved out to the West Coast to go to Stanford. My parents had always joked that I’d never come back when I moved out West. Over a decade later, I guess that was true.

I think both New York and San Francisco are different in some ways while similar in others. They are both professionally oriented. There’s a high density of interesting people in both cities. But of course, the sphere of power in New York is finance, and the sphere of power in San Francisco is technology. I enjoy being out in SF both because I’m interested in tech and love the outdoors (which the Bay Area is great for).

On Studying at Stanford

I have always been interested in complex and challenging problems, so I was drawn to engineering. My interest in technology blossomed when I got to Stanford. I found that people in tech are optimists, and they like to paint a future of what the world could be. Being around them excites me.

I started studying Management Science and Engineering when I first got to Stanford. It’s a multidisciplinary major within the school of engineering. I got a taste of finance, statistics, operations research, and many different engineering disciplines. I was drawn to that major in particular because growing up in Manhattan, the finance angle really intrigued me.

Then I started taking Computer Science classes, which were new to me when I arrived at Stanford. They stretched me intellectually and enticed me to do a double major later on during my junior year.

I love Stanford. Stanford is such a unique place. The energy on campus is special because we have a confluence of entrepreneurs and, generally, people solving hard problems. Exposure to many exciting people and companies as a student excited me.

I was fortunate to take a smattering of courses across finance and engineering. I was also involved with Stanford Finance, where I was the President, which helps train people for finance professions.

On Finance vs. Engineering

I believe there’s more creativity in software development than in finance. Finance teaches you about ensuring that the details are right. You have to think critically about making sense of numbers, so you engage the quantitative side of your brain.

In engineering and software development, you start with different questions and problem facets that you need to tackle. Given this end goal, how do you build something to get there? Whereas in finance, given the information, how do you use financial tools and techniques to come away with great insights?

On Being an Investment Banker at Qatalyst Partners

When I joined Qatalyst, it was almost a startup investment, which is a funny way to describe a financial institution. But they were just getting started. One thing they did really well was the sell-side advisory for tech companies. They work with some of the most exciting tech companies globally during the critical moment they think about selling themselves.

Certainly, the capstone deal for me was the LinkedIn-Microsoft transaction, in which Qatalyst helped advise LinkedIn in selling itself to Microsoft for $26 billion. That was an exciting deal with tons of twists and turns all the way up to the last minute. That, for sure, was my proudest moment because friends and family who weren’t in the tech world could appreciate what LinkedIn is (because of its audience and global nature).

The other thing I love about Qatalyst is my colleagues. I made some of my closest friends there because my life and work blended as I worked around the clock in the office. So I was fortunate that I got to spend many hours with interesting, bright people — who are now some of my closest friends.

Qatalyst had a high bar for what the quality of your work needs to be. They had practices in place and habits created so that your work is correct (because it does need to be). You can’t just be 90% there. It has to be 100% there when you’re dealing with transactions of such big magnitude. In the day-to-day, you spend a lot of time in Excel modeling, building different valuation techniques for companies, doing compensation and DCF analyses, etc. Qatalyst was mostly for live transactions, so anything could pop up: new term sheets, new acquirers, lack of interest in a deal, etc. Much interesting strategic chess that happens in making a deal comes from the beginning to fruition. It was an exciting and busy period of time for me.

On Transitioning to Venture Capital

At Qatalyst, we were involved at the transaction level. We came in at a critical moment in a company's lifecycle (when it was getting acquired) and then moved on to the next deal. It was really transaction-oriented for me.

I had a lot of personal passion for technology, product, and companies, so it made a lot of sense to go into an investing role — where I have to think more of these kinds of problems rather than just the financial problems that I dealt with as an investment banker.

New Enterprise Associates (NEA) was an excellent spot for me because they are a one-fund model, meaning that you do early and growth investing out of the same fund. When I was starting out, I had the opportunity to see both really early-stage investments and some growth investments. That’s why NEA was a great choice for me.

On Proving Value As a New VC

Starting out as an associate at a venture firm comes with a lot of imposter syndrome. How can I be helpful when I haven’t done many of these things before? The best advice I can give is to focus on the industries you want to invest in.

For me, they were infrastructure, data, and open-source. It’s important for me to become an expert in these spaces. The only thing I had over other people who had been in VC longer was the time advantage, right? I was spending the time to get up to speed and talk to many open-source entrepreneurs. They can immediately tell the difference between an investor who really understands open-source investing versus someone who’s never spending any time in that space.

In venture, you don’t get any prizes for being second place, so you have to spend your time wisely and invest in a few companies you want to build close relationships with. You want to double and triple down on being helpful to the companies you have conviction in. Otherwise, you’ll spread yourself way too thin, and it won’t be a successful strategy.

I think VCs do a great job of broadcasting their thoughts a lot. It seems that they can be glamorous at times and are on the cutting edge of what’s next. One thing people don’t realize about venture is that it’s a very individual sport. Often, you don’t have a sense of near-term goals. You constantly think about your accomplishments on a weekly or monthly basis. Your time scales are very different because, even if you are able to invest, that’s not success. Success is investing in a company that then turns into a big outcome. So the time scale for venture can be radically different.

On Investing In Metabase

As an open-source investor, I found the Metabase community to be taken off back in 2019. While doing customer calls, I found Metabase users really excited about the project, which is rare for a BI system. Then, I spent a lot of effort getting in front of Samir (Metabase’s CEO). At first, he gave me a stiff arm and said he didn’t want to talk (because NEA had invested in Tableau). Ultimately, I got him to take a meeting. It only took a week turnaround from that meeting to the investment because I had done the homework up front and had the conviction that this was an interesting new player in a big category. The sheer joy and excitement of their customers led to my enthusiasm for this investment.

In the beginning, I was primarily interested in stars and contributors for Metabase. But the metrics that I was more excited about are those not widely available, which I learned when I actually spent time with the company. For Metabase, those are the growth of active users and time spent on the platform. They have crazy engagement in daily, weekly, and monthly active users. People use Metabase pretty much daily. For me, that was the biggest tell that Metabase is a critical system in their workflow.

On Investing In Sentry

I love Sentry’s founding story because David and Chris built a platform that was what they needed as developers. It’s a known fact that when you ship code, it will inevitably crash, and you need the insights to figure out what went wrong. And many open-source companies don’t monetize from Day 1. Sentry did. They let developers put credit cards down, and they were profitable before taking NEA’s investment. I think that’s a great testament to the fact that they understand their customers well.

Sentry’s North Star has remained clear from Day 1. They want to be the best application performance monitoring platform built for developers, not built for applications. I think they really deliver on that promise. It’s just a lot of fun working with this team. They know the space, and they are whipped-smart. We have a lot of fun at board meetings too.

On Investing In Anyscale

Anyscale was founded by an entrepreneur well-known to NEA named Ion Stoica. My partner, Pete, backed Ion’s previous company Conviva. Ion went to co-found Databricks and became the chairman of Anyscale (which is behind the open-source project Ray).

Anyscale is at the cutting-edge of ML. They make it possible to do things that weren’t possible before by massively parallelizing computers.

If you want to optimize your ML models, it’s hard to figure out how to tune them to be the most performant. Anyscale helps you come up with the optimal solutions.
Reinforcement learning is also a computationally complex problem and uses tons of resources. Ray helps you solve that problem easily by intuiting which tasks need to be solved sequentially and which tasks can be done in parallel.
Python developers also find it dead simple to scale out their large ML models with speed and ease.

In terms of open-source adoption, they have been able to strike a chord and resonate with a hard problem for people.

On Investing In Datafold

Datafold reminded me a lot of Sentry. While Sentry solves for developers the problem of crashed code, Datafold tackles a similar problem for analysts. You inevitably will introduce mistakes into your data models, so you want to spot them as soon as possible.

Datafold started with the Data Diff product, which lets you understand the shape of your data. If you have logic in your models that aren’t quite right, or you’ve introduced an error gate of old models, Data Diff makes it obvious to the data analyst.

I think data quality is a big category. Datafold has done a great job of moving the industry forward to make sure that you’re planning for data quality at the time of building the data models, not an afterthought.

On Giving Advice For Her Portfolio Companies

Hiring strategy depends on the company stage.

At the earliest stages, it’s hard to hire talent because you need to convince a potential employee that your risk is worth taking. The best advice I can give is that if you find talent early on, whether specifically for a role or more of a generalist, try to bring them in. The next hire becomes sequentially easier.
At the later stages, you have to find the right person for a specific pain point or a goal that you have. The need to hire specialists is more important. One mistake that people make is waiting too long to bring on VPs. If you bring on a great VP, that will hopefully help recruit a whole another team to your company, and you’ll turn over the responsibility of building the infrastructure to make that team successful to a great leader.

A lot of open-source is about product-led growth, so it’s crucial to have a go-to-market team that understands how to nurture that and has people deeply familiar with your product. This team should consist of hybrid roles with customer success to figure out user needs and how the users can grow with the company (rather than sending outbound cold emails). Especially for technical products, you need to have a technical seller who can appeal to the technical needs of your buyers. So open-source has a pretty different go-to-market motion (compared to enterprise products), whether you’re talking about infrastructure or application layers.

On Open-Source Insights

I have built a low-code way to track open-source projects and Github stars for a long time on a small scale. It was an Excel spreadsheet updated weekly. I thought it might be interesting to open-source my work and share it with the rest of the community. So I invested in building a Python scraper, hosted it on GCP, and used Metabase as the wonderful frontend for people to interact with the data.

The motivation was that I wanted to share these insights with investors and entrepreneurs who care about these stats. However, there wasn’t a great place to get the information all in one spot. One insight I got is that there were still so many noises in Github. There’s not a key stat that you can point to that will lead to a great investment. But you can certainly track the momentum of contributor growth and star growth over time. More often than not, such momentum shows promise for companies.

On Joining dbt Labs

I really fell in love with dbt the project. At its core, dbt lets you transform your data, but it’s much more than that. I described it as a movement because it’s completely changing the analytics industry and the workflow of how analytics engineers can do their job.

What I love so much is that dbt is bringing best practices from the development world into analytics (documentation, testing, version control, CI/CD). These are now important pillars of the workflow in an analyst’s job, and dbt enables that. I think it’s completely foundational to any data stack. It’s where you define your business logic and ensure you have the right infrastructure to scale as your data needs get more complex. So it’s a non-negotiable for data teams today.

I am the product manager for dbt Cloud, which has two parts:

The Cloud IDE is a nice way for analysts who are less familiar with git to use dbt Cloud. It enables easy workflows where you can build your data model and collaborate with other analysts on your team.
The Cloud Scheduler enables orchestration and CI/CD capabilities for you to operationalize building and testing dbt models.

A nice thing about dbt is that we work in the open. Every quarter, we have a product event called Staging to discuss what we’ve been working on, what’s on our roadmap, and what you can expect from our team. Some big areas that we will focus on are to continue investing in the IDE and the Scheduler. But we are also exploring metadata, data quality, and discoverability areas.

On An Under-Invested Area In The Data Tooling Ecosystem

One of the biggest challenges in data is the natural tension between going fast and being right — which is a hard problem to solve. So it entails getting high quality and building proper infrastructure versus enabling self-serve analytics for those without technical backgrounds to work with the data. I’m hopeful that these worlds will be bridged more in the future.

dbt leans more towards the lens of “How do you make sure you can trust your data today and make the right investments there?” There’s a learning curve to bring dbt into any team, but the promise that we have is that you won’t regret it. People who couldn’t do data transformations before (because they are not engineers) now can. dbt brings accessibility and empowers people to achieve tasks they weren’t able to before.

I think that there is more which either dbt can do or other people can do, which is letting more people be data consumers. It’s a little bit of what Metabase does. There’s always the tension of “If you let everyone create data models for everyone, how can you trust the data? Do you have challenges with data quality? Are the foundations right?” We are certainly trying to solve it at dbt, and others recognize this hard problem as well.

On Interoperability Within The Data Stack

As an open-source product, dbt wants to be interoperable with other tools. We think about our open-source project as the standard for how you perform transformations and build data models. Lying at the core of that technology is the compiler, and we want it to be absolutely everywhere.

Our first principle is to grow the pie versus to gain share from our competitors. We think this market is absolutely massive, and we work hard to invest and contribute to the industry at large. Our goal is to move the industry forward. At the end of the day, if the analytics teams are more confused than when they started, that’s a problem for everyone. Making sure that there are clean handoffs between different layers of the stack is so important.

On Differences Between Being a VC and Being a PM

In product, it’s pretty fun because I have two lenses that I use frequently:

The first is my wide-angle lens: thinking about the industry and where we need to go as a team/as a product. How do we play in the broader ecosystem? How to make the ecosystem better as a whole?
The second is my zoom lens: thinking deeply about individual steps that we have to take to achieve our goals.

It’s that constant kind of shifting in scales between wide-angle and zoom that I have to handle in product. In venture, you really just have the wide-angle: thinking more at the company and industry levels, then making bets on the future. You kind of glaze over the details at times because they are not relevant. If you spend too much time on the details, you might forget to see the full picture.

That’s the biggest difference I’ve seen in the two roles. But it’s a fun challenge to be able to context-switch and exercise two different muscles that I have in product.

Because I’ve studied the data and analytics space for so long, the wide-angle lens is just so ingrained in how I think and constantly runs in the background. Subconsciously, I always think about the company's overall benefits and strategic goals, But I don’t have to work that muscle quite as much right now. The muscle I’m exercising more in my current day-to-day is the zoom lens — thinking at the molecular level and making decisions on a daily/weekly cadence.