Datacast

Episode 68: Threat Intelligence, Venture Stamina, and Data Investing with Sarah Catanzaro

Episode Summary

Sarah Catanzaro is a Partner at Amplify Partners, where she focuses on investing in and advising high-potential startups in machine intelligence, data management, and distributed systems. Her investments at Amplify include startups like RunwayML, Maze Design, OctoML, and Metaphor Data, among others. Sarah also has several years of experience defining data strategy and leading data science teams at startups and in the defense/intelligence sector, including roles at Mattermark, Palantir, Cyveillance, and the Center for Advanced Defense Studies.

Episode Notes

Show Notes

(01:48) Sarah talked about the formative experiences of her upbringing: growing up interested in the natural sciences and switching her focus to terrorism analysis after living through the 9/11 tragedy.
(04:07) Sarah discussed her experience studying International Security Studies at Stanford and working at the Center for International Security and Cooperation.
(07:15) Sarah recalled her first job out of college as a Program Director at the Center for Advanced Defense Studies — collaborating with academic researchers to develop computational approaches that counter terrorism and piracy.
(09:48) Sarah went over her time as a cyber-intelligence analyst at Cyveillance, which provided threat intelligence services to enterprises worldwide.
(12:22) Sarah walked through her time at Palantir as an embedded analyst, where she observed the data integration and modeling challenges that many agencies struggled with.
(15:26) Sarah unpacked the challenges of building out the data team and doing applied data work at Mattermark.
(20:15) Sarah shared her opinion on the career trajectory for data analysts and data scientists, given her experience as a manager for these roles.
(23:43) Sarah discussed the power of having a peer group and building a team culture that she was proud of at Mattermark.
(26:41) Sarah joined Canvas Ventures as a Data Partner in 2016 and shared her motivation for getting into venture capital.
(29:47) Sarah revealed the secret sauce to succeed in venture — stamina.
(32:00) Sarah has been an investor at Amplify Partners since 2017 and shared what attracted her to the firm’s investment thesis and the team.
(35:28) Sarah walked through the framework she used to prove her value upfront as the new investor at Amplify.
(38:35) Sarah shared the details behind her investment in the Series A round for OctoML, a Seattle-based startup that leverages Apache TVM to enable their clients to simply, securely, and efficiently deploy any model on any hardware backend.
(44:39) Sarah dissected her investment in the seed round for Einblick, a Boston-based startup that builds a visual computing platform for BI and analytics use cases.
(48:45) Sarah mentioned the key factors inspiring her investment in the seed round for Metaphor Data, a metadata platform that grew out of the DataHub open-source project developed at LinkedIn.
(53:57) Sarah discussed what triggered her investment in the Series A round for Runway, a New York-based team building the next-generation creative toolkit powered by machine learning.
(58:36) Sarah unpacked the advice she has been giving her portfolio companies on hiring decisions and expanding their founding team (and advice they should ignore).
(01:01:29) Sarah went over the process of curating her weekly newsletter called Projects To Know (active since 2019).
(01:05:00) Sarah predicted the three trends in the data ecosystem that will have a disproportionately large impact in the future.
(01:11:15) Closing segment.

Sarah’s Contact Info

Amplify Partners’ Resources

Mentioned Content

Blog Posts

People

Sunil Dhaliwal (General Partner at Amplify Partners)
Mike Dauber (General Partner at Amplify Partners)
Lenny Pruss (General Partner at Amplify Partners)
Mike Volpi (Co-Founder and Partner at Index Ventures)
Gary Little (Co-Founder and General Partner at Canvas Ventures)

Book

“Zen and the Art of Motorcycle Maintenance” (by Robert Pirsig)

New Updates

Since the podcast was recorded, Sarah has been keeping her stamina high!

Her investments in Hex (data workspace for teams) and Meroxa (real-time data platform) have been made public.
She has also spoken at various panels, including SIGMOD, REWORK, University of Chicago, and Utah Nerd Nights.

Be sure to follow @sarahcat21 on Twitter to keep up with her thinking on the intersection of data, VC, and startups!

Episode Transcription

Key Takeaways

Here are highlights from my conversation with Sarah:

On Formative Experiences Growing Up

My mother is a psychiatrist and clinical researcher. My father is a molecular biologist. Given their backgrounds, I was naturally drawn to statistics, mathematics, and the natural sciences. Growing up, I always felt engaged in those disciplines: listening to my father explain the biochemistry of Diet Coke, or listening to my mother explain how a drug goes through clinical trials to get approved by the FDA.

But while I was in high school, 9/11 happened. I remember it so vividly. I was sitting in the library. We saw the planes crashing into the Twin Towers on TV. After a really tense and hectic day trying to contact my father, who was working in Manhattan, I was sitting at the table at home and thinking: What would motivate other people to commit such an atrocity? That question stuck with me throughout my college years.

“What could motivate people to commit such horrific acts of violence against civilians who they don’t know?”

As I went through college, I attempted to apply a more scientific methodology to that question, so that we wouldn’t have to rely on a subjective and potentially biased approach to understanding terrorist ideology and organizational dynamics.

On Working With CISAC at Stanford

The thing I loved the most about Stanford is that they take a very entrepreneurial approach, even to education. They encourage people to explore different disciplines, understand the synergies of those disciplines, and grow the body of research within the disciplines.

CISAC (the Center for International Security and Cooperation) is the perfect example of this phenomenon. On the surface, it’s an institute for international security and cooperation, so you might think it would be dominated by history, international relations, political science, etc. But in fact, many of the faculty at CISAC came from engineering, management science, economics, mathematics, statistics, computer science, etc.

The underpinning idea is to apply these various disciplines to really, really tough problems: Why do terrorists commit atrocities? Why do nations go to war? These are hard, gnarly questions. We should use all the available resources to answer them.

On Startups and Terrorist Groups

Terrorists act like any other rational actor. Therefore, many methods that we use to study businesses can be used to study terrorist groups as well. I am constantly thinking about this in my new role as an investor, trying to understand startup behavior.

“I hate to draw the connection between startups and terrorist groups, but both have incentives to obfuscate information about their operations.”

Startups do not want to reveal too much competitive intelligence. Terrorist groups obviously do not want to reveal any plans about their operations. Given the need to operate in a clandestine manner but also to exchange information internally with their partners, how does that impact their behavior?

Approaches used to study businesses, such as network analysis and game theory, can be applied to this context. It was interesting to see that so many organizations, no matter what their objectives are, follow similar patterns.

On Countering Terrorism and Piracy at C4ADS

One big project was also my first real exposure to AI. At the time, there was an institute called the Joint Forces for Finding Information Center. They engaged us in a contract to review all the literature and synthesize insights on computational approaches to counter-terrorism. This ranged from network analysis to agent-based simulation. Deep learning had not taken off at the time, as the AI winter was still in effect.

We had to think about how we could computationally and mathematically represent things like intent, hunger, fatigue, etc. We needed to understand how the military might respond in a battlefield situation. This was the first time I saw what we could do to understand, frankly, human phenomena by leveraging computational and mathematical approaches.

We had a collaboration with MIT’s Center for Brains, Minds, and Machines. They focused on computational linguistics, developing models to identify adversarial intent based on the utterances of potential terrorists.

On Threat Intelligence Analysis at Cyveillance

The transition to Cyveillance felt very much like going from academia into industry. Although C4ADS was not officially an educational institution, it was a think tank, and much of the work done there was research. At Cyveillance, I worked onsite at the Secret Service, which immersed me more deeply in how customers applied my analytical work.

That experience was very valuable. I learned how humans behaved in an adversarial setting when we tried to respond to potential threat actors. I also learned about bureaucratic dynamics. Working at the Secret Service as a defense contractor meant operating at the intersection of a multinational organization in a highly regulated industry and a large US government institution.

“Understanding how I could navigate that red tape while still advancing technologies and providing better tools & experiences to other analysts and field operatives is the crux of what I learned.”

On Being An Analyst at Palantir

While working for a large defense contractor deployed to a large government agency, I started to observe the startup impulse in myself. I got frustrated by the red tape and saw opportunities to do better. But navigating those organizational hindrances was not just challenging; in some cases, it was arguably impossible. I was drawn to Palantir and the promise of taking an agile, startup-like approach to drive innovation in the public sector.

Many of my initial deployments at Palantir were with state and local governments. I learned not just about organizational dynamics and government requests, but frankly, also about data. Many of the challenges that our customers faced were not purely analytical. There were challenges in integrating data from different sources, understanding what data was trustworthy, applying insights about data quality, and bringing everything together to drive better analyses.

“It was illuminating to get exposed to the challenges that organizations might face with data.”

On Challenges as The Head of Data at Mattermark

Two real challenges I encountered at Mattermark summarize my tenure there.

The first was the challenge of building a data team. In 2014, data science was still emerging as a discipline. Many things weren’t immediately clear. What might the career trajectory be for a data scientist? What’s the difference between a data analyst, a data scientist, and an ML engineer? How should the data science team interact with the broader organization? Who should the data science team report to? And so on. There was a lot I had to learn, and make up as I went, about how to build and scale a data team.

The second was the challenge of doing applied data work. Back in 2014, while there might have been interesting research on ML, there was less emphasis on using ML to build new products. We developed a strong capability and confidence in understanding what problems could and should be solved with ML. We developed a methodology that integrated both human work (internal and external) and ML to deliver the best results. We built a multifaceted system where ML plays a critical role, but we didn’t expect to address our full requirements with ML alone. We layered on data mining techniques, heuristics, crowdsourcing, and other approaches to generate insights from data (or otherwise transform data) and deliver the best possible experience to Mattermark’s customers.

On The Career Trajectory For Data Analysts

My friend Drew Conway created the famous Venn Diagram that describes the data science role, which boils down to substantive expertise, math and statistics knowledge, and programming skills. In many ways, those components of the Venn Diagram define the career trajectories for a data analyst.

Among those who started as data analysts on my team, some really enjoyed the soft-skills aspect of the work. For them, what mattered was understanding how to define the problem, scope the problem, and think through the broad range of possible approaches. The technical product management path is best suited to their interests and skills.
For those who enjoyed the modeling work (whether statistical or mathematical), there’s a natural path from data analysis work to data science work.
For those who enjoyed programming, it was natural to explore opportunities in data engineering.

I saw the data analyst role as a branch, where one could go into these various other trajectories. I have also been thinking a lot about why those branches need to split and if there is a trajectory for a data analyst who does not need to transition into a new role.

“Thinking about my own experience and where I am today, another path (one that I didn’t necessarily think about during my time at Mattermark) is that it might be natural for a data analyst to become a VC.”

It might be natural for them to cultivate more and more domain expertise (not just in defining and scoping problems but in understanding the dynamics of a startup ecosystem) and become domain experts with analytical skills (expert analysts). This career trajectory is, and should be, available to more people in the field.

On What Enabled Her Success at Mattermark

When joining Mattermark, I had never really worked outside of the defense intelligence sector before. I had never managed a team before. I had never considered myself a data scientist before. This was all entirely new territory to me. Two things truly enabled my success:

The first was that I had a strong peer group that was going through similar things. In 2014, nobody knew how to manage a data science team or structure the team to maximize ROI. We created forums where we could have valuable discussions and relate to one another, almost like a support group. That network was incredible, and frankly, one that I still rely on today.

The second was more personal in terms of team building. My biggest accomplishment at Mattermark is the set of people that I recruited and the things that they accomplished.

“Given all the uncertainty about organizational dynamics and the rollercoaster of the startup life, loving my team and investing in their successes motivated me to overcome any obstacles.”

To me, that was fulfilling, inspiring, motivating — giving me the joy to power through some of the harder aspects of the job.

On Getting Into Venture Capital

In many ways, my career has been shaped not by a clear sense of trajectory or outcome but rather by chance. About two and a half years into my journey with Mattermark, one of our customers reached out to me and asked: “Have you ever considered doing venture investing? You had the experience of working with startup data to help other investors make investment decisions. Is that something you want to do yourself?”

“Throughout my tenure at Mattermark, I had been fascinated by the venture job and thought that it could be something I would want to do later in life. In this role, I can study startup organizations, understand their expansion and contraction, and learn what levers are available to accelerate their growth. This reminds me of my time in counter-terrorism and counter-insurgency.”

So when that opportunity became available on a much faster timeline than I could have imagined, I jumped at it. VC is hard to break into. Given this golden ticket into a venture job, I would have been naive to pass it up.

Reflecting on my time at Mattermark, a lot of it was about cultivating my team. The other way the stars aligned was this: an analyst on my team had subsequently become a data scientist doing product management work and had grown rapidly throughout his tenure as my report. When the VC offer became available to me, it seemed like a juncture at which he was ready to step up into my role.

Again, the stars were aligned because there was an interesting path for me to explore and a natural leader who could fill my shoes incredibly well.

On Venture Stamina

People have a big misconception that VCs spend their days wining, dining, and flippantly making investments. I have never worked as hard in my life as I have in venture. So I’d say that venture is not an easy job.

People often ask: “What does it take to be successful in venture?” To some extent, you need to be smart and motivated, have certain soft skills, etc. (a lot of what is required to succeed in any profession).

“Frankly I think, what it really takes to succeed in venture is stamina.”

You need to be always on because any interaction could potentially be a future investment opportunity or opportunity to help facilitate a customer/candidate relationship for a startup.
You need to be always aware so that you understand the trends that are happening in an industry and know what companies are forming/might be forming.
You need to be comfortable with uncertainty. Given all the balls that you are juggling in the air, some of them will inevitably drop.

The thing that surprised me the most is just how much stamina is needed to succeed in venture. In many ways, the people who have succeeded in venture are the ones who stayed with it the longest. There’s also a critical element of chance.

On Joining Amplify Partners

There are three aspects of Amplify’s investment thesis that appealed to me.

The first is an obvious one: Amplify invests in technical tools and platforms. More specifically, Amplify covers three academic disciplines (data/machine intelligence, data management, and distributed systems) that translate into three industry markets (data/ML tools and platforms, enterprise infrastructure, and developer/designer tools).

During my time at Canvas, I noticed that I wasn’t in love with general investing. I did not have as much passion for consumer products or marketplaces as I did for data tools and platforms. To be honest, it was challenging to become an expert across these various disciplines. Not only did I lack passion for these other startup categories, but I also was not good at investing in them.

I realized that I wanted to focus on investing in the data stack. In that respect, Amplify was the perfect partner. I did not have to ramp up and learn about the travel sector or the market for interior design products, as my role at Amplify specifically focuses on the data stack.

The second one is the stage focus. At Mattermark and in other previous roles, I helped companies grow from 0 to 5 (or from A to D) on their data journey. My expertise was building data teams and thinking about data as a product, not the scaling aspects (such as optimizing marketing campaigns or structuring sales compensation to get the most efficiency out of the sales team) that generally come with Series B/Series C investing.

The third one is the team. Throughout my career, it’s not money or power that motivates me. It’s winning as a team that galvanizes me.

On Proving Value Upfront As a New Investor

When I first joined Amplify, I spent a year looking at domains like computational biology, IoT, and even space. While learning about these domains and how data science and ML could impact them was intellectually satisfying, I wasn’t adding exponential value to Amplify. With each new sector that I looked at, I had to ramp up, maybe make an investment, and then move on. At one point, I started thinking about my prior operating years and how I could feel more fulfilled in my job as an investor.

Looking back on my entire career trajectory, I had the epiphany that the best way to serve my team at Amplify is to invest in the tools that I wished I had. This is an awesome way for me to marry the commitment I have for the data science community with my commitment to my team at Amplify.

“I quickly noticed that this is a superpower that I could leverage. I wasn’t learning about the data science market. I lived through it. I wasn’t trying to infer how data science teams operate and the challenges/motivations that they have. I had done it.”

The combination of having that insider intelligence and the network that I relied on to help understand how to grow a data team within an early-stage startup gave me an unfair advantage. For that reason, I have focused on the data stack at Amplify for nearly three years now.

On Investing in OctoML

This investment crystallizes our desiderata when evaluating potential deals. When people ask what I look for in a startup, I often say: “people, people, people, product, market.” OctoML perfectly embodies that paradigm.

I met Jason Knight (the Co-Founder and Chief Product Officer at OctoML) about 5 years ago. At the time, he had just transitioned into a new role as an ML engineer at Intel Nervana. I remember my conversation with him about ML compilers. He has a unique ability to go deep in a technical discussion and then step back to think about how an investor might view ML compilers. He showed a strong sense of empathy that so few people have. Over the following 3 years, Jason and I met up at various ML conferences and talked about trends related to the confluence of AI hardware and software.

Separately, I spent time in academia to understand how certain research patterns might impact the industry. I met Luis Ceze at the University of Washington to learn more about the Apache TVM project. Like Jason, Luis has a magnetic way of making really complex technology, such as AI hardware, approachable.

When Jason and Luis told me that they were forming a company together (along with 4 other Ph.D. researchers from the University of Washington), it was an obvious bet to make at that point.

We also considered the product and market. My belief is that ML will have the greatest ROI when it moves closer to the edge. There are immense opportunities for on-device ML. As we get better at MLOps, ML will shift further to the edge. But for that to happen, we need to be able to run inference at the edge. That’s hard at the moment.

“Thinking about a tool that enables any model developer to deploy any model on any hardware backend was really compelling to me.”

Lastly, not every data scientist is going to cultivate expertise in full-stack infrastructure. Fundamentally, I believe that we need a tool such that model developers can build models and not think about the hardware. I think of OctoML as the Linux of ML.

On Investing in Einblick

I came across Einblick through the lens of the product first.

Having managed data analysts, data scientists, and ML engineers, I saw a proliferation of tooling focused on improving data science and ML workflows. Providing better tooling for those roles helps improve their productivity and extend their skill sets further, ultimately generating organizational value. However, analyst tooling was not changing. Comparing BI tools from the 70s with Tableau/Looker/PowerBI today, while the interfaces might have changed, the fundamental capabilities and the workflows that can be executed are not much different.

“We started with the hypothesis that analysts need tools that enable them to do new things, not just existing things better or with bigger datasets.”

We came across the Northstar project, a visual computing platform developed at MIT with that thesis in mind. The goal of the project was to combine the technology from the database (approximate query processing) world, the HCI (touchscreen interfaces) world, and the ML world (various AutoML tools) to give analysts new power. I love Einblick’s focus on empowering the analysts without requiring them to learn a new set of skills.

We looked at the team as well. In order to build this product effectively:

Doing visual joins at light speed requires database expertise.
Building lightweight predictive models on the fly requires ML expertise.
Developing new creative approaches to interfaces and creating the right visual metaphors (for things like what-if analyses) requires HCI expertise.

The Einblick team has an elegant confluence of all these areas of expertise. They also have a strong will to build not just things that make for interesting papers, but things that enable new behaviors. That’s a commitment to real user experiences, not just academic research.

On Investing in Metaphor Data

This is a perfect example of a tool that I wish I had.

I can’t tell you how many data dictionaries I’ve had to build across almost all my roles. They were important for me to have context on what data was available and to define potential data/ML projects. They’re also important for the broader organization to know what business questions the data can answer.

But it’s so tedious. It was hours of looking through GitHub repos to understand the code that generates certain datasets. It was hours of creating dashboards so that I could understand the timeliness and completeness of those datasets. It was hours of coaching the organization to understand these various datasets and responding to questions about those datasets. It was worth it though!

“What excited me about the metadata management category is that you have something that is really hard, tedious, and painful to do, but also so worth it.”

In addition to data discovery, I also saw and felt the other adjacent pains, such as change management and data governance. All of them can be solved with powerful metadata management capabilities.

Naturally, I met with probably 50 companies working on data discovery, data catalogs, etc., and talked with 100+ former colleagues and friends struggling with this problem.

Based on those conversations, I believe that any metadata management platform has to be push-based. Trying to ingest all data sources in a pull-based manner is not sustainable at a time when the number of data sources is proliferating.
I also saw the field shifting slowly but increasingly towards streaming. A metadata management platform that can fit into the streaming data architecture would be the best tool.

After meeting the Metaphor Data team and learning more about DataHub, I observed a team that was so passionate about metadata. If Mars or Pardhu told me that they spent 23 hours per day thinking about metadata and an hour per day thinking about their families, I wouldn’t be surprised. This is the key motivating problem that they want to spend the rest of their lives on.

“Having a team that I believe in, a product with the right set of requirements, and a category that is important now and will be important in the future inspired the investment.”

On Investing in RunwayML

I believe that data science and ML can and will impact other roles. What’s inspiring about Runway is the notion that data and ML can impact creative professions. Initially, it enables filmmakers, effects artists, and others to play with ML. Then it enables them to experiment and prototype with ML. Then it builds ML into their tools so that they become more comfortable with the technology. This gradual approach of introducing ML into the workflow of non-data practitioners is frankly the right way to socialize ML in new roles.

“I often think about ladders of abstraction: how can we create the right set of interfaces and technologies such that people can engage with a new computing and analytical paradigm in a way that they feel most comfortable? Runway really embodies that.”

Some people will climb that ladder. Others will stay on just one rung. In general, this approach enables Runway to unlock the opportunity to radically transform the way creatives go about their work.

Runway made ML fun and exciting, but also powerful. Nailing that is hard. Coming from the art world with a technology background, the founders have been able to deliver such a compelling product.

On Advice For Her Portfolio Companies

“The advice that they should ignore is that there exists a formulaic approach to building their teams. Building a great founding team starts with a strong sense of self-awareness and direction.”

As a founding team, you will be good at many things and great at some things. But there will be gaps. Understanding the difference between what you are not good at, what you are good at, and what you are great at is critical when building a founding team. In many ways, building a founding team is an exercise in layering on greatness, which will be different for every company. Founders having that sense of awareness and letting it inform the hiring plan is a critical capability.

The other thing is having a sense of mission and direction. You will not always be right about what you need to build or what set of skills will match that. The big challenge of hiring is that you often need to recruit for roles before you have that clarity. But I think you do need to have some hypotheses. If you aren’t constantly testing your product hypotheses about the path from MVP to the ultimate vision, then it’s easy to hire the wrong people. The mix of knowing what you’re good at and knowing where you’re headed really informs early team-building.

On Projects To Know

Back in 2019, I was subscribing to a lot of ML and data science newsletters. My frustration was that they tended to fall into one of two buckets:

The first bucket is soundbite-style coverage that focuses on ML stories with a public-interest angle. These are important stories but already well covered.
The second bucket focuses purely on research. These newsletters include a bunch of research papers that perhaps help readers stay up to date with innovation.

“There wasn’t anything great tailored for practitioners: content, research, and projects that they could use day to day.”

That’s why I started (and hope to keep going with) Projects To Know. It highlights news, papers, and projects that will help practitioners do their jobs. They might not have public interest or be the most exciting research trends, but they are practically important.

On Trends In The Data Ecosystem

Three trends will have a disproportionately huge impact in the future.

1/ Streaming Data: The world is better represented as a stream. We do not make sense of batch data. We make sense of signals coming in constantly. Streaming datasets are a high-fidelity representation of the world. Why don’t we use streams for everything? The answer right now is that it’s hard and complex. What I am starting to see, and am excited about, are technologies that will make streaming data easier.

“I think streaming data will change the way we do everything from doing analytics to training/deploying ML models.”

2/ Statistical Methods: Organizations increasingly benefit not just from ML-driven product development, but also from better analyses. This might come in the form of forecasting, experimentation, or churn analysis. Many of these analyses require not just an understanding of ML but, in fact, fundamental knowledge of statistics. As a result, I see a trend towards not just the democratization of ML but also the democratization of statistical analyses.

3/ MLOps: ML is very much stranded in research. Some companies might attempt to productionize research, but it’s very challenging. You either have to carefully select the ML projects to work on or cross your fingers and hope for the best.

“As it becomes easier to operationalize ML models (better tools for monitoring, maintaining, and understanding the models), it’ll become easier to take a more agile approach to ML product development — trying more things and seeing what works. That will help us understand the high-ROI use cases for ML and unlock even further innovation in the ML stack.”

I am not a strong believer in end-to-end AutoML. There’s an immense amount of value in human/manual work, including addressing data/model limitations and how your team works together. Even though parts of the ML development and deployment processes can be automated, frankly:

It will be very hard to develop end-to-end AutoML systems that are robust, reliable, and debuggable.
You will compromise on certain aspects of organizational efficiency, even if those systems are state-of-the-art.