Datacast

Episode 67: Model Observability, AI Bias, and ML Infrastructure Ecosystem with Aparna Dhinakaran

Episode Summary

Aparna Dhinakaran is the Chief Product Officer at Arize AI, a startup focused on ML observability. She was previously an ML engineer at Uber, Apple, and TubeMogul (acquired by Adobe). During her time at Uber, she built several core ML infrastructure platforms, including Michelangelo. She holds a bachelor's degree from UC Berkeley's Electrical Engineering and Computer Science program, where she published research with the Berkeley AI Research group. She is on a leave of absence from the Computer Vision Ph.D. program at Cornell University.

Episode Notes

New Updates

Since the podcast was recorded, a lot has happened at Arize AI!

About The Show

Datacast features long-form, in-depth conversations with practitioners and researchers in the data community to walk through their professional journeys and unpack the lessons learned along the way. I invite guests coming from a wide range of career paths — from scientists and analysts to founders and investors — to analyze the case for using data in the real world and extract their mental models (“the WHY”) behind their pursuits. Hopefully, these conversations can serve as valuable tools for early-stage data professionals as they navigate their own careers in the exciting data universe.

Datacast is produced and edited by James Le. Get in touch with feedback or guest suggestions by emailing khanhle.1013@gmail.com.

Subscribe by searching for Datacast wherever you get podcasts.

If you’re new, see the podcast homepage for the most recent episodes to listen to, or browse the full guest list.

Episode Transcription

Key Takeaways

Here are highlights from my conversation with Aparna:

On Her Undergraduate Years at UC Berkeley

I originally went into the Electrical Engineering and Computer Science program because I really liked math. I was looking for majors that applied math in interesting ways, and UC Berkeley's combined EECS program was the ideal fit.

I did an even split between Electrical Engineering and Computer Science, but I got really interested in the Machine Learning courses at Berkeley (such as CS189, EE126, EE127, etc.). I was also working on research projects focused on ML. UC Berkeley exposed me to the growing interest in AI and all the applications people can build using AI.

UC Berkeley has research professors working on different projects that undergraduates can get involved with. Sometimes you can walk up to a professor and express your interest in contributing. Sometimes graduate students will take you on as an undergraduate researcher.

One professor I worked with was Alice Agogino, who heads up the BEST Lab at Berkeley. We worked on a lighting system on a smart grid, figuring out how to optimally place the lighting sensors in order to replicate smart grid optimizations. We used ML algorithms (even back in 2013) for this task. I enjoyed the whole process of working with a research team and publishing a few papers. The project involved both hardware and software components (Alice is a Mechanical Engineering professor).

On Academic ML and Real-World ML

I enjoyed a good mix of doing research and completing real-world internships.

During my summer at TubeMogul, I worked on click-bot detection algorithms for fraudulent campaigns. It was my first time working with production systems. We spent a lot of time iterating on different algorithms and deploying the final version by the end of the summer. Building the algorithms took about a month, while putting them into production took the remaining months of my internship. I learned that most of the time isn't spent building the algorithms but rather bringing them into the real world in a scalable manner.

While working in a research environment, I did not think much about these aspects. For example, another research project I worked on at Berkeley was through the Hybrid Systems Lab, headed by Professor Claire Tomlin. We attempted to build a hybrid framework for multi-vehicle collision avoidance. We spent a lot of time looking at various papers and statistical approaches. We had to prove that the vehicles would not collide.

In research, the papers and proofs are the outputs. In the real world, you have to bring the applications you’ve built into the hands of customers.

On Being a Software Engineer at Uber

After graduating from Berkeley, I debated between going towards a Ph.D. program or joining the industry. I opted for the latter option.

Uber was an incredible place to work right after graduation. I'd recommend it to anybody who is just starting their career. At large organizations like Google or Facebook, most of the systems have already been built, so you would probably work on optimizing them or keeping them running. At a mid-stage startup, you build new systems and launch them from scratch, which is an invaluable experience.

I joined Uber's Marketplace org. Initially, I worked on a couple of core ML product teams, applying ML to the products that Uber was deploying. Later on, I joined a new team that was building a platform called MLaaS (later known as Michelangelo). The platform holds all the models, which are fetched from a model store and then served/deployed in production. We had finance, Eats, marketplace, and more models on that platform. The system's design was complex because of the different model types: neural networks, gradient boosted trees, regressors, etc.

When storing a model, how can we build a general framework that can handle such variety and complexity (in terms of hyper-parameters, training data, and architectures)?

Another challenge was scaling the platform to Uber's level. Every year, Uber handles more rides, which means more models are deployed. So we had to think about designing the platform to withstand that growing scale.

Those are the two main challenges: the upfront system design and the ongoing maintenance work.
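To make that first challenge concrete, here is a minimal Python sketch of the kind of abstraction a platform could use to serve heterogeneous model types through one interface. This is purely illustrative: the class and registry names are hypothetical and do not reflect Michelangelo's actual design.

```python
# Hypothetical sketch of a model-serving abstraction (illustrative only;
# this is not Michelangelo's actual design). The idea: wrap heterogeneous
# model types behind one uniform interface so the platform can store,
# fetch, and serve them the same way.
from abc import ABC, abstractmethod
from typing import Any, Dict


class ServableModel(ABC):
    """Uniform contract every model type must satisfy to be served."""

    @abstractmethod
    def predict(self, features: Dict[str, Any]) -> Any:
        ...


class LinearRegressorModel(ServableModel):
    """Toy stand-in for a regressor artifact loaded from a model store."""

    def __init__(self, weights: Dict[str, float], bias: float) -> None:
        self.weights, self.bias = weights, bias

    def predict(self, features: Dict[str, Any]) -> float:
        return self.bias + sum(w * features[name] for name, w in self.weights.items())


class ModelRegistry:
    """Minimal in-memory stand-in for a model store keyed by model name."""

    def __init__(self) -> None:
        self._models: Dict[str, ServableModel] = {}

    def register(self, name: str, model: ServableModel) -> None:
        self._models[name] = model

    def serve(self, name: str, features: Dict[str, Any]) -> Any:
        return self._models[name].predict(features)


# Usage: any model type (tree ensemble, neural net, regressor) would be
# wrapped the same way and served through the same call path.
registry = ModelRegistry()
registry.register("eta_regressor_v1", LinearRegressorModel({"distance_km": 2.5}, bias=3.0))
print(registry.serve("eta_regressor_v1", {"distance_km": 4.0}))  # -> 13.0
```

The design choice here is that every model, however it was trained, is reduced to the same predict contract, so the serving and maintenance work scales with the number of models rather than the number of model architectures.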

On Model Monitoring at Uber

In ML development, the data scientists develop the models while the engineers put them into production. Once the models were in production, we always noticed issues. Given Uber's dynamic marketplace of riders and drivers, we got real-time feedback on product metrics. Given such metrics, we attempted to troubleshoot manually why things worked or did not work, constructing dashboards and running one-off scripts. Data scientists would pull things from production systems to understand what the issues were, when the models were last retrained, whether we had launched a new model or pipeline that could impact the whole system, etc.

What initially sparked my interest in monitoring was the intuition that there had to be a better way to deploy models. If we look at software systems, tools like Datadog and PagerDuty enable you to write tests and alerts to monitor software performance. Nothing like that existed for ML.

ML was still a research discipline, not a well-oiled machine the way software engineering is.

We are still trying to figure out how to build a good ML engineering system to gather insights into the models and know when to improve them.
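As a rough illustration of the Datadog/PagerDuty analogy, here is a small Python sketch of what an automated health check for a deployed model could look like. The thresholds, function names, and data are made up for this example; this is not Uber's tooling.

```python
# Hypothetical sketch of a Datadog/PagerDuty-style check applied to ML
# (illustrative only). It alerts when a model's recent accuracy drops
# below a threshold or when the model has gone too long without retraining.
from datetime import datetime, timedelta
from typing import List


def accuracy(predictions: List[int], labels: List[int]) -> float:
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / max(len(labels), 1)


def check_model_health(
    predictions: List[int],
    labels: List[int],
    last_trained_at: datetime,
    min_accuracy: float = 0.80,
    max_staleness_days: int = 30,
) -> List[str]:
    """Return a list of alert messages; an empty list means the model looks healthy."""
    alerts = []
    acc = accuracy(predictions, labels)
    if acc < min_accuracy:
        alerts.append(f"accuracy {acc:.2f} below threshold {min_accuracy:.2f}")
    staleness = datetime.utcnow() - last_trained_at
    if staleness > timedelta(days=max_staleness_days):
        alerts.append(f"model is {staleness.days} days old; retraining overdue")
    return alerts


# Example: a slipping model that was last retrained six weeks ago.
alerts = check_model_health(
    predictions=[1, 0, 1, 1, 0, 0, 1, 0],
    labels=[1, 1, 1, 0, 0, 1, 1, 1],
    last_trained_at=datetime.utcnow() - timedelta(days=42),
)
for alert in alerts:
    print("ALERT:", alert)
```

In practice, a check like this would run on a schedule against recent production traffic and page an on-call engineer, mirroring how software alerting already works.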

On Researching Bias in Computer Vision

I applied to Cornell University’s Ph.D. program a while back and got an offer. I wanted to give the Ph.D. experience a shot. After working for a couple of years, I thought that it’d be a good time to transition back into academia.

During my time at Berkeley, the last research group I was a part of focused extensively on robotics and autonomous vehicles. I thought that the core reason we did not yet have more intelligent automated devices was the technical challenges in computer vision: not only being able to see the world but also to perceive it. As humans, we have a lot of contextual clues to understand a scenario. Computer vision applications still lacked these. I thought that was the big frontier we still had to learn about and improve upon. That's another reason I decided to join the Ph.D. program and work in the exciting space of computer vision.

I joined Cornell and worked under Serge Belongie, also a Berkeley alumnus. I was at Cornell very briefly. In computer vision, there was a lot more interest in making sure that models are not discriminatory or biased. Cornell has an excellent AI ethics program with incredible people like Rediet Abebe. They did a good job of asking questions about bias as part of the research process.

We already live in a society that is not equal. Even the people who go into Ph.D. programs typically do not look like me.

We have these inherent systemic inequalities. We also have historically biased data used to build the models. Are we going to end up building models that continue to propagate the system of inequality in the world? That felt like a huge problem to me. We have no idea what the models are doing, and they can make critical decisions about hiring or financial opportunities.

On Creating MonitorML

Before thinking about model bias, we had no way of getting insights or visibility into the model. Forget about fairness; we could not even answer the simple questions: Is this model working? How is the model performing?

The day before the YC application closed, I saw a LinkedIn ad saying: “Hey, apply to this batch of YC.” I thought it would be interesting to apply and see if this idea of better monitoring for ML could be a potential product. So I put together an application. At the time, my brother had been listening to what I was researching, so he came on board as well.

We ended up getting into YC.

I would recommend the experience of going through YC to any first-time founder. As someone with a technical background, I would call it a 3-month MBA program with hands-on doing rather than reading about what others did. We would learn about well-known tech startups and what they looked like when they were going through YC (what they focused on, what metrics they optimized for, how they built and launched products quickly to get usage and traction, etc.). Having that group of ex-founders (who were constantly advising) and current founders (who were going through the same experience) was incredible.

On Co-founding Arize AI

I met my now co-founder Jason, who had faced a lot of the same pain points: not knowing whether models are working as expected and not knowing whether models in production will behave the same way they did in the research environment. Jason had also founded TubeMogul (where I interned) and took it to IPO.

As the two of us caught up and discussed the space, we noticed a lot of similarities. So we figured that it would be best to join both of our experiences together. Ultimately, MonitorML was acquired by Arize AI.

It’s been a crazy experience, actually: going through YC, teaming up with Jason, and now having a full team of folks working on ML observability at Arize. But, for me, the main rationale was finding a really incredible co-founder.

On Model Observability

The statement that we made, that model observability is the foundational platform of the core ML stack, is my fundamental belief. Maybe two to three years from now, no one will deploy models in production and fly blind, not knowing why issues happen.

The big divide today is that data scientists build models in an offline environment. Then, when we have to operationalize these models, we have to think about all of the things we do offline (feature encoding, feature transformations, feature imputation, etc.) and how to make them work in real-time and at scale. Then, once the model is deployed (either on its own service or via an MLOps platform), how do we ensure it is still working reliably?
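One concrete piece of that offline/online divide is keeping feature transformations consistent: statistics fitted offline (imputation values, category encodings) have to be persisted and re-applied verbatim at serving time. Below is a minimal sketch of that idea; the feature names and data are invented for illustration.

```python
# Minimal sketch of keeping offline feature transformations consistent
# online (illustrative only; names and data are made up for this example).
# Statistics are fitted on training data, persisted as plain parameters,
# and re-applied verbatim at serving time.
import json
from statistics import mean
from typing import Any, Dict, List


def fit_transformations(training_rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Offline: learn the imputation value and the category encoding once."""
    distances = [r["distance_km"] for r in training_rows if r["distance_km"] is not None]
    cities = sorted({r["city"] for r in training_rows})
    return {
        "distance_km_impute": mean(distances),
        "city_encoding": {city: idx for idx, city in enumerate(cities)},
    }


def transform(row: Dict[str, Any], params: Dict[str, Any]) -> Dict[str, float]:
    """Online: apply exactly the same parameters, never refit on live traffic."""
    distance = row["distance_km"]
    if distance is None:
        distance = params["distance_km_impute"]
    city_code = params["city_encoding"].get(row["city"], -1)  # -1 = unseen city
    return {"distance_km": float(distance), "city_code": float(city_code)}


# Offline: fit on training data and persist the parameters with the model.
params = fit_transformations(
    [{"distance_km": 2.0, "city": "sf"}, {"distance_km": None, "city": "nyc"},
     {"distance_km": 6.0, "city": "sf"}]
)
serialized = json.dumps(params)  # shipped alongside the model artifact

# Online: load the same parameters and transform a live request identically.
print(transform({"distance_km": None, "city": "la"}, json.loads(serialized)))
```

In practice, teams often serialize fitted pipelines alongside the model artifact, but the principle is the same: the serving path should never refit or re-derive transformation parameters from live traffic.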

We won’t get to the point where we continuously deliver ML as part of the software stack if we do not treat ML as an engineering discipline.

The work is not done after you build the model. It is done when the model is working in the real world. So how can we build a feedback loop to validate things quickly and retrain the model promptly?

The key components of an ML observability platform are tools that can analyze model performance in production the same way you would in an offline environment. This entails monitoring and alerting to capture data quality issues, performance issues, distribution changes in the data, etc. You also need tools that help you troubleshoot and explain what is happening when things are not going well (the equivalent of Datadog, Splunk, and New Relic in software).
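As one concrete example of the monitoring piece, here is a small sketch of a distribution-drift check using the population stability index (PSI), a common way to flag when a feature's production distribution has shifted away from its training baseline. This is a generic illustration, not Arize's implementation.

```python
# Illustrative sketch of a distribution-drift check using the population
# stability index (PSI). Generic example, not Arize's implementation.
import math
import random
from typing import List


def psi(baseline: List[float], production: List[float], bins: int = 10) -> float:
    """PSI between two samples; a rough rule of thumb is that > 0.2 signals drift."""
    lo, hi = min(baseline), max(baseline)
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]
    edges[0] = float("-inf")   # catch production values below the training range
    edges[-1] = float("inf")   # ...and above it

    def proportions(sample: List[float]) -> List[float]:
        counts = [0] * bins
        for x in sample:
            for i in range(bins):
                if edges[i] <= x < edges[i + 1]:
                    counts[i] += 1
                    break
        # A small floor avoids log-of-zero for empty bins.
        return [max(c / len(sample), 1e-6) for c in counts]

    base_p, prod_p = proportions(baseline), proportions(production)
    return sum((p - b) * math.log(p / b) for b, p in zip(base_p, prod_p))


# Example: production feature values have shifted upward relative to training.
random.seed(0)
training = [random.gauss(0.0, 1.0) for _ in range(5000)]
serving = [random.gauss(0.8, 1.0) for _ in range(5000)]
score = psi(training, serving)
print(f"PSI = {score:.3f}", "-> drift alert" if score > 0.2 else "-> stable")
```

A platform would typically run checks like this per feature and per model slice, then route the alerts into the same paging and troubleshooting workflows used for software.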

On The ML Infrastructure Ecosystem

The ML infrastructure space is very exciting right now!

It took us almost 20 years to define the current software stack. Now we are doing it all over again with machine learning. This includes everything from the best tools for data preparation all the way down to what we should use for observability. The tools of the last 5 years or so have really focused on preparing data and building models. I think in the next 2 to 3 years, we will see a lot more focus on what's known as “MLOps” tools: those that help you operationalize models and actually put them in the real world.

Furthermore, in the last few years, many ML platforms have been end-to-end (H2O, Dataiku, DataRobot). There are definitely use cases for them, especially for less technical folks. However, they are not the most flexible tools. In the early innings, they help you go from A to Z, so people have gravitated towards them.

But as people become more specialized and want more configurability, the more vertical-focused platforms that target specific functions or personas will arise. Examples include Tecton for feature stores, Weights & Biases for experiment tracking, and Algorithmia for model serving. These verticalized solutions will become the go-to options for ML teams instead of the big end-to-end platforms of the past.

On Finding Initial Customers

The first thing is to figure out who has the pain points that you are building towards. Of course, as a founder, you might have faced these pain points yourself. But who is going to be the core community that really identifies with those pain points as well?

You want to find design partners or lighthouse customers. These people feel the pain points so much that they are willing to try the early product, which obviously won't have everything. You want to find people who feel the pain so deeply, and solve that pain so well, that they forget about all the things your product does not have.

So as a founder, how do you find these design partners you can use to ground yourself and build a product that solves their pain points really well? Then you can use those insights to build the moat around your product and nail down the use cases.

A lot of that is hustle: You have to go out and talk to as many prospects as you possibly can. Make sure that you have materials, mocks, or MVPs that they can play with.

On Hiring

A valuable hiring lesson I learned is that there are amazing people who are really excited about the space, as well as people who are eager to learn about it.

I have been reaching out to my network, speaking about the pain points that I experienced, and getting people to understand the pain points I am trying to solve. As a founder, your constant job is to convince people about the company's focus, mission, and the big picture.

Arize is a Cal Berkeley-heavy team. Recruiting from UC Berkeley has been a big success for us.

On Participating in The Amazing Race

While I was at Uber, I ended up being on The Amazing Race Season 32. It was a once-in-a-lifetime experience. My brother and I went to an open casting call and auditioned, and we ended up getting cast.

During the filming of the show, we couldn't really prepare too much. We were on the ground, going to the clue boxes, getting the clues, and performing the challenges. Some preparation was useful: being fit, knowing geography, speaking a couple of languages, etc. But a lot of it was quick decision-making on the road and being able to trust your teammate.

There are a lot of commonalities between participating in the race and running a startup. You have to think about your company's strengths and how you can tackle a problem in a way that no one else can. Again, this comes down to being able to hustle and execute, and not being afraid of failing.

That mentality of anticipating the crazy adventure ahead is what’s common between a reality TV show and a startup journey.