Datacast

Episode 66: Monitoring Models in Production with Emeli Dral

Episode Summary

Emeli Dral is a Co-founder and CTO at Evidently AI, a startup developing tools to analyze and monitor the performance of machine learning models. Earlier, she co-founded an industrial AI startup and served as the Chief Data Scientist at Yandex Data Factory. She led over 50 applied ML projects for various industries - from banking to manufacturing. Emeli is also a data science lecturer at St. Petersburg State Management School and Harbour.Space University. She is a co-author of the Machine Learning and Data Analysis curriculum at Coursera with over 100,000 students. She also co-founded Data Mining in Action, the largest open data science course in Russia.

Episode Notes

New Updates

Since the podcast was recorded, a lot has happened at Evidently! You can use this open-source tool (https://github.com/evidentlyai/evidently) to generate a variety of interactive reports on ML model performance and integrate it into your pipelines using JSON profiles.

This monitoring tutorial is a great showcase of what can go wrong with your models in production and how to keep an eye on them: https://evidentlyai.com/blog/tutorial-1-model-analytics-in-production.

About The Show

Datacast features long-form conversations with practitioners and researchers in the data community to walk through their professional journey and unpack the lessons learned along the way. I invite guests coming from a wide range of career paths, from scientists and analysts to founders and investors, to analyze the case for using data in the real world and extract their mental models (“the WHY”) behind their pursuits. Hopefully, these conversations can serve as valuable tools for early-stage data professionals as they navigate their own careers in the exciting data universe.

Datacast is produced and edited by James Le. Get in touch with feedback or guest suggestions by emailing khanhle.1013@gmail.com.

Subscribe by searching for Datacast wherever you get podcasts.

If you're new, see the podcast homepage for the most recent episodes to listen to, or browse the full guest list.

Episode Transcription

Key Takeaways

Here are highlights from my conversation with Emeli:

On Studying Applied Math in College

When choosing universities, there were pretty much two options for me. The first option was to go to a top technical university in Russia, such as St. Petersburg State or Moscow State University. The second option was to enroll in a university with a great international connection and many students from different countries. I am from a multi-cultural family (my father is African, and my mother is Russian). I was very interested in meeting people from different cultures, so I went for the second option. This was the best decision of my life.

The Peoples’ Friendship University of Russia is not a technical university, but it has many nice mathematical courses. As a result, I got a solid mathematics background but lacked modern computer science courses. I compensated for that by enrolling at Yandex School of Data Analysis.

My favorite math classes were all connected to statistics because I love the idea of making decisions based on data. Notable courses included probability theory, introductory statistics, and financial mathematics.

On Getting Into Computer Science

During the third year of my bachelor’s degree, I started to think about my future jobs. I searched for different options from different companies. Finally, I decided that I needed to go to Yandex because it’s one of the best IT companies in Russia. I figured that I lacked experience in computer science and programming because I hadn’t participated in any programming competitions or worked on any programming projects. Therefore, I started to think about getting more experience and becoming a better engineer to at least have a chance within Yandex.

I figured out that they had some Master’s programs, so I prepared really, really hard for them. I spent three months studying whatever I needed to get into the school. This is a competitive program to get into, and I thought I had a tiny chance because I was not from the best technical university. But I decided to try anyway. It worked out for me. I was super happy. This was the second-best decision of my life.

In 2010, Yandex had only two programs: one on data science/analysis and one on computer science. I thought that I needed to focus on the engineering courses, so I went for the computer science program.

On Working as a Software Developer

This first job working at Rambler was very stressful for me. I wasn’t sure that I was good enough to start my career as a software engineer. My previous internship had been in data analysis, and I was worried about my engineering skills. My first big task was to work with distributed systems. I had to learn how to write MapReduce jobs, write Hive queries, use distributed filesystems, and other similar things. I was so scared that I would break something.

Eventually, I figured out that it’s okay to learn on the fly while working. If I want to stay in this profession for a long time, I need to be comfortable with this learning process.

I still remember everything that I learned about distributed computation. In the beginning, I was really scared to run distributed MapReduce jobs. If something went wrong, the whole cluster would break, and everybody would know that it was my fault. But later on, that process became more experimental, and I enjoyed it.

On Building E-Commerce Recommendation Systems at Yandex

After finishing Yandex School of Data Analysis, I knew that Yandex would be my home company. I love the Yandex culture; even while working at Rambler, I still communicated with people from Yandex. So when they went through challenges and problem statements in recommender systems, I was so happy to join.

There were many expectations for Yandex’s system: the response time should be short, and the solution should be stable enough to be passed by the engineering team. This was when I learned how to write production-grade code and good tests, create a stable system with many fallbacks, and design a nice database schema.

I also learned about the straightforward connection between the quality of ML models and business KPIs. It’s vital to know how the solution impacts real users and aligns with the right metrics.

On Applied Machine Learning at Yandex Data Factory

I was super proud that I got invited to join Yandex Data Factory. It was a new department focused on applied ML for different businesses. They invited many experienced data scientists from inside Yandex. I was not sure that I was good enough to join the team, so I wanted very much to prove that I was capable.

In my first project, we worked with one of the biggest communications companies in Russia.

The second project was more interesting from a business point of view.

On Challenges in Industrial AI

Industrial AI differs from online services significantly. While working for any online company, you have access to user-generated data. Events are associated with users and can be aggregated by user IDs, making it easy to create features, build training datasets, and proceed with the ML lifecycle.

When it comes to manufacturing, you don’t have the data associated directly with the objects.

On Making Coursera Courses

This happened when Coursera went after the Russian market. I was invited to work on the “Machine Learning and Data Analysis” Coursera specialization by Konstantin Vorontsov, an ML instructor from Yandex School of Data Analysis and a faculty member at Moscow Institute of Physics and Technology. I prepared seminars with hands-on Python programming and applied projects.

It’s important to have thick skin. When you publish something on the Internet, you should be ready to get feedback.

On Founding Evidently AI

At Mechanica AI, our production ML systems needed to work correctly. If something failed in production, money would be lost. Therefore, we set up the monitoring schema from scratch for every project because there were various parts of the solution that we needed to pay attention to.

I figured out that there was no general-purpose solution for ML monitoring. So, I thought, why not try to build a product that can look after ML models in production? That’s how Evidently AI was born.

On Model Monitoring

Most companies start monitoring their ML models only after their first huge break. I think that’s a mistake because they should prepare for such a scenario and monitor models from the beginning. It’s also important to understand that an ML-based service is different from other services. It is still a service, so it’s crucial to monitor service health, response time, memory/GPU usage, etc.

What makes it different is the data layer that is part of the solution. I would even say that when something is wrong with your models, it’s most likely due to the input data. Therefore, it’s even more crucial to monitor the input data. It’s important to analyze your specific case to determine where your models can break and use the appropriate monitoring strategy.

When it comes to data quality and data integrity, many things can happen to your models.
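As a rough illustration of what monitoring input data for quality and integrity can look like, here is a minimal sketch in plain Python. It is not Evidently's API: the function name `check_batch`, the schema format (feature name mapped to an allowed numeric range), and the feature names are all assumptions made up for this example.

```python
def check_batch(rows, schema):
    """Return a list of human-readable issues found in a batch of input rows.

    rows:   list of dicts, one per model input (hypothetical format)
    schema: dict mapping feature name -> (low, high) allowed range (hypothetical)
    """
    issues = []
    for i, row in enumerate(rows):
        for name, (low, high) in schema.items():
            value = row.get(name)
            if value is None:
                # Covers both an absent key and an explicit null
                issues.append(f"row {i}: '{name}' is missing")
            elif not (low <= value <= high):
                issues.append(f"row {i}: '{name}'={value} outside [{low}, {high}]")
    return issues


# Example batch with one valid row, one null, one out-of-range value, one absent key
schema = {"age": (0, 120)}
batch = [{"age": 30}, {"age": None}, {"age": 250}, {}]
problems = check_batch(batch, schema)
```

A check like this would typically run before the batch reaches the model, so that broken inputs raise an alert instead of silently producing bad predictions.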

Issues like data drift and concept drift occur when the features and targets themselves change over time. If your models aren’t prepared for that, they will degrade or break.
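One common way to spot data drift in a single numeric feature is to compare a reference sample (for example, training data) with a recent production sample. The sketch below computes the two-sample Kolmogorov-Smirnov statistic in plain Python. It is only an illustration, not how Evidently implements drift detection, and the `has_drifted` helper and its 0.2 threshold are arbitrary assumptions that would need tuning per feature.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic.

    This is the largest absolute gap between the empirical CDFs of the
    two samples: 0.0 means identical distributions, 1.0 means no overlap.
    """
    ref = sorted(reference)
    cur = sorted(current)
    max_gap = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # is always reached at one of the observed values.
    for x in set(ref) | set(cur):
        ref_cdf = bisect_right(ref, x) / len(ref)
        cur_cdf = bisect_right(cur, x) / len(cur)
        max_gap = max(max_gap, abs(ref_cdf - cur_cdf))
    return max_gap


def has_drifted(reference, current, threshold=0.2):
    """Flag drift when the KS statistic exceeds a threshold (assumed value)."""
    return ks_statistic(reference, current) > threshold


# Example: the production sample is shifted relative to the reference sample
drift_score = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
```

Note that this only looks at the distribution of inputs. Concept drift, where the relationship between features and the target changes, usually needs ground-truth labels or a proxy quality metric to detect.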

On Open-Source Roadmap

My co-founder, Elena, and I have discussed open-source a lot. It’s hard to be a perfectionist when you are building in public. If something doesn’t work, then everybody is going to see it. So it was a hard decision for us.

But when you build something in public, you get feedback very early and can test hypotheses fast.

When you build a monitoring system to evaluate other systems, it’s better to understand how it actually works, what problems it accounts for, and what the weaknesses are. Real-world ML systems impact people’s lives directly (healthcare, finance, social services). No one actually has the whole picture of all possible problems that can happen. When we build a monitoring tool in public, we can aggregate the experiences from many different engineers and business specialists. For us, that was the biggest reason to build Evidently AI in public.

I have faced the problems of monitoring ML models in production at Mechanica AI and Yandex Data Factory, so we prioritize those issues and base our roadmap on them. I also spent a lot of time talking with potential users from various companies. As a result, we figured out that data drift and concept drift are the most top-of-mind issues for them. Therefore, we weight user feedback even more heavily when prioritizing our roadmap.

On The Data Community in Moscow

The data community in Russia is large and young. We have a lot of young data scientists who are just finishing school. They are very active and enthusiastic.

Another fun fact about Russian data scientists: We are really good at implementing things, even if that means reinventing the bicycle. We like to reimplement things ourselves, and sometimes that’s not very efficient.