Episode 70: Machine Learning Testing with Mohamed Elgendy

Episode Summary

Mohamed Elgendy is a seasoned AI expert, who has previously built and managed AI organizations at Amazon, Rakuten, Twilio, and Synapse. In particular, he founded and managed Amazon's computer vision think tank. He is the author of the "Deep Learning for Vision Systems" book published by Manning in November 2020. Mohamed regularly speaks at many AI conferences like Amazon's DevCon, O'Reilly's AI, and Google's I/O.

Episode Notes


Mohamed’s Contact Info

Mentioned Content




My conversation with Mohamed was recorded back in March 2021. Here are some updates that Mohamed shared with me since then:

About the show

Datacast features long-form, in-depth conversations with practitioners and researchers in the data community to walk through their professional journeys and unpack the lessons learned along the way. I invite guests coming from a wide range of career paths — from scientists and analysts to founders and investors — to analyze the case for using data in the real world and extract their mental models (“the WHY and the HOW”) behind their pursuits. Hopefully, these conversations can serve as valuable tools for early-stage data professionals as they navigate their own careers in the exciting data universe.

Datacast is produced and edited by James Le. Get in touch with feedback or guest suggestions by emailing

Subscribe by searching for Datacast wherever you get podcasts or click one of the links below:

If you’re new, see the podcast homepage for the most recent episodes to listen to, or browse the full guest list.

Episode Transcription

Key Takeaways

Here are highlights from my conversation with Mohamed:

On Studying Biomedical Engineering in Egypt

Originally from Egypt, I joined an engineering-oriented university after high school. After the first year, I was deciding between computer engineering and biomedical engineering as my major. I picked biomedical because it focuses more on the applications. It is basically the combination of software and hardware engineering in the medical field.

I had a great time learning how to think like an engineer. I went into the operating room with physicians, watching how they worked and figuring out ways to help them better diagnose a treatment problem. In retrospect, this experience helped me a lot when I came back to hardware work.

On Moving To The US

After college, I worked for about 3 years in Egypt. As an engineer who likes to build things, I was not content with the medical equipment field in the Middle East. There was much less innovation on the building and manufacturing side, and much more on the selling, support, and maintenance side. I realized that if I wanted to be on the innovation side, I needed to leave the region and move to the US.

During my early years in the US, I focused on getting into the “system” (immigration, logistics, employment, etc.). Instead of pursuing a Master’s in Engineering, I decided to jump into the business side by getting an MBA. This aligned with my goal of looking at the bigger picture of things, not just heads-down building things. After finishing my MBA, I worked a few jobs, judging each by how much I was learning and how fast I was moving. At the time, the fastest way to get into the employment system was via software engineering roles. I wasn’t picky about the industry, but the roles that I got happened to be in the medical field.

On Becoming A First-Time Author

My process of learning new things entails building and sharing. Building can mean building a product, writing an article, or creating a YouTube video. This encourages me to (a) have a goal for the learning experience and (b) go back to the learning process and fill in the missing details. The implementation makes me learn things on a much deeper level.

The topic I picked was Business Analysis. An IT business analyst is a product manager sitting between the engineers and the customers. This person understands what needs to be built, writes requirements, and liaises between engineering and business. Even with a day job, I carved out 3–4 hours every night for my side projects. At the time, I worked as a software engineer by day and learned business analysis/project management by night. I found that writing books about business analysis was the most straightforward way to capture my learnings.

On Being an Engineering Manager at Twilio

Before jumping into a management role, I was a technical program manager at Yale University. I wasn’t sure whether I wanted to be a people manager, but I went for an engineering manager opportunity at Twilio anyway. By around 2013, Twilio had become a well-known company. This was a big transition for me for various reasons: (a) moving to the Bay Area, (b) jumping into people management, and (c) moving into Machine Learning from software engineering.

Initially, I joined Twilio as a manager for an infrastructure team that built tools. Then, I was on a team that built ML tooling. The flagship ML product that I worked on is Twilio Understand, an NLP product that understands text sentiment and creates structured data from text. I realized that NLP wasn’t something that I was excited about.

On Amazon Culture and Leadership Principles

I would rank Amazon as the best school for people to work for, especially on the management side. As a manager, building trust and getting the team's buy-in takes a lot of work. Amazon prefers writing memos over presenting PowerPoint slides. Putting your thoughts in a document forces you to have real details and think more deeply about the problem. Overall, Amazon has been a second college for me.

Every company has its own cultural values and puts them everywhere. Amazon makes its teams eat, drink, and breathe its cultural values.

These values have stayed with me as I continued on the management path and set the culture on my own teams. I hold them dear to my heart and implement them everywhere I go.

On The Benefits of Teaching

While working on the Kindle team, I wanted to move to the Computer Vision side. I talked to Amazon leadership and pitched the idea of having a Computer Vision think tank — a team of Computer Vision experts floating around several organizations and solving their problems. The idea received a positive signal, leading to the initial step of building a team of 4–5 people.

At the time, even outside of Amazon, it was hard to build an ML engineering team. I reached out to ML university within Amazon and started a 3-month course that taught computer vision concepts to Amazon engineers. The goal was to have about 25–30 students go through the program and to recruit the best people who were interested in joining the new team.

Personally, teaching helps me understand exactly what I am doing. If I say model architecture A is better than model architecture B, I have to explain why A is better than B. In front of people, I have to know why I am saying what I am saying and how my opinions hold up. Having to put my thoughts into course materials pushed me to structure them so that I could discuss the topics intelligently.

On its own, teaching (or writing) is not an exercise that I enjoy (ironically). But the value of teaching is tremendous, as my depth of knowledge increases as a result.

On Building Computer Vision Systems at Synapse

I joined Synapse right after their seed round. We built computer vision algorithms that analyze images in the X-ray machines in airport security checkpoints and highlight bounding boxes around prohibited items (guns, knives, bottles, toothpaste, etc.). There were various challenges in the axes of hardware, software, and computer vision — coupled with a high level of security and the lowest level of available infrastructure. Our products deployed in the airports are not connected to the Internet, so we faced challenges surrounding the initial deployment, the maintenance, and the upgrades. At the time, I had to roll up my sleeves and work with hardware components. This is where what I learned from college came in handy!

It wasn’t easy to transform an MVP prototype into a manufactured product that is repeatable with the same accuracy and the same quality. Instead of having 1 or 2 products every month, we needed 20 or 30 products every week. We partnered with (1) hardware vendors to collect the necessary hardware components and (2) X-ray vendors to deploy the products/provide maintenance at the customer side.

On Data Labeling Challenges

Initially, at Synapse, we had a warehouse with 6 to 7 X-ray machines. Then, we bought the actual objects that we were trying to detect (knives, guns, etc.) to teach our neural network. We manually scanned these objects and collected a few thousand images. Next, we trained and deployed our model. From the moment the model was deployed, we also stored the data. Within 6 months, we had millions of images. By the time I left, that number had reached 25 million.

To build our model, we needed the data to be labeled. We partnered with an offshore team who had (secured) access to our data storage, labeled our data, and brought the labeled data back to us. I always collaborated closely with the labeling team: giving them the labels, showing them examples, teaching them how to label. This process repeated time and again. The lesson I learned is this: while the labeling process sounds simple, the more you work to improve your model and learn about your problem, the more you will change your labeling strategy and re-label your data all over again.

On Incubating Kolena

After Synapse got acquired, I was thinking about a product that could solve the ML testing challenge. I started cooking something up and drawing on the whiteboard. That was going to be the startup that I would build. When the pandemic hit, I decided to pause and took a job as the VP of Engineering for the AI Platform at Rakuten. Jumping in there, I wanted to test my hypothesis on ML testing against Rakuten’s ML initiatives. The big goal was to enable AI in the organization via process, people, and infrastructure. Within the infrastructure part, I focused on building an end-to-end ML platform.

We decided to use open-source components on top of our in-house backend infrastructure. We tested several approaches to ML testing. By the end of the year, we found that we could save more than 50% of the experimentation time if ML testing is done right. Testing gives ML engineers specific failure modes of the model(s) and enables them to create a roadmap to fix those bugs.

Alongside my lead engineer at Rakuten (who joined from Synapse), I decided in January 2021 to take a leap of faith and build Kolena, a QA platform for ML.

On ML Testing Infrastructure

When you build a model, you always want to understand the instances where your model fails. Your test set is never going to be fully representative of the real world. In the ideal case, you build and ship linearly. But in reality, there has to be some test to make sure your product passes a certain bar. As a product builder, what is your bar to say that this product is of high quality? You need metrics.
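As a rough illustration of what "passing a certain bar" could look like in code, here is a minimal sketch of a release check that compares a candidate model's metrics against minimum thresholds. The names (`Bar`, `check_release`), the metric choices, and all the numbers are hypothetical and only for illustration; this is not how Kolena or any team mentioned here actually implements it.

```python
from dataclasses import dataclass


@dataclass
class Bar:
    """A minimum acceptable value for one metric (hypothetical structure)."""
    metric: str
    minimum: float


def check_release(metrics, bars):
    """Return a list of readable failures; an empty list means the bar is met."""
    failures = []
    for bar in bars:
        value = metrics.get(bar.metric)
        if value is None:
            # A metric that was never computed counts as a failure.
            failures.append(f"{bar.metric}: missing")
        elif value < bar.minimum:
            failures.append(f"{bar.metric}: {value:.2f} < {bar.minimum:.2f}")
    return failures


# Made-up metrics for a candidate model and made-up release thresholds.
candidate_metrics = {"precision": 0.93, "recall": 0.81}
release_bars = [Bar("precision", 0.90), Bar("recall", 0.85)]

failures = check_release(candidate_metrics, release_bars)
print(failures)  # ['recall: 0.81 < 0.85']
```

A check like this only says whether the bar was met. It says nothing about why recall fell short, which is the gap that more granular testing is meant to fill.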

Common evaluation metrics (accuracy, precision, recall, AUC, etc.) are not descriptive of the failure modes. They don’t tell you what you need to do next. You basically have to shoot in the dark (acquiring more data, developing more complex models, buying more powerful GPUs, etc.) because you don’t know what your model has failed on.

There are over 200 tools in the ML tooling landscape, bucketed under 3 main categories: data management, model development, and model deployment. The most viable effort to understand how models perform in production goes to model explainability. This is not a bad solution, but now there are model testing solutions that help you understand model behavior. I think the testing category is very under-served.

After talking with hundreds of ML practitioners, I noticed that ML teams are broken down into two categories:

  1. Evaluation metrics are fine. No one pushes the engineers to test their models. I think that soon enough, they will run into the need to test their models.
  2. The testing process is mature. In particular, they break down their test set into small, granular slices based on specific model behavior. This practice is what I adopted at Amazon, Synapse, and Rakuten.
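The slicing practice in the second category can be sketched roughly as follows. This is a minimal, assumed example: the slice names, the tiny synthetic test set, and the helper `accuracy_by_slice` are all hypothetical, loosely modeled on an X-ray item classifier, and do not reflect any real dataset or tool.

```python
from collections import defaultdict


def accuracy_by_slice(examples):
    """Compute accuracy per slice from (slice_name, label, prediction) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for slice_name, label, prediction in examples:
        total[slice_name] += 1
        correct[slice_name] += int(label == prediction)
    return {name: correct[name] / total[name] for name in total}


# Synthetic test set: 90 easy cases, all correct, and 10 hard cases,
# of which only 3 are correct. The slice names are made up.
examples = (
    [("knife/unoccluded", "knife", "knife")] * 90
    + [("knife/occluded", "knife", "knife")] * 3
    + [("knife/occluded", "knife", "none")] * 7
)

overall = sum(label == pred for _, label, pred in examples) / len(examples)
per_slice = accuracy_by_slice(examples)

print(overall)    # 0.93
print(per_slice)  # {'knife/unoccluded': 1.0, 'knife/occluded': 0.3}
```

The aggregate number looks healthy, while the per-slice view shows that one specific behavior (occluded knives, in this made-up case) is where the model fails, which points to what to fix next.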

I believe testing is a new category that has to happen eventually. QA tooling for ML (like Kolena) should sit right after model training and right before the model gets deployed into production.

On Writing “Deep Learning For Vision Systems”

In 2018, I was thinking about getting a graduate degree to study Computer Vision in more depth. When Manning reached out to me for a book writing opportunity, I figured this might be a more practical approach than going back to school. The entire experience includes 2 years of writing and 6 months of editing. That’s a humongous amount of effort, and I learned so much about the topic.

My favorite chapter is chapter 5, which talks about the evolution of Convolutional Neural Networks, from LeNet to ResNet and Inception. I want my readers to acquire the skill of reading research papers and distilling the most relevant bits. Chapter 5 is my way of doing it. I picked 6 ConvNets and shared my takeaways and implementations for each. I also discussed how each network improves on a previous one. This chapter will help you get over the nerves of reading and implementing research papers.