Datacast

Episode 59: Bridging The Gap Between Data and Models with Willem Pienaar

Episode Summary

Willem Pienaar is the Engineering Lead at Tecton and the creator of Feast, a feature store for machine learning. Feast is an open-source project that Willem developed while leading the machine learning platform team at Gojek, the Indonesian ride-hailing startup. In a previous life, Willem founded and sold a networking startup and worked on industrial data systems.

Episode Notes

Show Notes

Willem’s Contact Info

Mentioned Content

Feast

Article

Talks

People

Book

Willem will be a speaker at Tecton’s apply() virtual conference (April 21-22, 2021), where data and ML teams discuss the practical data engineering challenges faced when building ML for the real world. Participants will share best-practice development patterns, tools of choice, and emerging architectures they use to successfully build and manage production ML applications. Everything is on the table, from managing labeling pipelines to transforming features in real time and serving at scale. Register for free now: https://www.applyconf.com/!

Episode Transcription

Key Takeaways

Here are highlights from my conversation with Willem:

On Selling a Networking Startup in College

I had to put myself through university. We started a company out of my dorm room, reselling wireless Internet. On our student campus, Internet access was expensive. If you had an ADSL line, you could resell it and share the line with other people. I put a Wi-Fi router on the roof above my room and started to resell access. Over 5+ years, it grew to cover the whole city. We had towers up on the mountain. We had employees. We had contracts with buildings and businesses and hundreds of paying customers. It’s somewhat crazy how my need to get through university and earn some income grew into that business. Once I finished my degree, I sold the business to our competitor.

Don’t try to do a full-time engineering degree while running a business that requires 24/7 attention. That period was the toughest time of my life. Once, lightning struck and collapsed one of our towers at 2 AM, and customers started reaching out for us to fix the problem. And I had an exam the next day. That is not a good combination.

I learned that I am capable of a lot. If there is a necessity, I will find a way to do it. This experience required me to get out of my comfort zone. As an 18-year-old college student, approaching management consulting corporations and convincing them to commit to a contract was quite scary. I did not understand any contractual or legal frameworks at the time. But it happened anyway because we were solving a real problem for people.

On Building Industrial Data Systems

Most industrial companies have existing legacy machinery that needs to be digitized. The problems are always at the integration points: lack of documentation, no connectivity, etc. Given a 30-year-old machine, you suddenly need to turn it into something that produces modern data points.

There were hundreds of little challenges I had to overcome to complete any project. These projects were bounded by a fixed contract amount, so overruns meant losing the money invested. Most of the challenges were actually the ones common to consulting: scoping and setting expectations, more than technical problems.

On Joining Gojek

In 2017, Gojek was a rocket ship that was taking off. The name stands for “motorcycle taxi” in Indonesian. The company provides a single app that fulfills customers' everyday needs (getting food, making purchases, ride-hailing, etc.). They built a super-app that eventually consisted of 16–17 services and products (digital payments, logistics networks, careers, e-commerce, groceries, lifestyle services).

At the time I joined, Gojek had a core product but neither a data foundation nor machine learning applications. They knew that they were sitting on a mountain of the best data for millions of Southeast Asians. They had a team of data scientists who could not get anything into production, so they decided to hire an engineering team to help those data scientists get something into production.

My team’s initial entry point was building ML systems that lifted core metrics and reduced fraud.

On Designing Gojek’s ML Platform

Our ML platform team focuses on the end-to-end ML lifecycle from idea to production. Our platform was designed to be self-service, so that data scientists could go from nothing to something without involving an engineer.

On Scaling Gojek’s ML Platform

Firstly, a big problem that our team had when getting started at Gojek was that we did not have a data foundation. If we want to do ML operations at scale, we would need a proper data foundation. Unfortunately, a lot of data scientists are being forced to do that data engineering work today. Hopefully, in the future, this will become a solved problem for most companies. It’s important to have a unified data foundation across different event logs and historical environments so that the data come to you.

Secondly, creating features should be effectively free. You should not struggle with creating and publishing your features. Ideally, you should reuse features created by other teams.

Thirdly, not breaking abstractions was a key lesson for Gojek, since we take an API-first approach. Something super overlooked in the industry, which I always hammer on, is that different parts of an ML system have different development lifecycles. The biggest mistake that many teams make (especially during the scaling phase) is not breaking up their system into smaller components. They should modularize their workflow into smaller stages such as data engineering, feature stores, model development, model deployment, model monitoring, etc.
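As a rough illustration of what keeping stages modular can look like, the Python sketch below gives the feature stage and the model stage narrow interfaces, so the serving code depends only on those interfaces and either side can be swapped or versioned on its own lifecycle. All names here (`FeatureSource`, `Model`, `InMemoryFeatureSource`, `LinearModel`, `score`) are hypothetical and are not Gojek's or Feast's APIs.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol

# Hypothetical stage interfaces for illustration only.
Row = Dict[str, float]


class FeatureSource(Protocol):
    def get_features(self, entity_ids: List[str]) -> Dict[str, Row]: ...


class Model(Protocol):
    def predict(self, features: Row) -> float: ...


@dataclass
class InMemoryFeatureSource:
    """Stand-in for a feature stage; unknown entities get an empty row."""
    table: Dict[str, Row]

    def get_features(self, entity_ids: List[str]) -> Dict[str, Row]:
        return {eid: self.table.get(eid, {}) for eid in entity_ids}


@dataclass
class LinearModel:
    """Stand-in for a model stage; missing features count as 0.0."""
    weights: Row
    bias: float = 0.0

    def predict(self, features: Row) -> float:
        return self.bias + sum(w * features.get(name, 0.0) for name, w in self.weights.items())


def score(entity_ids: List[str], source: FeatureSource, model: Model) -> Dict[str, float]:
    # The serving step composes the other stages only through their interfaces.
    rows = source.get_features(entity_ids)
    return {eid: model.predict(row) for eid, row in rows.items()}
```

Because `score` never reaches into how features are stored or how the model is trained, replacing the in-memory source with a real feature store, or the linear model with another one, does not require touching the serving step.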

Finally, you should pre-compute everything. Many people think it’s cool to have a real-time system that is always updated with on-the-fly retraining. But it’s much better to do everything in batches (whether serving data or serving models). Batch systems are easier to track, roll back, measure, and reason about.
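To make the batch argument concrete, here is a minimal sketch, assuming predictions can be computed ahead of time for a known set of entities. A batch job publishes a table of scores, serving is a plain dictionary read with a fixed fallback, and rolling back means re-activating the previous table. `precompute_scores` and `BatchScoreServer` are hypothetical names for illustration, not part of Feast or any Gojek system.

```python
from typing import Callable, Dict, Iterable, List


def precompute_scores(entity_ids: Iterable[str], predict: Callable[[str], float]) -> Dict[str, float]:
    # Batch job: score every known entity ahead of time and return the results as a table.
    return {entity_id: predict(entity_id) for entity_id in entity_ids}


class BatchScoreServer:
    """Hypothetical server for pre-computed scores; keeps every published batch."""

    def __init__(self, default_score: float) -> None:
        self._batches: List[Dict[str, float]] = []  # published batches, oldest first
        self._default = default_score

    def publish(self, batch: Dict[str, float]) -> None:
        self._batches.append(dict(batch))

    def rollback(self) -> None:
        # Drop the newest batch so the previous one is served again.
        if len(self._batches) > 1:
            self._batches.pop()

    def get(self, entity_id: str) -> float:
        # Serving is a dictionary read; unknown entities get a fixed fallback score.
        if not self._batches:
            return self._default
        return self._batches[-1].get(entity_id, self._default)
```

Each published batch is an inspectable artifact, which is what makes this style easier to track and measure than a model that retrains on the fly.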

On Feast’s Inception

Looking holistically at Gojek’s internal ML lifecycle, we realized that we spent a lot of time engineering features and getting them into production. Data scientists also duplicated their code and did not version their work or track its lineage. We knew that Uber’s Michelangelo team had built something of this kind to solve this problem, so we wanted to build something similar to solve our own.

We collaborated with the engineering team at Google Cloud on developing Feast. These were the core problems at the time:

  1. Features are not being reused.
  2. The definitions of features vary.
  3. It’s hard to serve up-to-date features.
  4. There’s an inconsistency between training and serving.
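One common way to surface a training/serving inconsistency is to log the feature values a model actually received at serving time and compare them with the values the offline pipeline computes for the same entities. The sketch below does that comparison on plain dictionaries; `find_skew` and the `entity.feature` key format are hypothetical, and this is not how Feast itself checks consistency.

```python
import math
from typing import Dict, List


def find_skew(offline: Dict[str, Dict[str, float]],
              online: Dict[str, Dict[str, float]],
              rel_tol: float = 1e-6) -> List[str]:
    """Return sorted 'entity.feature' keys whose offline and online values disagree."""
    mismatches = []
    for entity_id, online_row in online.items():
        offline_row = offline.get(entity_id, {})
        for feature, online_value in online_row.items():
            offline_value = offline_row.get(feature)
            # A feature missing offline, or a value outside tolerance, counts as skew.
            if offline_value is None or not math.isclose(offline_value, online_value, rel_tol=rel_tol):
                mismatches.append(f"{entity_id}.{feature}")
    return sorted(mismatches)
```

A non-empty result usually means the two paths compute the feature with different code or different source data, which is exactly the duplication a shared feature definition is meant to remove.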

A feature store is meant to bridge the gap between offline development and online production. It helps data scientists publish data into the operational side, and it helps data consumers train and serve models on that production-quality data. A feature store sits at the interesting boundary between ML and data.
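To show the two sides a feature store bridges, here is a toy in-memory sketch, assuming features are single numeric values keyed by entity and feature name. The offline side keeps full history and answers point-in-time queries (the newest value at or before a given timestamp, so training rows never see future data); the online side keeps only the latest value for low-latency serving. `TinyFeatureStore` and its methods are hypothetical and are not Feast's API.

```python
import bisect
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (entity_id, feature_name)


class TinyFeatureStore:
    """Hypothetical in-memory feature store for illustration; not Feast's API."""

    def __init__(self) -> None:
        # Offline store: full history per key, kept sorted by event time.
        self._history: Dict[Key, List[Tuple[datetime, float]]] = defaultdict(list)
        # Online store: only the latest value per key.
        self._online: Dict[Key, Tuple[datetime, float]] = {}

    def ingest(self, entity_id: str, feature: str, event_time: datetime, value: float) -> None:
        key = (entity_id, feature)
        bisect.insort(self._history[key], (event_time, value))
        latest = self._online.get(key)
        # Late-arriving older events update history but not the online value.
        if latest is None or event_time >= latest[0]:
            self._online[key] = (event_time, value)

    def get_historical(self, entity_id: str, feature: str, as_of: datetime) -> Optional[float]:
        # Point-in-time lookup for training: newest value at or before `as_of`.
        rows = self._history.get((entity_id, feature), [])
        times = [t for t, _ in rows]
        idx = bisect.bisect_right(times, as_of)
        return rows[idx - 1][1] if idx else None

    def get_online(self, entity_id: str, feature: str) -> Optional[float]:
        # Serving lookup: latest known value, or None if the feature was never ingested.
        row = self._online.get((entity_id, feature))
        return row[1] if row else None
```

Both retrieval paths read from data ingested once through `ingest`, which is the property that keeps training and serving values consistent in this toy version.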

Those are the problems that we wanted to address with Feast. At launch, Feast addressed the four problems above, but the one we did not fully solve was feature reuse. We thought that once people published features to Feast, others would start reusing them. It turned out that there was still a large trust factor involved. If you do not make it super clear exactly what you are consuming and using in your models, data scientists tend to publish their own data, or fork upstream code and publish the result into Feast. We had good penetration of feature reuse, and Feast was successful at that, but it was not as high as we had thought.
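Since the reuse problem came down to trust, one hedged illustration is a registry that attaches an owner, a description, and an upstream source to every feature, rejects duplicate names instead of silently accepting a fork, and records which models consume each feature. `FeatureDefinition` and `FeatureRegistry` are hypothetical names for this sketch; Feast's own registry works differently.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FeatureDefinition:
    name: str
    owner: str
    description: str
    source_table: str  # upstream lineage: where the values come from


class FeatureRegistry:
    """Hypothetical registry for illustration; not Feast's registry API."""

    def __init__(self) -> None:
        self._features: Dict[str, FeatureDefinition] = {}
        self._consumers: Dict[str, List[str]] = {}

    def publish(self, definition: FeatureDefinition) -> None:
        # Refuse a second feature under the same name to discourage silent forks.
        if definition.name in self._features:
            raise ValueError(f"feature {definition.name!r} already exists; reuse it instead")
        self._features[definition.name] = definition
        self._consumers[definition.name] = []

    def search(self, keyword: str) -> List[FeatureDefinition]:
        kw = keyword.lower()
        return [d for d in self._features.values()
                if kw in d.name.lower() or kw in d.description.lower()]

    def use(self, feature_name: str, model_name: str) -> FeatureDefinition:
        # Record consumers so others can see a feature is already relied on in production.
        definition = self._features[feature_name]
        self._consumers[feature_name].append(model_name)
        return definition

    def consumers(self, feature_name: str) -> List[str]:
        return list(self._consumers[feature_name])
```

Exposing ownership, lineage, and existing consumers is one way to make it "super clear exactly what you are consuming", which is the condition the paragraph above identifies for reuse to happen.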

Another problem we have started to address with Feast is data quality. As of v0.7, users can validate both training data and production data, in real-time as well as batch deployments. Data quality is a big problem in how ML systems are operated today.
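The idea of validating training and production data with the same rules can be sketched as a set of per-feature expectations applied to any batch of rows. The checks below (allowed range, missing values) and the names `FeatureExpectation` and `validate_rows` are assumptions for illustration; this is not Feast's validation API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FeatureExpectation:
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    allow_missing: bool = False


def validate_rows(rows: List[Dict[str, float]],
                  expectations: Dict[str, FeatureExpectation]) -> List[str]:
    """Return readable violations; the same checks can run on training and serving rows."""
    violations = []
    for i, row in enumerate(rows):
        for feature, exp in expectations.items():
            value = row.get(feature)
            if value is None:
                if not exp.allow_missing:
                    violations.append(f"row {i}: {feature} is missing")
                continue
            if exp.min_value is not None and value < exp.min_value:
                violations.append(f"row {i}: {feature}={value} < {exp.min_value}")
            if exp.max_value is not None and value > exp.max_value:
                violations.append(f"row {i}: {feature}={value} > {exp.max_value}")
    return violations
```

Running one set of expectations against both a training extract and a sample of online requests means a violation points at the data, not at a difference between two validation implementations.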

On Feast’s Product Roadmap

Prioritizing a product roadmap can be tricky because everyone wants something different. We pushed out a lot of functionality that people wanted, but we needed to figure out whether, at the end of the day, our project vision solved problems for the specific group we were targeting.

When we started, we focused on providing features as a service for ML platform teams like ours. We frequently surveyed our users outside of Gojek to understand their pain points. However, our priorities were more often shaped by the internal fires we were fighting at Gojek, so we were biased toward our internal users.

As time went on and things stabilized internally, we democratically looked at both external and internal users' needs and prioritized those that could be the most impactful.

The most important thing we did to grow the Feast community was to introduce RFCs (requests for comments). We designed specific functionality, shared the designs with the community, took their feedback, and responded to GitHub issues.

On The Future of Feast

Here are the key lessons that I learned:

Here is what Feast is moving towards:

All in all, we will make sure that Feast is a lightweight system. Right now, we are even making Feast completely runnable from a notebook environment. You can expect to see a release related to this lightweight mode of running Feast in the summer.

On Commercial and Open-Source Software

Commercial software is targeted at people who have large problems and are willing to put money towards solving them. It literally takes years for an engineering team to build an in-house feature store at scale. These are the kinds of problems that Tecton is addressing. The requirements can vary:

With Feast, our users are typically small data science and platform teams. As complexity goes up and the stakes get higher, Tecton is an obvious choice. Tecton’s product is very far ahead of anything I have seen in the space.

On Living and Working In Southeast Asia

The South African experience is very suburban, with a car culture. The Southeast Asian experience is much more compact, with tighter spaces. At the same time, Southeast Asia is more culturally diverse, especially in Singapore and Thailand. It was a blast working there. I wouldn’t mind going back at some point in my life.

Overall, it’s extremely rare for a Feast-like system to be built in Southeast Asia. Most Southeast Asian companies focus on implementing solutions, not on building products. The competence is there, but the companies are not run in that fashion.