Datacast

Episode 114: Building Data Products and Unlocking Data Insights with Carlos Aguilar

Episode Summary

Carlos Aguilar is the Founder and CEO of Glean, a data visualization company based in New York City. He grew up in Washington, DC, where he started tinkering with robots and websites early on and fell in love with the intersection of art and technology. At Cornell, he studied Mechanical Engineering and robotics and did research in machine learning applications in robotics and art. In 2009, he joined an early robotics startup called Kiva Systems, where he got deep into data and analytics. After Kiva was acquired by Amazon, Carlos joined Flatiron Health and worked on data products to help cancer centers and cancer researchers. As the head of the Data Insights team, Carlos grew the team to 25 people who helped launch dozens of data products and supported Flatiron's core data infrastructure.

Episode Notes

Show Notes

Carlos' Contact Info

Glean's Resources

Mentioned Content

Blog Posts

People

  1. Vicki Boykis
  2. Anthony Goldbloom
  3. Wes McKinney

Book

Notes

My conversation with Carlos was recorded back in June 2022. The Glean team has made several announcements in 2023 that I recommend checking out:

  1. The recently launched, interactive public demo site
  2. This recent integration with DuckDB
  3. This post about Version Control for BI
  4. Their Public Roadmap

About the show

Datacast features long-form, in-depth conversations with practitioners and researchers in the data community to walk through their professional journeys and unpack the lessons learned along the way. I invite guests coming from a wide range of career paths — from scientists and analysts to founders and investors — to analyze the case for using data in the real world and extract their mental models (“the WHY and the HOW”) behind their pursuits. Hopefully, these conversations can serve as valuable tools for early-stage data professionals as they navigate their own careers in the exciting data universe.

Datacast is produced and edited by James Le. For inquiries about sponsoring the podcast, email khanhle.1013@gmail.com.

Subscribe by searching for Datacast wherever you get podcasts, or click one of the links below:

If you’re new, see the podcast homepage for the most recent episodes to listen to, or browse the full guest list.

Episode Transcription

Key Takeaways

Here are the highlights from my conversation with Carlos:

On His Upbringing

I grew up in an exciting time as part of the first generation to have access to the Internet. As a child, I enjoyed playing with websites, viewing their source code, and creating my own AOL web pages. I learned how to copy code and later became interested in Flash, where I discovered that I could code and draw things in the same environment, creating little games and websites.

While I was in high school in the early 2000s, the dot-com bubble burst, and I became interested in robots. Entering college, I still found the web interesting, but I was more drawn to the way technology would touch the rest of the world. I wanted to explore how technology and programming could affect the real world.

Although I took a computer science course in high school, I had already been hacking around with ActionScript and Flash, which looks a lot like JavaScript. I also wrote a lot of PHP and was fascinated with creating little webpages, hosting domains, submitting websites, and Flash applications to competitions. While not altogether useful, it was a fun way to explore coding.

On His Academic Experience at Cornell

I chose to attend Cornell because of the access to arts and humanities classes, but it turned out to be an intense engineering program, which is what I wanted. I was looking for technical rigor and definitely found it at Cornell. My favorite classes were the ones where I got to work on computers, like finite element analysis, which involved computational modeling using programs like MATLAB.

My favorite class was feedback control systems, which taught systems thinking and creating computational models for how systems work. I also took a couple of computer science classes, like evolutionary optimization and genetic algorithms, and some robotics classes.

In 2006 or 2007, I took an evolutionary computation class focused on genetic algorithms. Back then, machine learning wasn't as popular as it is today, and neural nets weren't used as much. I took the class with Hod Lipson, who ran a research lab called the Creative Machines Lab. The lab explored how machines and technology could be creative, especially in the domain of fine art. We built a system that would take an input image and generate candidate solutions for painting it. These candidates competed with each other via a fitness function and an evolutionary process, resulting in a representation of the image that was good according to some optimization function.
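To make the evolutionary loop described above concrete, here is a minimal Python sketch of the same idea: a fixed budget of strokes, a fitness function that rewards similarity to a target image, and selection plus mutation over a population. It is a simplified illustration under assumed parameters, not the Creative Machines Lab's actual system.

```python
# A minimal sketch of an evolutionary "painting" loop: candidate solutions
# (a fixed budget of grayscale rectangular strokes) compete under a fitness
# function that rewards similarity to a target image. Purely illustrative.
import random
import numpy as np

H, W, N_STROKES, POP, GENERATIONS = 32, 32, 20, 40, 200
rng = np.random.default_rng(0)
target = rng.random((H, W))  # stand-in for a real input image

def random_stroke():
    # (x, y, width, height, intensity)
    return [random.randint(0, W - 1), random.randint(0, H - 1),
            random.randint(1, W // 2), random.randint(1, H // 2), random.random()]

def render(strokes):
    canvas = np.zeros((H, W))
    for x, y, w, h, v in strokes:
        canvas[y:y + h, x:x + w] = v  # later strokes paint over earlier ones
    return canvas

def fitness(strokes):
    # Negative mean squared error against the target: higher is better.
    return -np.mean((render(strokes) - target) ** 2)

def mutate(strokes):
    child = [s[:] for s in strokes]
    child[random.randrange(N_STROKES)] = random_stroke()
    return child

population = [[random_stroke() for _ in range(N_STROKES)] for _ in range(POP)]
for gen in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)          # selection
    survivors = population[:POP // 2]
    population = survivors + [mutate(random.choice(survivors))
                              for _ in range(POP - len(survivors))]

print("best fitness:", fitness(population[0]))
```

Constraining the stroke budget, as described above, is what forces the search toward the more "creative" approximations rather than a pixel-perfect copy.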

The most interesting results were when we constrained the system to represent the image in a limited number of strokes, resulting in really creative solutions that even abused the limitations of the simulated environment. The project was an early exploration of how machines could be creative, and it was great to see the project continue even after I left. Hod continued working on the project for another five or ten years and created some really awesome things.

Although I haven't kept up with the machine learning applications of art, I have seen recent advances like DALL-E, which creates visual representations. It's incredible to see how far we've come in just 15 years. I've also seen a lot of digital art and generative art that combines human input and algorithms or machine learning. It's been really cool to see these things pop up.

On Working in Robotics at Kiva Systems

Maybe I can walk you through how I landed my job at Kiva after graduating from Cornell. Coming out of grad school, I really wanted to work in tech, but as a mechanical engineer, the path into tech wasn't clear to me.

I considered companies like Google, but at that point, they were all pretty big and didn't feel like I'd be able to do the creative technical work that I wanted to do. I was actually thinking about staying on as a Ph.D. with Hod in his lab.

It was really by chance that I found Kiva Systems, a company founded by a group of robotics folks with Cornell ties, including co-founder Raffaello D'Andrea. They called me out of the blue for a phone interview, asking if I wanted to become a systems analyst at the company.

After visiting for an in-person interview, I was sold. They had actual robots working in a warehouse, hundreds of them just roaming around. The problem they were working on was robotic warehouse automation, a collaboration between humans and robots doing different parts of the task. It was fascinating.

My first role was as a systems analyst, trying to understand how this complex system worked together. The role of a data scientist didn't really exist back then, but that's essentially what I was doing: analyzing the system and figuring out how all the pieces fit together.

The system was installed in a warehouse in central Pennsylvania, and my job was to figure out why the system was performing strangely or not as expected. I talked to all the humans interacting with the robots to uncover the reasons behind performance issues, measure the performance of the system and create tools to help manage it effectively.

It was one of the most interesting data sets I have ever worked with because of all the constraints and moving parts. There were low-level control systems doing things like path planning and resource allocation, and then there was the most complex element of all: human behavior. Understanding how humans interacted with the robots and creating optimizations around that was the most interesting part of the problem.

At Kiva, we realized that data made products better. It's unclear whether Kiva would have been as successful if we hadn't been able to instrument that complex machine and create tools that explained its complex dynamics to the warehouse operators.

Our customers had to figure out how to get tens of thousands of orders out over the next few days, and they had to optimize their inventory and various other aspects of their warehouse to make it happen.

On Building His First Data Product at Kiva

Actually, it was my boss who highlighted the story that later became my intro blog post when I was launching Glean. But I had been doing the exact same thing my entire career. When I joined Kiva, my first project was analyzing all of the system configurations for a really complex system. There were literally hundreds and hundreds of dials that tuned the various algorithms managing the entire system.

My job was to export all of that configuration data, review it, analyze where things were out of line, and see how that was affecting performance for our handful of customers. As soon as I started doing it, I realized that someone was definitely going to want to do this again; it was not going to be the last time this was useful.

So, probably the second time I had to do this analysis, I figured I should automate it, and I built a little web app that everybody could log into to see an audit of all the system configurations at any given point. This was useful when we had just a few customers in the beginning, but it became even more useful when we had 20 or 30 or 40 customers and were launching the 41st site. It allowed us to quickly look up configurations, things like resource allocation or seconds per station, as well as settings for the drive units, which is what we called the robots.
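To illustrate the kind of audit that web app performed, here is a toy Python sketch that flags per-site settings that deviate from fleet-wide defaults. The parameter names and values are made up for illustration; Kiva's actual configuration schema is not shown here.

```python
# Toy configuration audit: compare each site's settings against fleet-wide
# defaults and report every deviation. Names and values are hypothetical.
DEFAULTS = {"seconds_per_station": 8, "max_drive_units": 500, "pick_retry_limit": 3}

site_configs = {
    "site_a": {"seconds_per_station": 8, "max_drive_units": 500, "pick_retry_limit": 3},
    "site_b": {"seconds_per_station": 12, "max_drive_units": 500, "pick_retry_limit": 5},
}

def audit(configs, defaults):
    """Return {site: {param: (actual, default)}} for every out-of-line setting."""
    findings = {}
    for site, cfg in configs.items():
        diffs = {k: (v, defaults.get(k)) for k, v in cfg.items() if defaults.get(k) != v}
        if diffs:
            findings[site] = diffs
    return findings

print(audit(site_configs, DEFAULTS))
# {'site_b': {'seconds_per_station': (12, 8), 'pick_retry_limit': (5, 3)}}
```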

Instead of doing the same thing repeatedly, I learned to look for useful products hiding inside the organization: a piece of analysis has way more utility if I can automate it and build it into a little product that people can really dig into on their own. This was an early lesson for me, and it led me to see that the data-as-a-product mindset really assumes self-service and trust in the people around you.

Ad hoc requests are what make the work fun. They provide an opportunity to learn something new, talk to people, and discover requirements. However, a lot of those ad hoc requests should only be done once, and you should throw them away after they're done. Use that as an opportunity for building actual software. Always have an eye out for decisions that could be optimized or automated and tools that could be built.

On His Brief Stint at Amazon

The Amazon acquisition was a lot of fun, as there was a great match in cultures between Kiva and Amazon. Both companies had very driven cultures. I remember the Amazon team descending on Boston and coming to the warehouse to figure out plans for integration. There were early whispers that they were just going to take the robots and write their own software, but it turned out that we had already solved the problem quite well at Kiva.

The first integrations into Amazon's fulfillment network were almost like customer engagements. We installed a system and integrated it with their warehouse management systems, which were unique to Amazon. They had written all their own warehouse software, and we integrated it with their technology stack. Those first integrations were intense, and we flew out to Seattle almost every week to set up these systems. The pace was incredibly fast, and the demands were high.

We had made about 5,000 or 6,000 robots when we got acquired. Over the next year and a half, the mandate was to do many times that number of robots. To this day, I don't have any particular insight into the number of robots produced, but there are hundreds of thousands of them.

I was involved in those first couple of projects right after the acquisition. Luckily, we didn't have to change the technology that much because we had done an effective job of solving it at other customer sites.

I don't know much about M&A, but this was an incredibly successful acquisition. The same technology is now automating all of Amazon's warehouses. It's interesting to see the difference between the Kiva acquisition by Amazon, where it was really a drop-in-place technology that unlocked a ton of value, and the Roche acquisition of Flatiron later in my career. The Roche acquisition was much more hands-off. Roche is a conglomerate that owns many other pharmaceutical companies, so it's really run as a collection of subsidiaries.

Amazon, on the other hand, kept our culture totally intact for the first one or two months. A couple of years later, we were Amazon Robotics and no longer Kiva. They collaborated with us and eventually absorbed our culture. Going into the acquisition, they knew it was an incredibly good cultural match: our values aligned with their leadership principles, like taking ownership, from the outset. So they could acquire the whole company and fold it into their culture.

On Joining Flatiron Health as The First Data Hire

I was at a similar stage in my career or mindset to when I was finishing grad school, feeling excited about my work and thinking about what was next, but had no idea what that would be. I considered starting a company and talked to many startups through connections to venture capitalists, including the Google Ventures team, when they were doing a lot of early-stage investment around 2012-2013.

At that time, there were many mobile apps and check-in apps, which I found uninspiring. I even booked a one-way ticket to Bangkok, thinking I would take a break for many months. But then I met Nat and Zack, the founders of Flatiron, and their pitch for building an ecosystem around cancer data was incredibly motivating.

They wanted to partner with cancer centers to gather oncology data, which is incredibly valuable for understanding the different disease states in cancer and the drugs people are receiving. Clinical trials themselves are limited by patient populations, so they wanted to use real-world data to advance cancer care.

When I joined, we had one cancer center partner, and my role as Integration Manager was to organize and integrate data from these centers to build data products. While the pitch was broader than that, we had immediate problems around data integration and ingestion organization.

On Building the Data Insights Engineering Team From Scratch

When I first joined Flatiron, I was an integration manager tasked with moving data around and figuring out how to get it from one place to another. This was in 2013, so we couldn't use AWS because it wasn't HIPAA-compliant. Instead, we had to copy databases over using Azure, which was a difficult and manual process. However, the real challenge was figuring out how to ingest and integrate the data into our applications. Since cancer centers weren't yet using our technology, we didn't have a clear vision of what data we needed to grab or what endpoints we should be focusing on.

To address this, I got involved in the product development process, working with customers to determine their requirements and then going back to the source systems to see if it was possible to get the necessary data. This approach allowed us to establish a customer-centric data team that could close the gap between requirements and implementation. As a result, we were involved in launching almost all of the data products that Flatiron released.

On Building Data Products at Flatiron

In the early days of getting cancer centers excited about our services, we used basic business intelligence tools, like population health discovery tools. These helped us focus on revenue cycle management, which matters to every cancer center, because they are businesses too. Cancer drugs are incredibly expensive, so auditing drug billing was a powerful tool for these small, scrappy businesses. They had to be careful: if they billed a drug that costs $10,000, $15,000, or $20,000 incorrectly, or failed to bill it to insurance companies at all, they could go out of business.

Later on, we focused on clinical trials and our internal data products. We realized that we had to build data assets for every single data product we created, which required going back to the source systems. We found that having an intermediate representation, like a data warehouse, made the process more efficient. Therefore, we spent time working on our central data warehouse and stopped working on user-facing products for a year to focus on the warehouse. This improved the quality of our internal data products, which unlocked future product development.

Perhaps this is why I left healthcare, but I think the biggest problems are actually just the alignment of incentives. This gets a little philosophical, but ultimately we treat medicine as a purely capitalist system. However, when your life is on the line, it's not like choosing between buying a cheeseburger at Wendy's or McDonald's—free markets don't rule. I'm willing to spend any amount of money, and the decision maker is often different from the person who's thinking about paying the bill. There are very misaligned incentives, so even when there are things that should be done for patients, like better care navigation, there isn't a model that can pay for it. Insurance companies may want to pay for it, but they need to see that it will actually have some sort of effect or that it could make the cost of care more efficient. It's incredibly frustrating to see things that you think should be done for patients, but aren't because of misaligned incentives. Patients should definitely have access to drugs and other treatments, but aligning incentives properly is probably the most challenging part of healthcare.

At that point, we didn't have Redshift, so the first versions of the data warehouse were actually in SQL Server, because a lot of our source data was in SQL Server. We built and hacked together a lot of our own tools. We used a bit of Tableau for visualization. Later, we used a product called Caravel, an open-source solution that later became Superset. After I left, the team was using Looker. We used almost all the visualization tools available, and on the storage layer, we transitioned from SQL Server to Redshift and more Postgres-oriented solutions.

One of the big innovations we built was an ETL tool we called blocks, which was very ergonomic for SQL-oriented folks. It was accessible to people who only knew SQL, so even biostatisticians could use it to a certain degree. We built incredibly complex DAGs of data pipelines on top of our internal data warehouse, which was a big unlock. If it were today, we would probably use something like dbt, but the tool we invented was, at the time, a big improvement. We did a lot of building ourselves because a lot of the tools on the market were fairly immature.
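As a rough illustration of the pattern behind a tool like blocks (and what dbt formalizes today), here is a small Python sketch in which each "block" is a SQL SELECT with declared dependencies, and a runner topologically sorts the DAG and materializes each block as a table. The block names and SQL are hypothetical, and an in-memory SQLite database stands in for the warehouse; this is not Flatiron's actual tool.

```python
# Generic sketch of a SQL-oriented pipeline DAG: declare blocks and their
# upstream dependencies, sort topologically, and materialize each as a table.
import sqlite3
from graphlib import TopologicalSorter

blocks = {
    "raw_orders": {
        "deps": [],
        "sql": "SELECT 1 AS order_id, 120.0 AS amount UNION ALL SELECT 2, 80.0",
    },
    "daily_revenue": {
        "deps": ["raw_orders"],
        "sql": "SELECT SUM(amount) AS revenue FROM raw_orders",
    },
    "kpi_summary": {
        "deps": ["daily_revenue"],
        "sql": "SELECT revenue, revenue / 30.0 AS run_rate FROM daily_revenue",
    },
}

conn = sqlite3.connect(":memory:")  # stand-in warehouse
order = TopologicalSorter({name: set(cfg["deps"]) for name, cfg in blocks.items()}).static_order()
for name in order:
    conn.execute(f"CREATE TABLE {name} AS {blocks[name]['sql']}")  # materialize the block

print(conn.execute("SELECT * FROM kpi_summary").fetchall())
```

The appeal of the pattern is that anyone who can write the SELECT statements can contribute a node to the pipeline without touching the orchestration code.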

On Hiring Data Talent

It was challenging for us at Flatiron because we had a systems-oriented approach to data. We prioritized hiring for product skills, so we built a product case study into the interview process: we presented candidates with a data set and a customer and asked them to create a case study of how they would serve that customer. This gave us insight into how a candidate would build with data at a small organization, where prioritizing what to build can be difficult. Unlike software development, data teams are often left to figure out how to prioritize things and discover use cases on their own.

We had a minimum bar for every hire, and each candidate needed to demonstrate use-case thinking, product orientation, and customer empathy. We also looked for a spike in technical areas such as statistics, coding, or machine learning.

Flatiron was a matrixed organization with different product initiatives that worked cross-functionally. Each initiative had a cross-functional team, including software engineers, biostatisticians, data insights engineers, product managers, and designers. The team with the most data insights folks was probably the central data warehouse team, which dealt with a lot of data organization.

Each product line justified the ROI of our data folks, which made it easy to justify our headcount. Data insights people helped move the product forward, so teams were always asking for them. The challenge was to develop norms and still feel like a team. We had a weekly meeting where we did cross-functional learning, which helped us see trends across the entire organization. This became another superpower of the data insights team, as we were able to connect the dots and find areas of collaboration and learning across the whole organization.

On Founding Glean

I've always had an idea in the back of my head about data visualization and reporting tools. While there are many visualization tools available, I never found a go-to product that felt like the last word in the category. Something always seemed to be missing from my tool set, and I had run into various obstacles while trying to empower people with data at Flatiron.

I remember specific instances where I gave data sets to operations personnel, only to have them create three Sankey charts or three pie charts without sharing them with others. At that point, I realized the importance of coaching people on how to make data visualizations accessible to others. Building a data visualization is like building a data product that will be consumed by someone else, so it's crucial to make it understandable to everyone.

Although there are common patterns for approaching data visualization, the user interfaces of these visualization tools are always the same and not very ergonomic. After leaving Flatiron and doing some consulting work, I began prototyping my own data visualization tool, Glean. I created a prototype, got excited about it, and showed it to a few people, which motivated me to start the company.

Glean is an ambitious product in a crowded and competitive space, so I knew it would take time to develop. I started off on my own, writing React and JavaScript, and thinking through the core concepts that needed to be included. After slowly hiring a team and getting some customers, we spent over a year building and testing the product with actual customers before doing a broader announcement.

The term "glean" means to pick up the morsels after harvesting a crop. In modern terms, it means to discover something from the data. That's how I see data visualization: starting with a heap of data, organizing it, and building on layers of meaning until people can dig through it and find their own insights.

On Data Visualization and Exploration

During my time at Flatiron, I coached people on how to analyze and visualize data and how to approach new datasets. Every tool accessible to non-data team members has a similar user experience: start by selecting a dataset and dragging columns around while experimenting with different visualization types. It's a combinatorial problem: with 20 or 30 columns and a few different chart parameters, there are hundreds of millions or billions of possible charts you could conceive of in the first five minutes, yet probably only 0.001% of them are worth starting with. Every time I coached someone on visualization and analytics, I started with a time-series profile of the data as a way to get into it.
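As a concrete example of that first-pass habit, here is a minimal pandas sketch that profiles a hypothetical events table over time before any chart-dragging begins. The column names (created_at, amount, region) and the weekly grain are assumptions for illustration only.

```python
# Minimal time-series profiling: resample a hypothetical events table by week
# and look at volume plus a key metric per segment before building any charts.
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
events = pd.DataFrame({
    "created_at": pd.date_range("2022-01-01", periods=500, freq="6h"),
    "amount": rng.gamma(2.0, 50.0, size=500),
    "region": rng.choice(["east", "west"], size=500),
})

profile = (
    events.set_index("created_at")
          .groupby("region")
          .resample("W")["amount"]
          .agg(["count", "sum", "mean"])   # weekly volume and metric per region
)
print(profile.head(10))
```

Skimming a table (or plot) like this narrows those billions of possible charts down to the handful of trends that are actually worth exploring first.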

Every data tool is designed around a user experience, and Tableau, the most accessible of them, offers amazing visualization capabilities but is still hard to get started with. So what Glean offers is automatic visualization and profiling at the outset. This isn't some esoteric approach; it's a guided workflow that starts with exploratory data analysis, looking at trends over time and showing you a ton of those views right out of the box. The workflow has you define some metrics declaratively and then puts you in a very visual, very interactive explorer, allowing people who are somewhat familiar with data but not visualization experts to start clicking into the data.

Analytics is a skill in and of itself, separate from coding and technical skills, and creating a guided workflow that teaches people good visualization is essential. Glean walks you through the process whether or not you have coding skills or expertise in data visualization. The core value proposition of Glean is its strong defaults and automatic visualization, which make it easier to share insights with your team.

It's hard to explain what Glean's automatic visualization looks like and how intuitive it is, so watching the demo video will help you understand it better.

On DataOps For BI

So far, a lot of what I've talked about is clicking around and finding insights, which is the fun part. I think Glean really makes it enjoyable. However, the challenge with dealing with a scaled organization is that sometimes those one-off little dashboards and analytics that you thought were temporary become incredibly important and mission-critical. All of a sudden, everyone cares about this one dashboard, and a few things happen as a result.

Firstly, as your organization grows, the upstream dependencies from this dashboard are likely growing as well. There are data pipelines that need to be managed before the dashboard actually gets materialized and before the chief revenue officer sees it. So, the quality of the data matters and can cause a lot of complexity upstream.

Secondly, you may want to iterate faster on this dashboard now that you have, say, a new revenue line to incorporate. Additional development requirements emerge for these dashboards. Unfortunately, change management in modern data tools is terrible.

At Flatiron, we managed this by trying to sync and coordinate changes to pipelines and downstream dashboards simultaneously. We tried to come up with staging environments, but there just weren't really good workflows for them. Sometimes we would just change things and wait for downstream things to break, and then we would fix them.

The idea behind DataOps inside of Glean is to preserve that freewheeling experimentation: everybody clicks around in the data. We have early-stage customers who just use Glean in that mode and don't need to worry about DataOps at all.

However, sometimes dashboards get shared with customers and become production products that need to be checked in. Glean DataOps has a few different components. Every resource in Glean can be exported as configuration files and managed in version control. We also have a build tool and a CLI that let you build these resources in continuous integration.

The most important piece is a feature we call previews, which lets you see an alternate view of your entire analytics stack with a proposed set of changes. This accelerates teams because you can propose changes in a pull request, see the entire environment in this duplicated state, and show it to your chief revenue officer. They can play around with it for a week or whatever, and then you merge it in and deploy it to production. This is one of the more sophisticated workflows for analytics development, for when those dashboards become production products and you want to maintain their quality.
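To show the general shape of the "BI resources as version-controlled config, validated before merge" workflow described here, below is a generic Python sketch with a hypothetical dashboard schema and a validation step that a CI job might run on a proposed change. It is not Glean's actual configuration format, CLI, or previews implementation.

```python
# Generic dashboards-as-config sketch: a declarative dashboard definition plus
# a CI-style validation step. Schema and field names are hypothetical.
from dataclasses import dataclass, field

ALLOWED_CHART_TYPES = {"line", "bar", "pivot"}

@dataclass
class ChartConfig:
    name: str
    chart_type: str
    metric: str

@dataclass
class DashboardConfig:
    title: str
    model: str
    charts: list[ChartConfig] = field(default_factory=list)

def validate(dashboard: DashboardConfig) -> list[str]:
    """Return a list of problems; an empty list means the change can be deployed."""
    problems = []
    if not dashboard.charts:
        problems.append("dashboard has no charts")
    for chart in dashboard.charts:
        if chart.chart_type not in ALLOWED_CHART_TYPES:
            problems.append(f"{chart.name}: unknown chart type {chart.chart_type!r}")
    return problems

# In CI, the proposed change from a pull request would be loaded from the
# exported config files, validated, and only then built into a preview.
proposed = DashboardConfig(
    title="Revenue overview",
    model="kpi_summary",
    charts=[ChartConfig("weekly revenue", "line", "revenue")],
)
print(validate(proposed) or "ok to build preview")
```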

On Product Vision

I believe that products are ultimately about people, and it is important to understand the different personas that interact with them. However, the challenge with data products, such as reporting and business intelligence, is that you have diverse stakeholders with different needs. These include executives who just want to check numbers, analytical individuals who want to dig deeper into the data, and platform engineers who maintain the systems.

Our approach to treating the platform as a product and data as a product within a single organization is focused on creating incredible tools for each of these personas. To achieve this, we have initiatives coming up to help different personas collaborate and have an amazing experience.

Recently, we launched a workbench that looks more like a SQL IDE, which is great for technical analytics engineers and other technical staff. We are now focused on improving our charting and visualization library, which intentionally only has a handful of chart types that are highly configurable and rearrangeable. This is to make it easier for users who are just trying to see some data, instead of overwhelming them with 50 or 60 chart types. We are trying to make a core set of chart types, such as Cartesian charts and pivot tables, incredibly useful for organizations.

Our visualization library is a big area of focus for us over the next few months. We aim to make it more configurable by adding more complex tables, calculations, and trellising. This way, we can teach people how to do more complex visualizations in a safe way.

In addition, we are working on better collaboration within Glean, particularly for larger teams using the product. This means better inline documentation, commenting, and other features to enhance collaboration.

On Hiring

Hiring is all about culture, and culture is the best mechanism of influence you have. Finding the right people who appreciate your culture and values is crucial, so interweaving your values and culture into your hiring process is incredibly important.

When thinking about the culture to build at Glean, the data insights team at Flatiron provides a great model. They had a high energy level, were super innovative, and were willing to share and disagree with each other in open collaboration. Feeling safe enough to throw out ideas was essential to their success.

At Glean, we want to build a similar culture. We need to find people who are excited about taking ownership, being in the driver's seat, and having a collaborative spirit. We want a diverse and eclectic team that is focused on technical excellence, innovation, collaboration, and ownership. Candidates see when you take these things seriously, and when you have an organized set of requirements.

Hiring is a two-way street. We need to make a value proposition for candidates, showing them how their careers can unfold inside our organization. We need to explain our product and values and make sure candidates are excited to take the next step with Glean.

At Flatiron, we worked with a recruiter to source diverse candidates. We also hosted and attended events to find candidates. We focused on reaching diverse folks at the top of our hiring funnel and making sure there was good representation throughout our entire hiring process.

On Finding Design Partners

Finding design partners has been particularly difficult in the business intelligence sector, where there are already many stable options available. For me, networking and pitching ideas to people were key: this allowed me to identify pain points and find receptive pockets of the market. Sales in a competitive market can be challenging, and it takes persistence, conviction, and repetition to find early believers and adopters.

I am naturally stubborn, which helped me to develop an immense amount of conviction around my idea over the course of 10 years. However, this is not a repeatable set of advice. Instead, it is important to have conviction in your idea, backed by market evidence. Tenacity can come from various sources, such as upbringing or personal qualities.

On Fundraising

It's probably a challenging fundraising environment right now.

When it comes to investors, I see them a bit like employees, though they won't have as much impact as employees. Money is obviously useful, but finding investors is like finding people who want to join your journey. It's not that different from finding and recruiting your first customers or employees. You have to carve out a path for them too: you're helping your investors accomplish something, and they have their own motivations. So there isn't a silver bullet for finding investment.

Most of it is just preparation. Don't spend all your time looking for investment. Instead, spend your time preparing, figuring out the market, having an amazing story, building proof points of that story, and having that in your pocket when talking to investors.