Datacast

Episode 62: Leading Organizations Through Analytics Transformations with Gordon Wong

Episode Summary

As a data modeling fanatic, data warehouse architect, multi-hypergrowth startup veteran, and team builder, Gordon has built his career on helping people get answers to their business questions. Over time, he has shifted his focus from pure technology to complete solutions where people, processes, and technology all play a role. At Fitbit, he established the data warehousing team and, as an early customer of Snowflake, used it to fuel petabyte-scale analytics. Later, at both ezCater and HubSpot, he rebuilt the data warehousing teams to focus on enabling analysts, not loading more data. A constant focus on the customer and their problems has led him to realize that empathy is the most important trait a leader can have.

Episode Notes

Show Notes

Gordon’s Contact Info

Mentioned Content

People

Book

Episode Transcription

Here are highlights from my conversation with Gordon:

On Getting Into Databases

Frankly, I got fortunate. I had degrees in Psychology and Philosophy and not a lot of jobs required those degrees back then. I ended up getting a job at a Geographical Information Systems (GIS) lab with a modest beginning — spending the first year doing data entry. This job was very repetitive, and I wanted to go faster. One day, I borrowed the DBA password and taught myself SQL to improve my entry rate. Learning SQL snowballed my career from there.

There are practical and emotional challenges in dealing with impostor syndrome. With liberal arts degrees in a field that expects you to be technical, you can suffer the delusion that you do not know enough. For many years, I intended to go back to school and get that technical degree. But at some point, I realized that the technology was not what held me back; what mattered was how I applied the technology.

On Consulting

Consulting seemed challenging to me: how do I learn in a new environment, define a problem, and be successful in a short period of time?

Ab Initio Software was a self-funded company with a great customer base and some of the smartest and most talented people I have ever met. They had a maniacal focus on customer success and on solving problems at the fundamental level. Those are two things that appeal to me.

My core skill is building solutions that help analysts thrive and drive better decisions.

On Data Warehousing

Smarter Travel Media was one of the earlier companies to get into travel search. They had an easy-to-use portal, catered particularly to non-technical people, where you could enter your criteria for finding flights/cars/hotels and get back results. While the business was about arbitraging attention, the customer value was about helping people travel easily. As you can imagine, there was a lot of data behind that product, and I started learning about clicks, impressions, actions, etc.

This was my first opportunity to build a department from the bottom up. I got recruited by the founders of the company, who were not doing any deliberate analytics at the time (it was all organic spreadsheets).

From a technical perspective, I got the opportunity to dive into Microsoft’s SQL Server stack.

Being a one-person shop building the whole thing from top to bottom taught me not just how to build the data warehouse but also how to support it in production and engage in continuous delivery. It’s not enough to just populate databases and write reports. Those reports have to be accurate, reliable, and usable. In fact, I built my own data quality system that tested the data with simple binary (pass/fail) checks. I went to users directly to onboard them and made sure they successfully used our dashboards and reports.
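To make the idea of binary data quality checks concrete, here is a minimal sketch in SQL. The table and column names are hypothetical, not from the conversation; the pattern is simply that each check is a query for violations, and an empty result set means the check passes.

```sql
-- Hypothetical tables and columns, for illustration only.
-- Each check returns the offending rows; zero rows back = pass.

-- Check 1 (referential integrity): every booking points at a known customer.
SELECT b.booking_id
FROM bookings b
LEFT JOIN customers c ON c.customer_id = b.customer_id
WHERE c.customer_id IS NULL;

-- Check 2 (sanity/range): fares must be present and non-negative.
SELECT booking_id, fare_usd
FROM bookings
WHERE fare_usd IS NULL OR fare_usd < 0;
```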

On Columnar Databases

After Smarter Travel, I got recruited by an old boss from an earlier company to join ClickSquared and build a multi-tenant campaign management platform. This was the first time I got to see columnar databases and their advantages over traditional databases. I also wrestled with the problem of dealing with multiple tenants/clients in a single warehousing instance.

Before the columnar world, relational databases stored data in rows, written to disk next to each other. In the analytics world, we frequently do not want details but insights. Columnar databases solve this by reading back only the columns needed to answer a query. They were successful because they were very efficient at removing that I/O constraint, which led to 100x speed improvements. At the end of the day, there is still compute, memory, and storage. It’s just a matter of how we organize the data to optimize for certain problems and remove the constraints.
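As a rough illustration of why that matters, consider an aggregate query over a wide table (the table here is hypothetical). A row store has to read every row in full off disk; a column store touches only the two columns the query references:

```sql
-- Imagine click_events has ~100 columns and billions of rows.
-- A row store reads entire rows to answer this; a column store
-- scans only event_date and revenue, skipping everything else.
SELECT event_date, SUM(revenue) AS daily_revenue
FROM click_events
GROUP BY event_date
ORDER BY event_date;
```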

On Choosing Data Tooling Vendors

This is an educated guess, but I always go back to the fundamentals: what outcomes am I trying to drive? Over what time frame? And what resources are required (or will hold me back)?

At Fitbit, we had the challenges of (1) understanding our customers’ actions to improve the product, (2) understanding our potential prospects to add more customers, and (3) understanding the devices themselves to improve the product from that direction. Given those questions, I understood that my users were in the marketing, firmware engineering, and product groups. Working backward, I thought through how their questions could be formulated, what data they would need, and how that data could be processed.

I realized fairly early that we would be constrained by hardware, as we were struggling to run even basic SQL queries. Redshift (our warehouse at the time) was not able to scale in a cost-efficient manner. As luck would have it, a little startup called Snowflake came along and got my attention when they started talking about separating compute and storage. Fast-forwarding: I worked with a great sales team over there and was impressed with the product the first time I used it. Two of the founders came from Oracle and had developed a product that I had used before, so there was an instant connection.

I believe that Fitbit was Snowflake’s biggest customer in 2016 (as my friends in the sales department told me). We put a petabyte of data in there, which seems big even now. I had one table with a trillion rows. I had to encourage my team to start using scientific notation when talking about the size of the databases.

I look for vendors interested in solving my pains and addressing the constraints that my customers have in terms of taking action.

On Being Data-Driven

It’s never too early to start on analytics and measurement. Real improvements come from introspection — recognizing where you are and where you want to go. If I have a small startup, I need to know what my customers are doing and how they are using my product so that I can improve it in an empirical fashion.

Being data-driven, funny enough, is very threatening. To be data-driven, you are making a commitment to being empirical, logical, and scientific. This means that you don’t get to guess as much, express yourself, and be the hero.

My recommendation is to commit to building a culture where you are constantly curious about your product and yourself to improve, as you will naturally seek data. Start small, think about the business problems, figure out the most important question, and answer that question to identify the constraints. Don’t launch yourself into a 1-year project to build a big data solution. You just don’t know what you need yet.

On Team Collaboration

At Fitbit, I managed three teams at once: data engineering, data warehousing, and data analytics. I needed to figure out how these teams could work together. If you have studied agile development, you are probably familiar with feature teams and component teams.

If my three teams are highly specialized, I will have problems with communication, queuing, and the distance between engineering and the business problems. The visibility component teams have into the business problems is really thin. Instead, I focused on building feature teams that put together multiple people with different skill sets and communication styles to solve user problems. I realized that I had to keep things simple. That means giving these teams stability in vision and mission, as well as reasonable deadlines to move the needle forward.

Because I did not have a product management team, I created this notion of “fractional product managers” — engineers who volunteer to understand a specific department’s needs and become its advocates. The people who engage with the users and solve their problems see their careers benefit from it.

On Interviewing Data Engineers

Data engineers gather data and get it into the database. They need to interact with a variety of upstream sources programmatically, so they need traditional software engineering skills to accomplish that. Furthermore, excellent engineers are disciplined about creating pull requests, code commits, and code reviews. In the database world, that discipline was missing for a long time because many practitioners were not traditional software engineers. Data engineers need to build solutions that are reliable, consistent, and maintainable.

One way I sometimes interviewed data engineers was to take a piece of paper and draw a line down the middle: the idea on one side, the implementation on the other. I wanted to learn about their ideas as well as their ability to implement them. I did not do the trivia hunt through resumes.

On “Data Hierarchy of Needs”

I came up with this notion to better explain to my customers how to evaluate the data analytics maturity curve, and it is heavily influenced by Maslow’s hierarchy of needs. To define maturity, we need to define the fundamentals. If we try to build a predictive model or a mature dashboard, what are the fundamental things that enable such products?

  1. The first requirement for any company working with data should be protecting their customers. Security has to be your first priority. By protecting your customers, you protect yourself.
  2. The second pillar is data quality. How do I keep the noise out of the signal? How do I protect myself from bad decisions? Real sustained growth and velocity come from avoiding disasters as much as from going faster. Data quality is an investment in risk mitigation, protecting you from bad decisions. I encouraged my teams to lean into test-driven development so that we could test for data quality right from the beginning.
  3. The third pillar is reliability. Any good solution has to be dependable. This is DataOps/DevOps nuts-and-bolts: defining SLAs, understanding user needs, and measuring how well you perform from a reliability perspective.
  4. The fourth pillar is usability. I believe that engaging in analytics is a creative enterprise. It’s about asking questions and using the answers (to those questions) to either take action or ask better questions. Whenever you engage in a creative endeavor, if you struggle with your tools, creativity is constrained. If the analysts struggle with formulating queries or defining objects, they won’t be able to come up with insightful questions. User experience becomes critical.
  5. The last pillar is coverage. Do we have the information that describes the events we are trying to understand? If you don’t resolve the first four pillars, there is no point in giving people data. You also want to keep the scale and scope under control: constrain your scope, target the most valuable question to answer, and work backward from there.

On Data For Social Impact

We have a technology-focused culture in our society, where we constantly try to go faster and climb higher. Sometimes we forget people and how to drive our solutions sideways. If you don’t have the information to make an informed choice, you really don’t have democracy. So how can we use data to improve society and drive democracy?

By making information more accessible and driving insights in social areas at the local level, we can help people understand each other better, drive our empathy, and improve the lives of everyone. That sounds idealistic in some sense but actually is pragmatic. Think about the outcomes that you want. Think about what matters to you. Think about what is constraining you. And then find the answers to take better actions.

I’d like to see more efforts in enabling not just data scientists and experiment analysts but ordinary people to answer questions that matter to them and make better decisions for themselves.

On Team Dynamics

Empathy, honesty, and optimism are the three most important traits for a manager. I definitely embrace servant leadership: I start with the mission of helping my employees be happy and successful in their roles. And I can’t do that without those three traits.

  1. Empathy: I need to care about their careers and understand challenges from their perspectives.
  2. Honesty: I need to be able to give honest feedback and give the information they need to improve their performance.
  3. Optimism: I have to believe that people can do better if they fall short of a goal.

Moreover, I think the team members have to have empathy, honesty, and optimism for each other. Clearly, cooperative teams are more performant than non-cooperative teams. Being cooperative comes from being empathetic.

On Snowflake

  1. If you know SQL, you can use Snowflake. This is fantastic because SQL is a familiar paradigm for most people in the data world.
  2. Snowflake scales according to your constraints. More specifically, your compute and storage can scale on-demand independently. Users pay for utilization, so they don’t have to pre-buy a huge amount of capacity and anticipate what their customers might do in the future.

From a compute perspective, Snowflake is great at parallelizing queries. Its performance tends to scale linearly with the size of the data. As engineers, we used to be frugal and parsimonious with compute resources. With Snowflake, we can bring more resources to bear to solve the problem we have right now, essentially for free.
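As a rough sketch of what that independent scaling looks like in practice, here is the kind of Snowflake SQL involved; the warehouse name, sizes, and timeout below are illustrative assumptions, not details from the conversation:

```sql
-- A virtual warehouse is pure compute; storage is managed separately,
-- so resizing the warehouse never touches or moves the data.
CREATE WAREHOUSE IF NOT EXISTS analytics_wh
  WAREHOUSE_SIZE = 'SMALL'
  AUTO_SUSPEND = 300      -- suspend after 5 idle minutes; pay only for use
  AUTO_RESUME = TRUE;

-- Scale compute up for a heavy query or backfill...
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'XLARGE';

-- ...and back down when the work is done.
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'SMALL';
```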

On The ETL Tooling Landscape

I think these tools will become more horizontal. At the moment, there are tools that focus on specific components of the data stack, such as data ingestion, data transformation, dashboard creation, etc. Once we can solve these problems well enough, we will move to a higher level of abstraction. Remember that the objective is to deliver insights that lead to trials and actions. I believe practitioners will lean into DataOps and analytics engineering. We might even see “InsightOps” or “DecisionOps” to identify the constraints on delivering insights.

Getting data from different sources and bringing it into a data warehouse will be commoditized. Data integration vendors need to make data better known and more understandable by mapping out the data terrain, creating a data topography, and building a stable data foundation. Let’s start having mature conversations about the end-to-end knowledge graph within organizations.