Datacast

Episode 61: Meta Reinforcement Learning with Louis Kirsch

Episode Summary

Louis Kirsch is a third-year Ph.D. student at the Swiss AI Lab IDSIA, advised by Prof. Jürgen Schmidhuber. He received his B.Sc. in IT-Systems-Engineering from Hasso-Plattner-Institute (ranked 1st) and his Master of Research in Computational Statistics and Machine Learning from University College London (ranked 1st). His research focuses on meta-learning algorithms for reinforcement learning, specifically general-purpose meta-learning algorithms such as those introduced in his work on MetaGenRL. Louis organized the BeTR-RL workshop at ICLR 2020, was an invited speaker at Meta Learn NeurIPS 2020, and won several GPU compute awards on the Swiss national supercomputer Piz Daint.

Episode Notes

Show Notes

Louis’s Contact Info

Mentioned Content

Papers and Reports

Blog Posts

People

Book

Episode Transcription

Key Takeaways

Here are highlights from my conversation with Louis:

On Being A Self-Taught Programmer

I started programming when I was about 10. I was super fascinated by the idea of telling a computer what to do instead of having to do it myself. This idea of magnifying my own effort as much as possible has been a recurring theme in my life.

When I was 13, I wrote a game on my small computer with the XNA framework. It took me an entire year to finish because I was learning everything on the fly. On my 14th birthday, I burned CDs containing my game for my friends. That was an awesome moment.

A central theme in my life is being a self-taught programmer. Whenever I want to achieve something, I just take it into my own hands. That mindset manifests itself in the form of having the dream to create something big in the future. When I turned 18, I started my own small company with a friend. We started with freelancing. That was a dream come true for me. I had the opportunity to write real-world software, deal with clients, and learn about finances and taxes.

The engineering mindset from that time has stuck with me in some sense: striking the right balance between making things work in practice and pursuing the bigger vision.

On Getting A Bachelor at Hasso Plattner Institute

HPI is an institution privately financed by Hasso Plattner, co-founder of the German software giant SAP. It’s a small institute with just 500 people, including faculty and students. My greatest friendships formed during that time.

The Theoretical Computer Science courses were the ones I enjoyed most. I already had a bit of a software engineering background when I started, but there were so many topics in theoretical CS that I was not yet familiar with. Additionally, the introductory Machine Learning course ultimately inspired me to follow that direction.

My Bachelor thesis was the first AI research I engaged with. The more I learned about machine learning, the more I was bothered by the fact that the learning part of ML is, in some sense, limited. For example, an image classifier can learn from data, but its architecture needs to be designed manually. This inspired me to perform an architecture search for a convolutional classifier. I came up with a differentiable variant of architecture search that grows and shrinks the number of neurons and layers in the ConvNet.
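The grow-and-shrink idea can be illustrated with a minimal sketch (my own simplification, not the thesis's actual method): a dense layer is widened by adding a new neuron whose outgoing weights start at zero, so the network's function is preserved while the new capacity remains free to be learned.

```python
import numpy as np

# Hedged illustration of "growing" a network: widen a hidden layer while
# preserving the function the network currently computes.

rng = np.random.default_rng(0)
W1 = rng.standard_normal((3, 5))   # layer 1: 5 inputs -> 3 hidden units
W2 = rng.standard_normal((2, 3))   # layer 2: 3 hidden -> 2 outputs
x = rng.standard_normal(5)

def forward(W1, W2, x):
    return W2 @ np.maximum(0, W1 @ x)   # small ReLU MLP

before = forward(W1, W2, x)

# Grow: add a 4th hidden unit. Its incoming weights are random (free to
# learn later); its outgoing weights are zero, so the output is unchanged.
W1_grown = np.vstack([W1, rng.standard_normal((1, 5))])
W2_grown = np.hstack([W2, np.zeros((2, 1))])

after = forward(W1_grown, W2_grown, x)
assert np.allclose(before, after)
```

Shrinking works analogously in reverse: a neuron whose outgoing weights have been driven to zero can be removed without changing the function.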

While taking a deep learning class, I worked on a big project implementing speech recognition under a constrained GPU memory budget and with limited training data. My ACL paper investigates how one can train a speech recognizer on a large English corpus and apply transfer learning: the resulting weights serve as the initialization, and training then continues on a smaller German corpus. Ultimately, that led to fairly large savings in GPU memory and in the amount of training data required.
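That pretrain-then-fine-tune recipe can be sketched with a toy logistic-regression stand-in (all names and data here are illustrative, not from the paper): train on a large "source" dataset, then reuse the learned weights as the initialization for a small "target" dataset.

```python
import numpy as np

# Hedged sketch of transfer via weight initialization: pretrain on a large
# source task (standing in for the English corpus), then continue training
# from those weights on a small target task (standing in for German).

rng = np.random.default_rng(0)

def train(X, y, w, steps=200, lr=0.1):
    """A few steps of gradient descent on logistic loss, from init w."""
    for _ in range(steps):
        p = 1 / (1 + np.exp(-X @ w))        # sigmoid predictions
        w -= lr * X.T @ (p - y) / len(y)    # gradient of the log-loss
    return w

# Large source task.
X_src = rng.standard_normal((500, 16))
w_true = rng.standard_normal(16)
y_src = (X_src @ w_true > 0).astype(float)
w_src = train(X_src, y_src, np.zeros(16))

# Small target task: similar to the source, but not identical.
X_tgt = rng.standard_normal((40, 16))
y_tgt = (X_tgt @ (w_true + 0.1 * rng.standard_normal(16)) > 0).astype(float)

# Transfer: continue training from the pretrained weights, not from scratch.
w_tgt = train(X_tgt, y_tgt, w_src.copy(), steps=50)
```

Because the initialization already sits close to a good solution, far fewer fine-tuning steps and target examples are needed than when training from scratch.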

On Getting A Master at the University College London

Going to UCL was one of the best decisions I have made. It threw me into this ML environment, and I have never learned so much in a single year. I transitioned from a software engineer to an ML researcher.

The best UCL course I took was “Probabilistic and Unsupervised Learning” at the Gatsby Unit, which was insanely densely packed. It allowed me to learn a lot from the professors and my peers and get up to speed quickly.

On Modular Networks

If you look at the history of deep learning, it’s not necessarily a fancy new learning algorithm that ultimately improves a model’s performance. Instead, a driving factor is how you scale the dataset size and the number of model parameters. The human brain has an estimated 150 trillion synapses; as a rough approximation, that corresponds to at least 150 trillion floating-point parameters. Existing deep learning models have maybe up to a few hundred billion parameters. There are still a few orders of magnitude in between.

The main issue is that we need to evaluate all the parameters for every input that we feed into our model, which means our compute budget must scale linearly with our model size. However, not all neurons in the brain fire all the time, and the energy cost is proportional to the number of firing neurons. That’s the inspiration for my modular network paper.

Our follow-up report rethought our approach: maybe it would be better to simply turn certain neurons in the network on and off. The main insight is that perhaps we can use sparsity, either in the weights or in the activations of the network. Sparsity in the activations is conditional on the input, while sparsity in the weights is unconditional. Whenever a weight or an activation is zero, we can skip the associated computation; in the case of a zero activation, we can skip an entire row of the weight matrix. But there is an unfortunate problem here: our GPUs are absolutely awful at skipping those zeros. Ultimately, the long-term goal is to develop hardware capable of leveraging such sparsity. The near-term goal would be something equivalent to modular networks, as suggested above.
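As a toy illustration of why a zero activation lets us skip work (a sketch of the general idea, not the report's implementation): in a matrix-vector product, every zero entry of the input vector makes one whole column (or row, depending on convention) of the weight matrix irrelevant.

```python
import numpy as np

# With y = W @ x, a zero activation x[j] zeroes out the contribution of
# column W[:, j], so that column never needs to be read or multiplied.

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))
x = rng.standard_normal(8)
x[[1, 3, 4, 6]] = 0.0                  # conditional (input-dependent) sparsity

dense = W @ x                          # touches all 4 * 8 parameters

active = np.flatnonzero(x)             # indices of non-zero activations
sparse = W[:, active] @ x[active]      # touches only 4 * 4 parameters

assert np.allclose(dense, sparse)
```

On a GPU, this gather-then-multiply pattern is usually slower than the dense product it replaces, which is exactly the hardware problem described above.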

On Characteristics of Machine Learning Research with Impact

I think any researcher interested in making an impact should try to learn as much as possible about why other people are successful. Answering the question of what constitutes impact is quite difficult. In academia, we like to use citations because they are one of the few things we can directly measure and compare. In this report, I analyzed what kinds of papers are highly cited and when they get cited.

On Pursuing a Ph.D. at the Swiss AI Lab IDSIA

When looking for a Ph.D. opportunity, I was already interested in meta-learning and figuring out the best way to pursue AGI. Jürgen Schmidhuber is a very interesting advisor to work with. His research interests align with mine. He also started the field of meta-learning in 1987. It’s incredible how many of Jürgen’s ideas find practical applications today or are reinvented later.

In terms of the research environment, we have a lot of freedom in the group to pursue promising projects. That’s something I appreciate, and it has been working quite well for me so far. The lab is located in Lugano, in the southern part of Switzerland. I love hiking in the nearby mountains and going for walks at the lake while reflecting on life.

On Meta Reinforcement Learning

There is a lot of trial and error involved in doing research on learning algorithms. The research community keeps inventing new reinforcement learning algorithms to solve the problems we have not yet solved. Some people call this “graduate student descent,” as an analogy to gradient descent, because it takes a lot of human capital to improve our learning algorithms. With meta-learning, the burden of designing a good learning algorithm no longer falls on the human researcher but shifts to learning it automatically from data.

To deepen my understanding of the problems that meta-learning would have to solve, I created a big mind map with all the challenges in reinforcement learning that I was thinking about at the time and categorized them by whether they are solvable by meta-learning.

On The Path to AGI

Even if we had a perfect meta-learner, we would still need an environment to train it in and some tasks for it to perform. Ultimately, building these environments and tasks by hand would be infeasible. We need some principle for generating new problems apart from the manually designed tasks that we want the AI to achieve. Essentially, these two pillars, the environments we have to generate and the meta-learner trained on them, are quite related to Jürgen’s PowerPlay framework. A few months later, Jeff Clune published his idea of AI-Generating Algorithms, which describes a similar angle of using meta-learning as the path to AGI.

On MetaGenRL

When starting this project, I was bothered by the state of meta-learning research at the time. I have always wanted meta-learning to be the change that reduces the burden on human researchers of inventing new learning algorithms. However, it seemed that existing meta-learning approaches did not solve the same problems a human researcher would. Most SOTA meta-learning approaches could only adapt to extremely similar tasks/environments, and not generalize over a wide range of environments (as a human-engineered algorithm can).

MetaGenRL (Meta-Learning General Reinforcement-Learner) is our first step towards meta-learning general-purpose algorithms.

We were able to show that this meta-learned objective can be trained in one environment and later applied to an entirely different environment, where it still drives learning. In other words, it can generalize to significantly different environments. For me, this was a breakthrough, and I was delighted with the results.

On Variable Shared Meta-Learning

Variable Shared Meta-Learning (VSML) is the next step after MetaGenRL. With MetaGenRL, we were able to learn somewhat general-purpose learning algorithms. However, we still hard-coded many inductive biases (like back-propagation and objective functions) into our system, which means we could be making sub-optimal choices there. VSML discards these human-made design choices.

My intuition is that learning algorithms (such as back-propagation) are simple principles that apply across all neural networks. We have only a few bits that describe the learning algorithm, but lots of bits are the results of what’s being learned. Usually, a learning algorithm extracts the information from the environment and updates the weights of the neural networks with it.

The simplest meta-learner is a recurrent neural network (RNN) that receives feedback from the environment. In the RL context, the RNN outputs some actions, receives rewards indicating how good those actions were, and decides what actions to take next based on that feedback. In some sense, we can encode a learning algorithm in the weights of the RNN, and we can store information about what could be a better strategy in the future in the RNN’s activations. However, the RNN has quadratically many weights relative to its activations, which means we get a largely over-parameterized learning algorithm with far too many variables and very little memory.

For a general-purpose learning algorithm, we want the opposite. My simple solution in VSML is to introduce weight sharing into the RNN’s weight matrix, so that a small set of weights is replicated many times. The cool experiments we did in this paper were two-fold:

  1. We showed that an RNN using this variable-sharing approach can implement back-propagation. We did something called learning algorithm cloning, where we trained the RNN to implement back-prop. When we ran this RNN forward, it became better at predicting labels.
  2. We also attempted to meta-learn from scratch: here are all the data and the labels; figure out how to better predict these labels just by running the RNN forward. Our trained RNN performed well on unseen datasets.
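The weight-sharing idea can be sketched in a few lines (a deliberately simplified picture, not VSML's actual architecture, which also passes messages between sub-states): replicating one small recurrent cell across many sub-states keeps the number of learned parameters tiny while the state, i.e. the memory available to the encoded learning algorithm, stays large.

```python
import numpy as np

# Illustrative sketch of weight sharing in a recurrent state update:
# instead of one big N x N recurrent weight matrix, a small d x d "cell"
# matrix is replicated across k sub-states.

N, k = 64, 16                  # total state size, number of shared sub-cells
d = N // k                     # each sub-cell updates a d-dimensional sub-state

dense_params = N * N           # standard RNN: quadratic in the state size
shared_params = d * d          # shared cell: constant as the state grows

rng = np.random.default_rng(0)
W_cell = rng.standard_normal((d, d))
h = rng.standard_normal(N)

# One recurrent step: apply the same shared cell to every sub-state.
h_next = np.tanh(W_cell @ h.reshape(k, d).T).T.reshape(N)

print(dense_params, shared_params)   # 4096 vs 16
```

A standard RNN with a 64-dimensional state would learn 4,096 recurrent weights; the shared cell learns only 16, so growing the memory no longer blows up the number of bits describing the learning algorithm.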

On Making a Dent in AI Research

You should start by identifying your goals. The top priority in the beginning should be accumulating knowledge and skills: implementing models, doing experiments, reading research papers, etc. Writing blog posts helps you develop a deep understanding and your own ideas. When something gets hard (your papers get rejected, your experiments do not work for months, etc.), it’s good to have a goal in front of you, so you know why you’re doing these things.

Also, keep learning. I constantly try to figure out what other people are doing differently and with better success. Deep learning research is a lot about designing, running, and evaluating experiments.

Finally, networking and advertisement are crucial. You have to learn how to sell yourself, get to know important people in your field, and perhaps collaborate with them if possible.