Datacast

Episode 34: Deep Learning Generalization, Representation, and Abstraction with Ari Morcos

Episode Summary

Ari Morcos is a Research Scientist at Facebook AI Research working on understanding the mechanisms underlying neural network computation and function and using these insights to build machine learning systems more intelligently. In particular, Ari has worked on a variety of topics, including understanding the lottery ticket hypothesis, the mechanisms underlying common regularizers, and the properties predictive of generalization, as well as methods to compare representations across networks, the role of single units in computation, and on strategies to measure abstraction in neural network representations. Previously, he worked at DeepMind in London, and earned his Ph.D. in Neurobiology at Harvard University, using machine learning to study the cortical dynamics underlying evidence accumulation for decision-making.

Episode Notes

Show Notes

(2:32) Ari discussed his undergraduate studies in Physiology and Neuroscience at UC San Diego, while doing neuroscience research on adult neurogenesis at the Gage Lab.
(4:39) Ari discussed his decision to pursue a Ph.D. in Neurobiology at Harvard after college and what he learned about the importance of communication in research from his advisor, Chris Harvey.
(7:16) Ari explained his Ph.D. thesis, titled “Population dynamics in parietal cortex during evidence accumulation for decision-making”, in which he developed methods to understand how neuronal circuits perform the computations necessary for complex behavior.
(12:59) Ari talked about his process of learning machine learning and using that to analyze massive neuroscience datasets in his research.
(15:22) Ari recounted attending NIPS 2015 and serendipitously meeting people from DeepMind, which he later joined as a Research Scientist in their London office.
(18:59) Ari’s research focuses on the generalization of neural networks. He shared his work “On the Importance of Single Directions for Generalization”, presented at ICLR 2018 (inspired by earlier papers from Chiyuan Zhang and Quoc Le).
(28:51) Ari explained the differences between generalizing networks and memorizing networks, citing the results from his work "Insights on Representational Similarity in Neural Networks with Canonical Correlation” with Maithra Raghu and Samy Bengio presented at NeurIPS 2018 (Read Maithra’s paper on SVCCA that inspired it).
(35:16) Another topic that Ari focuses on is representation learning and abstraction for intelligent systems. His team at DeepMind proposes a dataset and a challenge designed to probe abstract reasoning, as explained in “Measuring Abstract Reasoning in Neural Networks" presented at ICML 2018 (learn more about the IQ test Raven’s Progressive Matrices and take the challenge here).
(42:21) An extension of the work above is "Learning to Make Analogies by Contrasting Abstract Relational Structure", presented at ICLR 2019. With the same authors (led by Felix Hill along with David Barrett, Adam Santoro, Tim Lillicrap), Ari showed that while architecture choice can influence generalization performance, the choice of data and the manner in which it is presented to the model are even more critical.
(48:18) Ari discussed “Neural Scene Representation and Rendering” (led by Ali Eslami and Danilo Rezende), which introduces the Generative Query Network (GQN), a framework within which machines learn to represent scenes using only their own sensors (watch the video and check out the data).
(55:09) Ari explained the findings in “Analyzing Biological and Artificial Neural Networks: Challenges with Opportunities for Synergy?”, published in Current Opinion in Neurobiology (joint work with David Barrett and Jakob Macke).
(57:04) Ari shared the properties of pruning algorithms that influence stability and generalization, as claimed in “The Generalization-Stability Tradeoff in Neural Network Pruning” led by Brian Bartoldson.
(01:00:56) Ari went over the generalization of lottery tickets in neural networks, which is inspired by the lottery ticket hypothesis from Jonathan Frankle and Michael Carbin at MIT. The two papers mentioned are collaborations with Haonan Yu, Yuandong Tian, Michela Paganini, and Sergey Edunov (check out his talk at REWORK Deep Learning Summit in Montreal 2019).
(01:09:00) Ari discussed “Training BatchNorm and Only BatchNorm”, which looks at how neural networks perform when only the Batch Normalization parameters are trained (joint work with Jonathan Frankle and David Schwab).
(01:12:12) Ari mentioned “The Early Phase of Neural Network Training” (presented at ICLR 2020), which uses the lottery ticket framework to rigorously examine the early part of training (joint work with Jonathan Frankle and David Schwab).
(01:16:25) Ari discussed at length “Representation Learning Through Latent Canonicalizations" (presented at ICLR 2020). This work seeks to learn representations in which semantically meaningful factors of variation (like color or shape) can be independently manipulated by learned linear transformations in latent space, termed “latent canonicalizers” (joint work with Or Litany, Srinath Sridhar, Leonidas Guibas, and Judy Hoffman).
(01:22:15) Ari summarized "Selectivity Considered Harmful: Evaluating the Causal Impact of Class Selectivity in DNNs" - which investigates the causal impact of class selectivity on network function (led by Matthew Leavitt).
(01:25:26) Ari reflected on his career and shared advice for individuals who want to make a dent in AI research.
(01:28:10) Ari shared his excitement about self-supervised learning, which reduces neural networks' reliance on expensive labeled data.
(01:29:47) Closing segment.

His Contact Information

His Recommended Resources

“Understanding Deep Learning Requires Rethinking Generalization” by Chiyuan Zhang
“Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability” by Maithra Raghu
Raven’s Progressive Matrices IQ test
"The Lottery Ticket Hypothesis” by Jonathan Frankle and Michael Carbin (Open-Source Framework)
“Random Features for Large-Scale Kernel Machines” by Ali Rahimi and Ben Recht (NIPS 2017 Test Of Time Award)
“beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework” by DeepMind
Samy Bengio (Research Scientist at Google AI)
Aleksander Madry (Professor of Computer Science at MIT)
Jason Yosinski (Founding Member of Uber AI Labs)
“The Idea Factory: Bell Labs and The Great Age of American Innovation" by Jon Gertner