Day 2 of TechInnov Day – AI for Africa kicked off with 2 technical talks focused on machine learning (ML) methods and advancements; the rest of the talks were more application-oriented. Today, I will focus on these 2 technical talks, as I learned the most from them.
If you missed the previous installments of this mini-series, they are linked here:
- Realtime machine translations and automating business management
- Application of deep neural network to tabular data
Machine Learning & Perspective
Yann’s keynote focuses on building and training a neural network architecture for autonomous visual learning (e.g. the way humans learn by observing the world). The significance of this work is paramount, because it would be the foundation for Artificial General Intelligence (AGI) with the ability to plan, reason, and learn representations of our world (a.k.a. the world model). If anyone is going to solve this problem, it’s probably going to be Yann LeCun. To make this audacious goal tangible, Yann noted that the biggest challenge lies in how to build and train this world model.
Sparing all the mathematical details (which I really enjoyed), Yann proposed an energy-based model (EBM) architecture called JEPA (Joint Embedding Predictive Architecture) as a candidate for the world model. But specifying the model is only the first step. The harder part is how to train it (i.e. determining the parameters of the model). There are 2 classes of methods to train these types of EBMs: contrastive methods and regularized methods.
Although Yann spends much time highlighting the advantages of regularized methods over contrastive methods, recent data seem to suggest that they perform equally well. And Yann admits that he doesn’t yet understand why, even though his theories suggest otherwise.
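To make the distinction between the two classes a bit more concrete, here is a toy sketch in plain Python. It is not code from the keynote, and the specific choices are assumptions made purely for illustration: the energy is taken to be the squared distance between two embeddings, the contrastive loss uses a simple margin, and the regularizer is a hinge on the per-dimension standard deviation of a batch. The function names are hypothetical.

```python
import math

def energy(u, v):
    """Toy energy: squared distance between two embeddings (low = compatible)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def mean_energy(pairs):
    return sum(energy(u, v) for u, v in pairs) / len(pairs)

def contrastive_loss(pos_pairs, neg_pairs, margin=1.0):
    """Contrastive training: pull energy down on compatible pairs and
    push it up (until it clears a margin) on incompatible pairs."""
    push = sum(max(0.0, margin - energy(u, v)) for u, v in neg_pairs) / len(neg_pairs)
    return mean_energy(pos_pairs) + push

def variance_penalty(embeddings, target_std=1.0):
    """Penalize dimensions whose spread across the batch falls below target_std.
    A batch of identical (collapsed) embeddings gets the maximum penalty."""
    n, dim = len(embeddings), len(embeddings[0])
    penalty = 0.0
    for d in range(dim):
        column = [e[d] for e in embeddings]
        mean = sum(column) / n
        std = math.sqrt(sum((c - mean) ** 2 for c in column) / n)
        penalty += max(0.0, target_std - std)
    return penalty / dim

def regularized_loss(pos_pairs, weight=1.0):
    """Regularized training: no negative pairs at all. Collapse is avoided by
    a penalty on the embeddings themselves rather than by contrasting samples."""
    batch = [u for u, _ in pos_pairs] + [v for _, v in pos_pairs]
    return mean_energy(pos_pairs) + weight * variance_penalty(batch)

# A collapsed solution maps every input to the same point: zero energy on
# compatible pairs, yet both losses still penalize it.
collapsed = [([0.0, 0.0], [0.0, 0.0])] * 4
spread = [([1.0, 0.0], [1.0, 0.0]), ([-1.0, 0.0], [-1.0, 0.0]),
          ([0.0, 1.0], [0.0, 1.0]), ([0.0, -1.0], [0.0, -1.0])]
print(regularized_loss(collapsed), regularized_loss(spread))
```

Both losses exist to rule out the trivial solution where every input gets the same embedding. The contrastive version needs explicit incompatible pairs to do so, while the regularized version only looks at compatible pairs plus a statistic of the batch.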
It’s truly admirable for one of the world’s preeminent experts to respect data so much that he questions his own theories. What makes Yann LeCun great, and what continues to earn my respect, is his humility, his unassuming manner, and his respect for others’ work. It was really a privilege to meet him in person and learn from him at TechInnov Day. And it’s my honor and my pride to have shared the stage with him.
Learning by bootstrapping
Michal’s talk focuses on a self-supervised learning (SSL) method called Bootstrap Your Own Latent (BYOL) that aims to learn a task-agnostic representation of image data. Because well-labeled images are very scarce, supervised learning is limited to small models (with few model parameters) that can be trained with the labeled data available. In contrast, SSL models can be pre-trained in a task-independent way using the vast amount of freely available unlabeled data. Then, using transfer learning, these pre-trained models can be fine-tuned to solve a specific task using far fewer training samples.
In fact, ChatGPT is built on a kind of pre-trained language model called a Transformer, and the “GPT” in ChatGPT stands for Generative Pre-trained Transformer. Since it has already learned much of the general structure of language during pre-training (i.e. grammar, synonyms, expressions, idioms, writing styles, etc.), it can be trained with a much smaller training dataset to perform specific tasks (e.g. answering multilingual customer support inquiries for a company’s product portfolio).
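Coming back to BYOL, the bootstrapping idea can be sketched in a few lines. This is a minimal illustration based on the published description of BYOL, not code from Michal’s talk. In BYOL, an online network predicts the output of a slowly moving target network for a different augmented view of the same image, and the target’s weights are an exponential moving average of the online weights. The flat lists standing in for network outputs and parameters, and the function names, are assumptions for illustration only.

```python
import math

def normalize(v):
    """Scale a vector to unit length (a zero vector is left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def byol_loss(online_prediction, target_projection):
    """Squared distance between the L2-normalized vectors, which equals
    2 - 2 * cosine similarity. No negative pairs are involved."""
    p, z = normalize(online_prediction), normalize(target_projection)
    return sum((a - b) ** 2 for a, b in zip(p, z))

def ema_update(target_params, online_params, tau=0.99):
    """Move the target parameters slightly toward the online parameters.
    The target network is never trained by gradient descent directly."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target_params, online_params)]

# Two views whose representations point the same way give a loss near 0,
# orthogonal ones give a loss near 2.
print(byol_loss([1.0, 0.0], [2.0, 0.0]))
print(byol_loss([1.0, 0.0], [0.0, 1.0]))
print(ema_update([0.0, 0.0], [1.0, -1.0], tau=0.9))
```

Because no labels appear anywhere in this loss, the same recipe can be run on unlabeled images, which is what makes the pre-train-then-fine-tune workflow described above possible.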
Michal also introduces curiosity-driven exploration in reinforcement learning (RL), coupled with BYOL. Rather than defining curiosity based on image saliency, the novelty of Michal’s approach is in defining curiosity as the misfit of the model (i.e. its prediction error). This exploration policy essentially tells the RL algorithm to systematically explore areas of the image space where the pre-trained model doesn’t predict well (much like boosting with L2 loss). This further improves the learning efficacy of BYOL to near the theoretical limit, which is truly amazing.
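The idea of curiosity as model misfit can be illustrated with a very small sketch. This is only a toy, not the method as presented in the talk: it assumes the model predicts the next latent representation, uses the squared L2 prediction error as an intrinsic reward, and adds it to the task reward with a hypothetical weight `beta`.

```python
def curiosity_reward(predicted_next, actual_next):
    """Intrinsic reward: squared L2 error between what the model predicted
    and what was actually observed. Large error = poorly understood region."""
    return sum((p - a) ** 2 for p, a in zip(predicted_next, actual_next))

def total_reward(extrinsic, predicted_next, actual_next, beta=0.5):
    """Reward seen by the RL agent: the task reward plus a curiosity bonus."""
    return extrinsic + beta * curiosity_reward(predicted_next, actual_next)

# A transition the model already predicts well earns no bonus, while a
# surprising one earns a large bonus and so attracts further exploration.
familiar = curiosity_reward([0.2, 0.4], [0.2, 0.4])
surprising = curiosity_reward([0.2, 0.4], [0.9, -0.3])
print(familiar, surprising)
```

An agent maximizing this combined reward is drawn toward states where the model is still wrong, and the bonus fades on its own as the model improves there. That is the sense in which it resembles boosting: effort is concentrated on the examples with the largest residuals.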
Check out this short video highlighting the best of TechInnov Day – AI for Africa.