Yao Fu | Website | Blog | Twitter / X
University of Edinburgh | [email protected]
Released on Apr 22 2024
💡 Key takes
- The scaling of text data is likely reaching a ceiling, as most of the easily accessible web text (Common Crawl, GitHub, Arxiv, etc.) is now used up.
- There will surely be new sources of text data, such as mining the internet more deeply, scanning library books, and generating synthetic data. Yet gaining another order of magnitude is quite challenging; more likely, these additions are incremental within the current order.
- The next chapter of the game starts with multimodality, particularly unified video-language generative models, because only video data offers an order-of-magnitude increase.
- However, the bad news is that video data seems unable to increase the reasoning capability of the model. Recall that reasoning is the number one capability that marks strong models.
- But the good news is that video data improves everything else, particularly grounding in the real world, and exhibits strong potential to become the basis of neural world models (instead of hard-coded physics engines like the one in Zelda), which opens up the possibility of learning from simulated physical feedback.
- Scaling up reinforcement learning from X feedback seems to be the most promising direction for continuing to increase the model’s reasoning capability, where X stands for human, AI, and environment feedback.
- Just as AlphaGo Zero achieved super-human performance on Go, self-play and interaction with the environment could be a path toward super-human generative models. Making the model learn online and iteratively from feedback (instead of through a single round of offline optimization) could lead to continuously increasing reasoning capability.
- The first chapter of the game of scale focuses on scaling text data, which peaked at GPT-4 and concludes with Llama 3. The second chapter of this game will be unified video-language generative modeling and iterative reinforcement learning from X feedback.
Table of Contents
Disclaimer: This article is essentially a quick personal research note on future work after reading through the release notes of Llama 3. The opinions presented may differ from existing beliefs. I welcome any criticism and contradictory opinions. You can comment directly on this document, message me on X, or send me an email for detailed discussion.
1 - How good is Llama 3?
Pretty good.
For the base model, we check MMLU, MATH, GPQA, and BBH as key metrics because they measure advanced knowledge and reasoning, and the leaderboard looks like this.