Yao Fu | Website | Blog | Twitter / X
University of Edinburgh | [email protected]
Released on Apr 22 2024
💡 Key takes
- The scaling of text data is likely reaching a ceiling, as most of the easily accessible web text (Common Crawl, GitHub, Arxiv, etc.) is now used up.
- There will surely be new sources of text data, such as mining the internet more deeply, scanning library books, and generating synthetic data. Yet gaining another order of magnitude is quite challenging; more likely, these additions are incremental within the current order.
- The next chapter of the game starts with multimodality, particularly unified video-language generative models, because only video data offers an order-of-magnitude increase.
- However, the bad news is that video data seems unable to increase the reasoning capability of the model. Recall that reasoning is the number one capability that marks strong models.
- But the good news is that video data improves everything else, particularly grounding in the real world, and exhibits strong potential to become the basis of neural world models (instead of hard-coded physics engines like the one in Zelda), which opens up the possibility of learning from simulated physical feedback.
- Scaling up reinforcement learning from X feedback seems to be the most promising direction for continuing to increase the model’s reasoning capability, where X stands for human, AI, and environment feedback.
- Just as AlphaGo Zero achieved super-human performance on Go, self-play and interaction with the environment could be a path toward super-human generative models. Making the model learn online and iteratively from feedback (instead of through a single round of offline optimization) could lead to continuously increasing reasoning capability.
- The first chapter of the game of scale focuses on scaling text data, which peaked at GPT-4 and concludes with Llama 3. The second chapter of this game will be unified video-language generative modeling and iterative reinforcement learning from X feedback.
Table of Contents
Disclaimer: This article is essentially a quick personal research note on future work after reading through the release notes of Llama 3. The opinions presented may differ from existing beliefs. I welcome any criticism and contradictory opinions. You can comment directly on this document, message me on X, or send me an email for detailed discussion.
1 - How good is Llama 3?
Pretty good.
For the base model, we check MMLU, MATH, GPQA, and BBH as key metrics because they measure advanced knowledge and reasoning, and the leaderboard looks like this.