Yao Fu | [email protected] | twitter
University of Edinburgh
Thanks to Hao Peng and Tushar Khot at AI2 for insightful discussions
Started writing on Apr 30 2023
Released on May 01 2023
Last updated on May 09 2023
Other versions: [pdf] [Arxiv] [中文] [bib]
Recently, many works on smaller models have achieved impressive dialog abilities, which leads people to wonder whether smaller models can perform comparably to large models like GPT-3.5. In general, language models have multi-dimensional abilities, which makes them hard to compare. Finding the right metric is crucial for developing strong language models. At the current stage, the community is eager to know which key differentiators mark the potential of strong language models.
In the GPT-4 release blog, the authors write: “In a casual conversation, the distinction between GPT-3.5 and GPT-4 can be subtle. The difference comes out when the complexity of the task reaches a sufficient threshold”. This suggests that complex tasks are likely to be the key differentiators between large and small language models.
More importantly, complex reasoning opens up opportunities for building a wide spectrum of applications on top of language models, effectively making language models the next-generation computation platform or operating system. This has the potential to substantially change the way humans interact with computers and to reshape the whole computational ecosystem.
In this post, we take a close look at methods for building models with strong complex reasoning capabilities.
In astrophotography, when shooting star trails with a long exposure, Polaris, the North Star, sits at the center of the star trails, always pointing to true north. In ancient times, it was the star that guided travelers.
Table of Contents
We study complex reasoning for two reasons:
The vision of making language models the next-generation operating system is particularly interesting because it opens countless possibilities for building new applications and creating a language-model-based computational ecosystem (probably with even larger opportunities than super apps like ChatGPT). The ability to perform complex reasoning serves as the foundation: if we want the model to become a new OS, it needs to be able to complete complex instructions through interactions with tools, users, and all elements of the outside environment.
This post studies how to train models with strong complex reasoning abilities, how to do prompt engineering to fully elicit the model’s reasoning ability, and how to evaluate models’ reasoning performance. The content of this post is organized as follows: