Yao Fu | Website | Blog

University of Edinburgh | [email protected]

with **Hao Peng** and **Tushar Khot**

work done at Allen Institute for AI

Thanks to Junxian He @SJTU, Pan Lu @UCLA, and Ruibo Liu @Dartmouth for insightful initial discussions and suggestions.

Thanks to Raj Ammanabrolu @AI2, Peter Liu @Google Brain, Brendan Dolan-Gavitt @NYU, Denny Zhou @Google Brain, and Aman Madaan @CMU for discussions and suggestions after the release, which greatly improved this post's comprehensiveness.

Started writing on Dec 08, 2022; released on Dec 11, 2022; last edited on May 16, 2023.

Other versions: [pdf] [Arxiv] [中文] [bib]

Discuss on Twitter with the author

TL;DR


Recently, the field has been greatly impressed and inspired by OpenAI’s ChatGPT. It is undoubtedly clever, capable, and very fun to talk to. Its multi-faceted abilities go significantly beyond what many NLP researchers and practitioners expected based on their impression of the (not-that-strong) original GPT-3. The natural questions are how ChatGPT got there, and where these fantastic abilities come from. In this post, we try to dissect these emergent abilities and trace them to their sources, hoping to give a comprehensive roadmap of how the GPT-3.5 model family, along with related large language models, evolved to their current forms.

We hope this post can promote the transparency of large language models and serve as a roadmap for the community’s ongoing efforts to reproduce GPT-3.5.

To readers:

Table of Contents