Home | Twitter / X | Google Scholar | Semantic Scholar | Hugging Face | GitHub | About Yao Fu

I am a research scientist at Google DeepMind.

I did my Ph.D. at the University of Edinburgh (2020-2024) with Professor Mirella Lapata. I finished my M.S. at Columbia University (2018-2020) with Professor John Cunningham and my B.S. at Peking University (2013-2018) with Professor Yansong Feng. Before my Ph.D., I had a great time visiting Professor Alexander Rush at Cornell Tech (2019-2020).

During my Ph.D., I developed methods for complex reasoning, such as complexity-based prompting and CoT specialization, and for self-play multi-agent debate, such as GPT-Bargaining. My blog discussed the connection between code and reasoning in the early days of this line of work. I also studied long-context continual pretraining and efficient deployment recipes, and identified retrieval heads that mechanistically explain long-context factuality.

I am interested in large-scale generative models for human intelligence. My research objective is to make large multimodal models the next-generation computational platform and generally capable agents. I am broadly interested in scaling, long context, multimodality, reasoning, and efficiency.


Featured Research

arXiv 2024 | Challenges in Deploying Long-Context Transformers: A Theoretical Peak Performance Analysis [paper][Twitter/X]

arXiv 2024 | Retrieval Head Mechanistically Explains Long-Context Factuality [code][paper][Twitter/X]

ICML 2024 | Data Engineering for Scaling Language Models to 128K Context [code][paper][Twitter/X]