Welcome to my new research blog at rlxf.ai. I will be sharing thoughts on reinforcement learning from human feedback, LLM reasoning, generative modeling, and more.
rlxf.ai is a blog and research site maintained by Shixiang Shane Gu, a pondering AI researcher. Shane has pioneered work across reinforcement learning, generative modeling, and large language model reasoning — including co-inventing Gumbel-Softmax, developing Zero-Shot Chain-of-Thought prompting ("Let's think step by step"), and building sample-efficient deep RL algorithms. This site collects his thoughts, writing, and research updates.
Thaddäus Wiedemer, Yuxuan Li, Paul Vicol, Shixiang Shane Gu, Nick Matarese, Kevin Swersky, Been Kim, Priyank Jaini, Robert Geirhos
arXiv 2025
Gemini Team, Google DeepMind
arXiv 2024
Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Shixiang Shane Gu et al.
JMLR 2024