Welcome to rlxf.ai

Welcome to rlxf.ai — my personal research blog.

I'm Shixiang Shane Gu, a Senior Staff Research Scientist at Google DeepMind working on the Gemini Thinking team. This site is a place where I'll share research notes, tutorials, and ideas across my areas of interest:

Reinforcement learning from human feedback (RLHF) and reward modeling
Reasoning and self-improvement in large language models
Generative modeling (with a soft spot for Gumbel-Softmax)
Sample-efficient deep RL

Why a blog?

The research I care about most lives at the intersection of rigorous math, scalable systems, and things that actually matter. I want this blog to be a place for half-formed ideas, distill-style deep dives, and interactive demos — not just links to arXiv papers.

Expect posts with LaTeX like this. The Gumbel-Softmax reparameterization trick lets us write a relaxed (continuous) sample from a categorical distribution with class probabilities $\pi_1, \dots, \pi_K$ as:

$$y_k = \frac{\exp((\log \pi_k + g_k) / \tau)}{\sum_{j=1}^{K} \exp((\log \pi_j + g_j) / \tau)}$$

where the $g_k \sim \text{Gumbel}(0, 1)$ are i.i.d. noise samples and $\tau > 0$ is the temperature parameter. As $\tau \to 0$, $y$ approaches a one-hot vector.
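
As a rough illustration of the formula above, here is a minimal NumPy sketch. The function name `gumbel_softmax_sample`, the example probabilities, and the temperature value are all assumptions made for this example, not code from any particular library.

```python
import numpy as np


def gumbel_softmax_sample(log_pi, tau, rng):
    """Draw one relaxed sample y from class log-probabilities log_pi."""
    # g_k ~ Gumbel(0, 1), drawn independently for each of the K classes
    g = rng.gumbel(loc=0.0, scale=1.0, size=log_pi.shape)
    # Perturbed logits scaled by the temperature: (log pi_k + g_k) / tau
    z = (log_pi + g) / tau
    # Subtracting the max leaves the softmax unchanged but avoids overflow
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


# Illustrative values only (hypothetical, chosen for the example)
rng = np.random.default_rng(0)
pi = np.array([0.2, 0.3, 0.5])
y = gumbel_softmax_sample(np.log(pi), tau=0.5, rng=rng)
print(y, y.sum())
```

Each sample `y` lies on the probability simplex, and its argmax is distributed according to $\pi$ (the Gumbel-max trick). This sketch only does the forward sampling; to backpropagate through the sample you would write the same computation in an autodiff framework.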

More posts coming soon. In the meantime, find me on X/Twitter or check out my work on Google Scholar.