Welcome to rlxf.ai

Authors
  • Shixiang Shane Gu (Twitter: @RLXFai)

Welcome to rlxf.ai — my personal research blog.

I'm Shixiang Shane Gu, a Senior Staff Research Scientist at Google DeepMind working on the Gemini Thinking team. This site is a place where I'll share research notes, tutorials, and ideas across my areas of interest:

  • Reinforcement learning from human feedback (RLHF) and reward modeling
  • Reasoning and self-improvement in large language models
  • Generative modeling (with a soft spot for Gumbel-Softmax)
  • Sample-efficient deep RL

Why a blog?

The research I care about most lives at the intersection of rigorous math, scalable systems, and things that actually matter. I want this blog to be a place for half-formed ideas, distill-style deep dives, and interactive demos — not just links to arXiv papers.

Expect posts with LaTeX like this: the Gumbel-Softmax reparameterization trick lets us write a relaxed, differentiable sample from a categorical distribution with class probabilities $\pi_1, \dots, \pi_K$ as:

$$
y_k = \frac{\exp((\log \pi_k + g_k) / \tau)}{\sum_{j=1}^{K} \exp((\log \pi_j + g_j) / \tau)}
$$

where $g_k \sim \text{Gumbel}(0, 1)$ and $\tau$ is the temperature parameter.
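
As a rough illustration of the formula above, here is a minimal sketch in plain Python (standard library only). The function names `sample_gumbel` and `gumbel_softmax`, the example probabilities, and the clamping constant are assumptions made for this sketch, not code from any particular library.

```python
import math
import random


def sample_gumbel(rng: random.Random) -> float:
    """Draw g ~ Gumbel(0, 1) via inverse transform: g = -log(-log(u))."""
    u = rng.random()
    # Clamp u away from 0 and 1 so both logarithms stay finite
    # (the constant 1e-12 is an arbitrary choice for this sketch).
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def gumbel_softmax(probs, tau, rng):
    """Return a relaxed one-hot sample y for strictly positive `probs`."""
    # Perturb each log-probability with Gumbel noise, then divide by tau.
    logits = [(math.log(p) + sample_gumbel(rng)) / tau for p in probs]
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)          # fixed seed so runs are repeatable
pi = [0.1, 0.6, 0.3]            # hypothetical class probabilities
y_soft = gumbel_softmax(pi, tau=1.0, rng=rng)   # smoother sample
y_sharp = gumbel_softmax(pi, tau=0.1, rng=rng)  # closer to one-hot
print(y_soft)
print(y_sharp)
```

Each call returns a point on the probability simplex. Lowering `tau` pushes the output toward a one-hot vector, and taking the argmax of the output recovers the Gumbel-max trick, which yields an exact categorical sample.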

More posts coming soon. In the meantime, find me on X/Twitter or check out my work on Google Scholar.