Matching Accuracy, Different Geometry: Evolution Strategies vs GRPO in LLM Post-training

Jan 1, 2026

William Hoy, Binxu Wang, Xu Pan

Type: Journal article

Publication: under review at COLM 2026

Last updated on Jan 1, 2026

Tags: LLM, Reinforcement Learning, Science of AI, Theory

Binxu Wang

Authors

Research Fellow

© 2026 Me. This work is licensed under CC BY NC ND 4.0
