Matching Accuracy, Different Geometry: Evolution Strategies vs GRPO in LLM Post-training
Jan 1, 2026 · William Hoy, Binxu Wang, Xu Pan
Type: Journal article
Publication: under review at COLM 2026
Last updated on Jan 1, 2026
LLM · Reinforcement Learning · Science of AI
Authors: Binxu Wang, Research Fellow