The default for trainer.algorithm.grpo_norm_by_std is currently true, but for Dr. GRPO it should be false. Other research is also adopting grpo_norm_by_std=false, even when not using Dr. GRPO's length normalization. We should consider changing this default to false.
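For context, here is a minimal sketch of what the flag changes, assuming group-relative advantages are computed per prompt as reward minus group mean, optionally divided by the group standard deviation. The helper name `group_advantages`, the `eps` value, and the use of population std are assumptions for illustration, not this repo's actual implementation.

```python
from statistics import mean, pstdev


def group_advantages(rewards, norm_by_std, eps=1e-6):
    """Hypothetical helper: advantages for the samples of a single prompt."""
    mu = mean(rewards)
    centered = [r - mu for r in rewards]
    if not norm_by_std:
        # Dr. GRPO style: only subtract the group mean.
        return centered
    # Original GRPO style: also divide by the group std.
    # Population std and eps are assumptions made for this sketch.
    sigma = pstdev(rewards)
    return [c / (sigma + eps) for c in centered]


# A prompt where nearly every sample gets the same reward...
near_uniform = [0.9, 1.0, 1.0, 1.0]
# ...versus a prompt where samples are evenly split.
mixed = [0.0, 1.0, 0.0, 1.0]

for name, group in [("near_uniform", near_uniform), ("mixed", mixed)]:
    print(name, "norm_by_std=True ", group_advantages(group, True))
    print(name, "norm_by_std=False", group_advantages(group, False))
```

With the std division enabled, the near-uniform group ends up with advantages at least as large in magnitude as the mixed group, even though its raw reward differences are much smaller. With it disabled, advantage magnitudes track the raw reward gaps.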
Relevant discussion: