[skyrl-train] consider updating grpo_norm_by_std default to false

The default for `trainer.algorithm.grpo_norm_by_std` is currently `true`. Dr. GRPO requires it to be `false`, and other recent work is adopting `grpo_norm_by_std=false` even without Dr. GRPO's length normalization. We should consider changing the default to `false`.
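For context, here is a minimal sketch of what the flag toggles, assuming the usual group-relative advantage computation: rewards for one prompt's samples are mean-centered, then optionally divided by the group's standard deviation. The function name `group_advantages`, the `eps` value, and the use of population standard deviation are illustrative assumptions, not skyrl-train's actual implementation.

```python
from statistics import mean, pstdev


def group_advantages(rewards, norm_by_std=True, eps=1e-6):
    """Group-relative advantages for the responses sampled from one prompt.

    Hypothetical helper for illustration only; the real trainer may use a
    different std estimator or epsilon.
    """
    mu = mean(rewards)
    centered = [r - mu for r in rewards]
    if not norm_by_std:
        # Mean-centering only: advantage scale follows the raw reward spread.
        return centered
    # Std normalization: rescales every group to roughly unit variance,
    # which amplifies groups whose rewards barely differ.
    sigma = pstdev(rewards)
    return [c / (sigma + eps) for c in centered]


# One prompt, four sampled responses, a single success.
rewards = [1.0, 0.0, 0.0, 0.0]
with_std = group_advantages(rewards, norm_by_std=True)
without_std = group_advantages(rewards, norm_by_std=False)
print("norm_by_std=True: ", with_std)
print("norm_by_std=False:", without_std)
```

With `norm_by_std=False` the advantages keep the scale of the raw rewards; with `norm_by_std=True` each group is rescaled by its own spread, so prompts with very low reward variance receive proportionally larger updates.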

Relevant discussion:
- DeepSeek-V3.2 adopts this: https://x.com/zzlccc/status/1995770284385992798
- Qwen adopts this in MiniRL: https://arxiv.org/pdf/2512.01374

[skyrl-train] consider updating grpo_norm_by_std default to false #869