We've added some features that improve GRPO training speed by up to 200%, building on the awesome work of TRL and Open-R1. Check the script here: github.com/modelscope/ms-… Check the wandb logs here: wandb.ai/tastelikefeet/…
@ms_swift2023 CUAqhmrrmudLYDC5KLnVWdWzpCBtRt5VsTf1MoD5pump this???