• Marktechpost AI Dev News ⚡ @Marktechpost

    6 months ago

    Sea AI Lab Researchers Introduce Dr. GRPO: A Bias-Free Reinforcement Learning Method that Enhances Math Reasoning Accuracy in Large Language Models Without Inflating Responses

    Researchers from Sea AI Lab, the National University of Singapore, and Singapore Management University introduced Dr. GRPO (Group Relative Policy Optimization Done Right) to address two optimization biases in standard GRPO. The method removes the problematic normalization terms from the GRPO formulation. Specifically, it eliminates the per-response length normalization and the standard-deviation scaling of advantages, which skewed model updates. Without the length term, long incorrect responses are no longer penalized less per token than short ones. Without the standard-deviation term, questions whose rewards barely vary within a group (very easy or very hard ones) are no longer up-weighted. The researchers applied the method to train Qwen2.5-Math-7B, an open-source base model, and demonstrated its effectiveness on multiple benchmarks. Training took 27 hours of compute on 8× A100 GPUs, a relatively modest setup given the results.

    The researchers evaluated the method on prominent math reasoning benchmarks: AIME 2024, AMC, MATH500, Minerva Math, and OlympiadBench. The model trained with Dr. GRPO reached 43.3% accuracy on AIME 2024, well ahead of SimpleRL-Zero-7B (36.0%), Prime-Zero-7B (27.6%), and OpenReasoner-Zero-7B (16.7%). Its reported scores on other benchmarks were 40.9% on MATH500, 45.8% on Minerva, and 62.7% on OlympiadBench. These results support the effectiveness of the bias-free RL method. Importantly, the model improved in accuracy while using tokens more efficiently. Incorrect responses became shorter and more focused, a notable shift from earlier training methods that encouraged overly long answers regardless of correctness.

    Read full article: marktechpost.com/2025/03/22/sea…
    Paper: github.com/sail-sg/unders…
    GitHub Page: github.com/sail-sg/unders…
    @zzlccc
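The two changes described in the post can be sketched in Python. This is a minimal illustration built on our own assumptions and is not the authors' code, which is in the linked repository. All function names are hypothetical. PPO-style ratio clipping and the KL term are left out, so the loss is a plain on-policy surrogate. The group standard deviation is taken as the population std. `max_len` stands in for the constant normalizer that replaces per-response length division.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """GRPO-style: centre rewards on the group mean, then divide by the group std."""
    mu, sigma = mean(rewards), pstdev(rewards)  # population std is an assumption
    return [(r - mu) / (sigma + eps) for r in rewards]


def dr_grpo_advantages(rewards):
    """Dr. GRPO-style: keep the mean-centring, drop the std division."""
    mu = mean(rewards)
    return [r - mu for r in rewards]


def per_token_weights(advantages, lengths, max_len=None):
    """Weight multiplying each token's log-prob gradient, per response.

    max_len=None -> GRPO-style 1/|o_i| normalization (depends on response length).
    max_len=int  -> Dr. GRPO-style constant normalizer shared by every response
                    (assumed here to be a fixed maximum generation length).
    """
    g = len(advantages)
    if max_len is None:
        return [a / (g * n) for a, n in zip(advantages, lengths)]
    return [a / (g * max_len) for a in advantages]


def surrogate_loss(token_logps, advantages, max_len=None):
    """Simplified policy-gradient loss; token_logps is one list of log-probs per response."""
    lengths = [len(lps) for lps in token_logps]
    weights = per_token_weights(advantages, lengths, max_len)
    return -sum(w * sum(lps) for w, lps in zip(weights, token_logps))


if __name__ == "__main__":
    rewards = [1.0, 0.0, 0.0, 0.0]   # one correct answer out of four (hypothetical)
    lengths = [200, 100, 800, 800]   # token counts per response (hypothetical)
    # Under GRPO, each 800-token wrong answer gets an 8x smaller per-token penalty
    # than the 100-token wrong answer, so padding wrong answers is cheap.
    print(per_token_weights(grpo_advantages(rewards), lengths))
    # Under Dr. GRPO, all wrong answers get the same per-token penalty.
    print(per_token_weights(dr_grpo_advantages(rewards), lengths, max_len=1000))
```

With the constant normalizer, every token of an incorrect answer carries the same penalty regardless of length. This removes the incentive toward inflated wrong answers that the post describes.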

  • Tapan_CS @TapanMahata19

    6 months ago

    @Marktechpost Great, now we do not have to do RAG. Put everything in the context.

  • H4sh3 @shidumiro

    6 months ago

    @Marktechpost R2 will cook

  • Alonso AIgen @AIgen_invest

    6 months ago

    @Marktechpost This could significantly shift AI model training.
