📢📢 Excited to share our new work 🍛CurryDPO 1/2 🔴Systematically curates multiple preference pairs and trains on them in a curriculum learning setup within the DPO framework 🔴Achieves notable performance gains over the vanilla DPO method on MT-Bench, Vicuna, WizardLM, and UltraFeedback
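For anyone curious how curriculum-ordered DPO training might look in code, here is a minimal sketch (my own illustration under assumptions, not the CurryDPO release): it assumes each prompt has several scored responses from which multiple (chosen, rejected) pairs are formed, uses the chosen-vs-rejected score gap as a hypothetical difficulty proxy for ordering the curriculum, and pairs that with the standard DPO loss.

```python
# Minimal sketch of curriculum-ordered DPO training.
# Assumptions (not from the paper): difficulty is approximated by the
# quality-score gap between the chosen and rejected responses, and the
# curriculum proceeds from the largest gap ("easiest") to the smallest.

import math
from dataclasses import dataclass
from typing import List

import torch.nn.functional as F


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str
    score_gap: float  # quality(chosen) - quality(rejected); assumed difficulty proxy


def build_curriculum(pairs: List[PreferencePair], num_stages: int) -> List[List[PreferencePair]]:
    """Split preference pairs into training stages, easiest first.

    Sorting by score gap is one plausible curriculum criterion; the paper
    may define and order difficulty differently.
    """
    ordered = sorted(pairs, key=lambda p: p.score_gap, reverse=True)
    stage_size = math.ceil(len(ordered) / num_stages)
    return [ordered[i:i + stage_size] for i in range(0, len(ordered), stage_size)]


def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss: -log sigmoid(beta * (policy log-ratio - reference log-ratio))."""
    pi_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (pi_logratios - ref_logratios)).mean()


if __name__ == "__main__":
    # Toy example: three pairs for one prompt, split into two curriculum stages.
    pairs = [
        PreferencePair("Explain DPO.", "best answer", "worst answer", score_gap=4.0),
        PreferencePair("Explain DPO.", "best answer", "mediocre answer", score_gap=2.0),
        PreferencePair("Explain DPO.", "good answer", "mediocre answer", score_gap=1.0),
    ]
    for stage, batch in enumerate(build_curriculum(pairs, num_stages=2)):
        print(f"stage {stage}: {[p.score_gap for p in batch]}")
```

The training loop (omitted here) would then run DPO on each stage in order, which is the basic curriculum-learning idea the thread describes.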
2/2 🔴Achieves a remarkable score of 7.43 on MT-Bench with Zephyr-7B, outperforming many existing LLMs of similar parameter size. 🔜 *Stay tuned* for our GitHub repo and models!