Zhepei Wei @weizhepei

Ph.D. Student @CS_UVA | Research Intern @Meta. Previously @AmazonScience. Research interest: ML/NLP/LLM. cs.virginia.edu/~tqf5qb/ Charlottesville, VA Joined January 2016

Tweets

88
Followers

188
Following

531
Likes

2K

Rohan Paul @rohanpaul_ai

2 days ago

OpenAI released a new paper, "Why language models hallucinate". Simple answer: LLMs hallucinate because training and evaluation reward guessing instead of admitting uncertainty. The paper puts this on a statistical footing with simple, test-like incentives that reward confident…

94 334 2K 337K 2K

Prophet Arena @ProphetArena

3 weeks ago

🔮 Introducing Prophet Arena — the AI benchmark for general predictive intelligence. That is, can AI truly predict the future by connecting today’s dots? 👉 What makes it special? - It can’t be hacked. Most benchmarks saturate over time, but here models face live, unseen…

90 149 1K 441K 879

Jiaxin Huang @jiaxinhuang0229

4 weeks ago

Thrilled to share this exciting work, R-Zero, from my student @ChengsongH31219 where LLM learns to reason from Zero human-curated data! The framework includes co-evolution of a "Challenger" to propose difficult tasks and a "Solver" to solve them. Check out more details in the…

ChengSong Huang @ChengsongH31219

4 weeks ago

3 36 134 13K 95

1 4 23 2K 1

ChengSong Huang @ChengsongH31219

4 weeks ago

🚀🚀Excited to share our paper R-Zero: Self-Evolving Reasoning LLM from Zero Data ! How to train LLM without data? R-Zero teaches Large Language Models to reason starting with nothing but a base model. No data required!!! Paper: arxiv.org/abs/2508.05004 Code:…

3 36 134 13K 95

AK @_akhaliq

a month ago

R-Zero Self-Evolving Reasoning LLM from Zero Data

13 84 554 63K 487

Anthropic @AnthropicAI

a month ago

We’re running another round of the Anthropic Fellows program. If you're an engineer or researcher with a strong coding or technical background, you can apply to receive funding, compute, and mentorship from Anthropic, beginning this October. There'll be around 32 places.

61 215 2K 578K 1K

Scale AI @scale_AI

a month ago

As AI agents start taking real actions online, how do we prevent unintended harm? We teamed up with @OhioState and @UCBerkeley to create WebGuard: the first dataset for evaluating web agent risks and building real-world safety guardrails for online environments. 🧵

7 22 88 30K 19

Quentin Gallouédec @QGallouedec

a month ago

There will be *no more than 5 days* between the release of GSPO and its implementation in TRL

10 11 323 34K 167

Yang Yue @YangYue_THU

2 months ago

New paper alert: Unifies insights from Limit-of-RLVR and ProRL. Does current RLVR actually expand reasoning? Turns out: RLVR is mostly an efficient sampler with shrinking, very rarely an explorer with expanding. Exploration is the holy grail for LLMs and may entail going beyond 0/1 rewards.

4 17 125 9K 104

Chujie Zheng @ChujieZheng

a month ago

Proud to introduce Group Sequence Policy Optimization (GSPO), our stable, efficient, and performant RL algorithm that powers the large-scale RL training of the latest Qwen3 models (Instruct, Coder, Thinking) 🚀 📄 huggingface.co/papers/2507.18…

29 248 2K 316K 1K

Zhepei Wei @weizhepei

2 months ago

Highlight of my #ICML2025 poster session: “So… did you train your model on the test set?” 😅 Probably the ML community’s new “standard practice” question — sadly necessary, but here we are 🤦‍♂️

0 0 2 233 0

Gautam Kamath @thegautamkamath

4 months ago

I wrote a post on how to connect with people (i.e., make friends) at CS conferences. These events can be intimidating, so here are some suggestions on how to navigate them. I'm late for #ICLR2025 #NAACL2025, but just in time for #AISTATS2025 and timely for #ICML2025 acceptances! 1/4

5 94 674 83K 807

Haolin Liu @HaolinLiu616

2 months ago

🚨 LLM-as-a-Judge in RLVR can be easily hacked, even GPT-4o. Simple sentences can trick top models into false positives, although the task is just to compare a given solution to a reference answer. 📊 What we found: 1️⃣ Figure 1: “:” and “Thought process:” fool nearly all models…

elvis @omarsar0

2 months ago

13 122 700 99K 640

0 3 19 2K 1

Zhepei Wei @weizhepei

2 months ago

Thrilled to present three works at #ICML2025!🥳 🚀AdaDecode — Wed 7/16, East Exhibition Hall A-B (#E-2605) 🔢Negative Reinforcement for Reasoning — Fri 7/18, AI for Math Workshop 🤖WebAgent-R1 — Sat 7/19, Workshop on Computer Use Agents Feel free to stop by and chat about #LLMs!

0 5 16 1K 1

Yu Meng @yumeng0818

2 months ago

Will be at #ICML2025 next week! We'll present the following works: 🛠️ LarPO: Tue 7/15 (Poster Session 1 East) 🚀 AdaDecode: Wed 7/16 (Poster Session 3 East) 🧮 Negative Reinforcement for Reasoning: Fri 7/18 (AI for Math Workshop) Happy to chat about latest research in LLMs🤩

0 8 26 2K 1

Zengzhi Wang @SinclairWang1

2 months ago

What Makes a Base Language Model Suitable for RL? Rumors in the community say RL (i.e., RLVR) on LLMs is full of “mysteries”: (1) Is the magic only happening on Qwen + Math? (2) Does the "aha moment" only spark during math reasoning? (3) Is evaluation hiding some tricky traps?…

10 87 510 91K 476

Lex Fridman @lexfridman

3 months ago

Here's my conversation with Terence Tao, one of the greatest mathematicians in history. We talk about the hardest problems in mathematics & physics, and how AI might help us humans to solve them. This conversation was a huge honor for me. I can't quite put it into words, but…

303 748 5K 1.2M 3K

Zhepei Wei @weizhepei

3 months ago

Nice work! In our recent paper WebAgent-R1 (arxiv.org/abs/2505.16421), we also observed a similar finding—test-time scaling via increased interactions! Feels like we’re not far from discovering new scaling laws for agents!🤩

Aviral Kumar @aviral_kumar2

3 months ago

1 11 122 11K 79

0 0 10 853 0

Jiaxin Huang @jiaxinhuang0229

3 months ago

🚀🚀Excited to share our new work on Speculative Decoding by @shrangoh! We tackle a key limitation in draft models which predict worse tokens at later positions, and present PosS that generates high-quality drafts!