Tong Chen @tomchen0

PhD student @uwcse @uwnlp Joined February 2023

Tweets: 135 · Followers: 549 · Following: 486 · Likes: 182

Shangbin Feng @shangbinfeng

3 weeks ago

👀 How to find more difficult/novel/salient evaluation data? ✨ Let the data generators find it for you! Introducing Data Swarms, multiple data generator LMs collaboratively search in the weight space to optimize quantitative desiderata of evaluation.

2 17 114 18K 65

Yanai Elazar @yanaiela

3 weeks ago

I’m excited to share that I'm joining Bar-Ilan University as an assistant professor!

110 21 524 35K 13

Feng Yao @fengyao1909

4 weeks ago

⚡𝐅𝐏𝟖 makes RL faster — but at the cost of performance. We present 𝐅𝐥𝐚𝐬𝐡𝐑𝐋, the first 𝐨𝐩𝐞𝐧–𝐬𝐨𝐮𝐫𝐜𝐞 & 𝐰𝐨𝐫𝐤𝐢𝐧𝐠 𝐑𝐋 𝐫𝐞𝐜𝐢𝐩𝐞 that applies 𝐈𝐍𝐓𝟖/𝐅𝐏𝟖 for rollout 𝐰𝐢𝐭𝐡𝐨𝐮𝐭 𝐥𝐨𝐬𝐢𝐧𝐠 𝐩𝐞𝐫𝐟𝐨𝐫𝐦𝐚𝐧𝐜𝐞 compared to 𝐁𝐅𝟏𝟔! 📝 Blog:…

11 89 566 56K 451

Boyuan Zheng@ICML @boyuan__zheng

a month ago

Remember “Son of Anton” from the Silicon Valley show(@SiliconHBO)? The experimental AI that “efficiently” orders 4,000 lbs of meat while looking for a cheap burger and “fixes” a bug by deleting all the code? It’s starting to look a lot like reality. Even 18 months ago, my own…

Scale AI @scale_AI

a month ago

7 22 88 30K 19

0 28 66 7K 17

Stella Li @StellaLisy

2 months ago

WHY do you prefer something over another? Reward models treat preference as a black-box😶‍🌫️but human brains🧠decompose decisions into hidden attributes We built the first system to mirror how people really make decisions in our #COLM2025 paper🎨PrefPalette✨ Why it matters👉🏻🧵

6 75 380 46K 269

Akari Asai @AkariAsai

2 months ago

Some updates 🚨 I finished my Ph.D at @uwcse in June 2025! After a year at AI2 as a Research Scientist, I am joining CMU @LTIatCMU & @mldcmu (courtesy) as an Assistant Professor in Fall 2026. The journey, acknowledgments & recruiting in 🧵

117 62 1K 112K 108

Scott Geng @scottgeng00

2 months ago

🤔 How do we train AI models that surpass their teachers? 🚨 In #COLM2025: ✨Delta learning ✨makes LLM post-training cheap and easy – with only weak data, we beat open 8B SOTA 🤯 The secret? Learn from the *differences* in weak data pairs! 📜 arxiv.org/abs/2507.06187 🧵 below

7 51 162 22K 114

Weijia Shi @WeijiaShi2

2 months ago

Can data owners & LM developers collaborate to build a strong shared model while each retaining data control? Introducing FlexOlmo💪, a mixture-of-experts LM enabling: • Flexible training on your local data without sharing it • Flexible inference to opt in/out your data…

Ai2 @allen_ai

2 months ago

13 73 434 334K 189

9 87 280 58K 91

Xinxi Lyu @XinxiLyu

2 months ago

Reasoning benchmarks (e.g., MMLU Pro and GPQA) have seen little benefit from naive RAG. But can we flip this? 🔥Introducing CompactDS: ✅Web-scale coverage ✅Runs with just 100GB RAM ✅Matches search engines The simplest RAG pipeline can even compete with agentic…

1 17 53 17K 22

Victoria Graf @VictoriaWGraf

2 months ago

Worried about overfitting to IFEval? 🤔 Use ✨IFBench✨ our new, challenging instruction-following benchmark! Loved working w/ @valentina__py! Personal highlight: our multi-turn eval setting makes it possible to isolate constraint-following from the rest of the instruction 🔍

Valentina Pyatkin @valentina__py

2 months ago

5 95 353 48K 184

2 16 54 10K 17

Valentina Pyatkin @valentina__py

2 months ago

💡Beyond math/code, instruction following with verifiable constraints is suitable to be learned with RLVR. But the set of constraints and verifier functions is limited and most models overfit on IFEval. We introduce IFBench to measure model generalization to unseen constraints.

5 95 353 48K 184

CLS @ChengleiSi

2 months ago

Are AI scientists already better than human researchers? We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts. Main finding: LLM ideas result in worse projects than human ideas.

11 181 624 145K 215

Thao Nguyen @thao_nguyen26

3 months ago

Web data, the “fossil fuel of AI”, is being exhausted. What’s next?🤔 We propose Recycling the Web to break the data wall of pretraining via grounded synthetic data. It is more effective than standard data filtering methods, even with multi-epoch repeats! arxiv.org/abs/2506.04689

14 63 223 34K 130

Hao Xu @xuhaoxh

3 months ago

Wanna 🔎 inside Internet-scale LLM training data w/o spending 💰💰💰? Introducing infini-gram mini, an exact-match search engine with 14x less storage req than the OG infini-gram 😎 We make 45.6 TB of text searchable. Read on to find our Web Interface, API, and more. (1/n) ⬇️

6 23 63 21K 33

Sarah Wiegreffe @sarahwiegreffe

3 months ago

A bit late to announce, but I’m excited to share that I'll be starting as an assistant professor at the University of Maryland @umdcs this August. I'll be recruiting PhD students this upcoming cycle for fall 2026. (And if you're a UMD grad student, sign up for my fall seminar!)

70 50 608 42K 84

Jacqueline He @jcqln_h

3 months ago

LMs often output answers that sound right but aren’t supported by input context. This is intrinsic hallucination: the generation of plausible, but unsupported content. We propose Precise Information Control (PIC): a task requiring LMs to ground only on given verifiable claims.

2 24 50 8K 14

Yike Wang @yikewang_

3 months ago

LLMs are helpful for scientific research — but will they continuously be helpful? Introducing 🔍ScienceMeter: current knowledge update methods enable 86% preservation of prior scientific knowledge, 72% acquisition of new, and 38%+ projection of future (arxiv.org/abs/2505.24302).

11 55 242 23K 128

Cohere Labs @Cohere_Labs

3 months ago

Next week on Wednesday, June 11th we're excited to welcome @StellaLisy for a session on "Spurious Rewards: Rethinking Training Signals in RLVR." Thanks to @AhmadMustafaAn1 for organizing this session! 🔥 Learn more: cohere.com/events/Cohere-…

0 6 35 24K 12

Sahil Verma @Sahil1V

3 months ago

🚨 New Paper! 🚨 Guard models slow, language-specific, and modality-limited? Meet OmniGuard that detects harmful prompts across multiple languages & modalities all using one approach with SOTA performance in all 3 modalities!! while being 120X faster 🚀 arxiv.org/abs/2505.23856

1 39 82 14K 23

Yizhong Wang @yizhongwyz

3 months ago

Thrilled to announce that I will be joining @UTAustin @UTCompSci as an assistant professor in fall 2026! I will continue working on language models, data challenges, learning paradigms, & AI for innovation. Looking forward to teaming up with new students & colleagues! 🤠🤘