👉 New preprint! Today, many of the biggest challenges in LM post-training aren't just about correctness, but rather consistency & coherence across interactions.
This paper tackles some of these issues by optimizing reasoning LMs for calibration rather than accuracy...
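One generic way to score calibration rather than raw accuracy is a Brier-style reward: the model states a confidence in [0, 1] and is penalized by the squared error against the outcome. This is a minimal sketch of that idea, not necessarily the paper's actual objective:

```python
# Brier-style calibration reward (illustrative sketch, not the paper's code):
# the model reports a confidence in [0, 1]; reward is minus the squared
# error between that confidence and whether the answer was correct.
def calibration_reward(confidence: float, correct: bool) -> float:
    return -(confidence - float(correct)) ** 2

# A confidently-correct answer scores near 0; a confidently-wrong one near -1.
print(round(calibration_reward(0.9, True), 3))   # -0.01
print(round(calibration_reward(0.9, False), 3))  # -0.81
```

Under this kind of objective, hedging on hard questions can beat overconfident guessing, which is the behavioral shift the tweet describes.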
✨ New paper ✨
🚨 Scaling test-time compute can lead to inverse or flattened scaling!!
We introduce SealQA, a new challenge benchmark w/ questions that trigger conflicting, ambiguous, or unhelpful web search results. Key takeaways:
➡️ Frontier LLMs struggle on Seal-0 (SealQA’s…
Are AI scientists already better than human researchers?
We recruited 43 PhD students to spend 3 months executing research ideas proposed by an LLM agent vs human experts.
Main finding: LLM ideas result in worse projects than human ideas.
I’m looking for a new postdoc to start this fall working on AI for Science/Science-Inspired AI (focusing on chemistry and bioengineering domains for now). Please drop me a CV if interested.
The paper claims LLMs' high scores on coding benchmarks may come from memorizing past GitHub issues, not real reasoning. 😯
The authors build a tiny test: given only the text of an issue, guess the file path that needs fixing.
Models hit up to 76% accuracy on the benchmark set,…
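The probe described above is easy to reproduce in miniature: show a model only the issue text, ask for the file path to fix, and score exact matches. A toy sketch (the `query_model` stub and dataset are hypothetical stand-ins for a real LLM call and benchmark):

```python
# Toy memorization probe: given only an issue's text, guess the file path
# that needs fixing. High accuracy here suggests the issue/repo pair was
# memorized, since the path is not deducible from the text alone.
def path_guess_accuracy(issues, query_model):
    """issues: list of (issue_text, gold_file_path) pairs."""
    hits = sum(
        1 for text, gold in issues
        if query_model(text).strip() == gold
    )
    return hits / len(issues)

# Usage with a trivial stub standing in for the model:
issues = [
    ("crash when parsing nested lists", "src/parser.py"),
    ("login token never expires", "src/auth.py"),
]
stub = lambda text: "src/parser.py"
print(path_guess_accuracy(issues, stub))  # 0.5
```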
LLM reasoning with reinforcement learning focuses on limited domains, hindering general applicability.
This paper develops GURU, a 92,000-example multi-domain dataset, to enable broader reinforcement learning-based reasoning.
Methods 🔧:
- GURU includes Math, Code, Science,…
Large language models exhibit grokking, where generalization improves significantly long after training loss converges.
This paper identifies grokking in large-scale LLM pretraining and provides internal metrics to monitor this delayed generalization without external validation.…
A bit late, but happy to share that LLM-SRBench, our new benchmark targeting the memorization issue in LLMs for scientific discovery, has been selected for an *Oral* presentation at #ICML2025!
Great to see the community recognizing the importance of this direction. Check out the camera-ready…
🚨New paper! We know models learn distinct in-context learning strategies, but *why*? Why generalize instead of memorize to lower loss? And why is generalization transient?
Our work explains this & *predicts Transformer behavior throughout training* without its weights! 🧵
1/
This is really BAD news for LLMs' coding skills. ☹️
The best frontier LLMs achieve 0% on hard real-life programming contest problems, a domain where expert humans still excel.
LiveCodeBench Pro, a benchmark composed of
problems from Codeforces, ICPC, and IOI (“International…
This study shows the same models break down on Olympiad problems and cannot even flag their own faulty proofs.
It found that frontier LLMs handle fewer than 4% of Olympiad proofs correctly and misjudge their own flawed reasoning.
Current math benchmarks mark a right answer and…
❓How to balance negative and positive rewards in off-policy RL❓
In Asymmetric REINFORCE for off-Policy RL, we show that giving less weight to negative rewards is enough to stabilize off-policy RL training for LLMs! 💪 (1/8)
Paper: arxiv.org/abs/2506.20520
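The core trick, per the abstract, is down-weighting negative rewards in the policy-gradient update. A minimal sketch of that asymmetry (the factor name `lam` and this toy setup are illustrative, not the paper's code):

```python
# Asymmetric weighting of advantages for an off-policy REINFORCE-style
# update: negative advantages are scaled by lam < 1, positive ones kept
# as-is, which damps the destabilizing effect of punishing off-policy samples.
import numpy as np

def asymmetric_weights(advantages, lam=0.5):
    adv = np.asarray(advantages, dtype=float)
    return np.where(adv < 0, lam * adv, adv)

adv = [2.0, -1.0, 0.5, -3.0]
print(asymmetric_weights(adv, lam=0.5))  # [ 2.  -0.5  0.5 -1.5]
```

With `lam=1.0` this reduces to standard REINFORCE weighting; with `lam=0.0` negative samples are ignored entirely.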
Exciting new RL tooling: A modular library for RL training by the Berkeley NovaSky team. While standard RL training is all done in one loop, it is more efficient for modern post-training to separate the generation of the rollouts from the trainer. It also enables asynchronous…
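The rollout/trainer separation described above can be sketched with a simple producer/consumer queue; this is a generic illustration of the pattern, not the NovaSky library's API:

```python
# Decoupling rollout generation from training: an actor thread produces
# rollouts into a queue while the trainer consumes them asynchronously,
# instead of alternating generate/train steps in one loop.
import queue
import threading

rollout_q = queue.Queue(maxsize=8)

def actor(n):
    # Generation loop: push rollouts independently of the trainer's pace.
    for i in range(n):
        rollout_q.put({"tokens": [i], "reward": float(i)})
    rollout_q.put(None)  # sentinel: no more rollouts

def trainer():
    # Training loop: consume rollouts as they arrive.
    seen = 0
    while (item := rollout_q.get()) is not None:
        seen += 1  # a real trainer would compute a gradient step here
    return seen

t = threading.Thread(target=actor, args=(4,))
t.start()
print(trainer())  # 4
t.join()
```

The same shape generalizes to processes or machines by swapping the in-memory queue for a networked one.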
Github: A fully open source framework for creating RL training swarms over the internet.
Train reinforcement-learning models collaboratively across decentralized peers, leveraging GenRL-Swarm on consumer laptops or GPUs
Plug into a global swarm, contribute compute, and…
Removing knowledge from LLMs is HARD. @GurYoav proposes a powerful approach that disentangles the MLP parameters to edit them in high resolution and remove target concepts from the model. Check it out!
How Well Can Reasoning Models Identify and Recover from Unhelpful Thoughts?
"We show that models are effective at identifying most unhelpful thoughts but struggle to recover from the same thoughts when these are injected into their thinking process, causing significant…