# 🚨 4B open-recipe model beats Claude-4-Opus
100% open data, recipe, model weights and code.
Introducing Polaris✨, a post-training recipe for scaling RL on advanced reasoning models.
Check out how we boost open-recipe reasoning models to incredible performance levels…
There's been hot debate about (The Illusion of) The Illusion of Thinking.
My take: it's not that models can't reason; they just aren't perfect at long-form generation yet.
We evaluate reasoning models on the LongProc benchmark (which requires generating 8K-token CoTs; see thread). Reasoning…
🧵 Recent studies show LLMs can self-improve their responses when given external feedback. But how effectively can they incorporate it?
We tested this systematically and found that they can't fully integrate feedback, even when the feedback is high-quality and backed by ground truth.
🚨 We discovered a surprising side effect of Reinforcement Finetuning (RFT): it makes LLMs more confidently wrong on unanswerable questions.
We call this the hallucination tax: a drop in refusal behavior that leads to overconfident hallucinations.
🧵 1/n
Textual steering vectors can improve visual understanding in multimodal LLMs!
You can extract steering vectors via any interpretability toolkit you like -- SAEs, MeanShift, Probes -- and apply them to image or text tokens (or both) of Multimodal LLMs.
And They Steer!
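As a rough illustration of the MeanShift option mentioned above, the sketch below computes a steering vector as the difference between mean activations with and without a concept, then adds a scaled copy of it to selected token positions. It is a toy under stated assumptions: random NumPy arrays stand in for one layer's hidden states, and the helper names `mean_shift_vector` and `apply_steering` and the scale `alpha` are hypothetical, not the authors' code or any toolkit's API.

```python
import numpy as np

def mean_shift_vector(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Hypothetical helper: mean activation with the concept minus mean without it.

    Both inputs have shape (num_examples, hidden_dim).
    """
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

def apply_steering(hidden: np.ndarray, vector: np.ndarray,
                   token_mask: np.ndarray, alpha: float = 4.0) -> np.ndarray:
    """Hypothetical helper: add alpha * vector to the rows selected by token_mask.

    hidden has shape (seq_len, hidden_dim); token_mask is a boolean array of
    shape (seq_len,) marking, e.g., image tokens, text tokens, or both.
    """
    steered = hidden.copy()  # leave the original activations untouched
    steered[token_mask] += alpha * vector
    return steered

rng = np.random.default_rng(0)
hidden_dim = 8
# Toy stand-ins for one layer's activations on concept / non-concept prompts.
pos = rng.normal(loc=1.0, scale=0.1, size=(32, hidden_dim))
neg = rng.normal(loc=0.0, scale=0.1, size=(32, hidden_dim))
v = mean_shift_vector(pos, neg)

# Pretend the first two positions are image tokens and the rest are text tokens.
hidden = np.zeros((5, hidden_dim))
image_mask = np.array([True, True, False, False, False])
steered = apply_steering(hidden, v, image_mask, alpha=2.0)
print(steered.round(2))
```

Swapping `image_mask` for a text-token mask (or an all-True mask) corresponds to steering text tokens or both, the three options the post lists.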
Billion-parameter LLMs still struggle with simple arithmetic?
FoNE (Fourier Number Embedding) tackles this problem.
By mapping numbers directly into Fourier space, it bypasses tokenization and significantly improves numerical accuracy while also being more efficient.
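A minimal sketch of the general idea behind a Fourier number embedding, assuming one (cos, sin) pair per period 10, 100, 1000, and so on; this illustrates the concept and is not necessarily FoNE's exact construction, and the helper names are hypothetical. Each pair with period 10^i pins down x mod 10^i on the unit circle, so individual digits can be read back without any tokenizer.

```python
import math

def fourier_number_features(x: float, num_digits: int = 4) -> list[float]:
    """Hypothetical helper: one (cos, sin) pair per period 10, 100, ..., 10**num_digits."""
    feats: list[float] = []
    for i in range(1, num_digits + 1):
        period = 10 ** i
        angle = 2 * math.pi * x / period
        feats.extend([math.cos(angle), math.sin(angle)])
    return feats

def decode_digits(feats: list[float], num_digits: int = 4) -> list[int]:
    """Recover the digits of a non-negative integer, least significant first."""
    digits = []
    for i in range(1, num_digits + 1):
        period = 10 ** i
        cos_v, sin_v = feats[2 * (i - 1)], feats[2 * (i - 1) + 1]
        # The angle on the unit circle encodes x mod period.
        angle = math.atan2(sin_v, cos_v) % (2 * math.pi)
        # Rounding can land exactly on `period`; the modulo wraps it back to 0.
        residue = round(angle * period / (2 * math.pi)) % period
        digits.append(residue // 10 ** (i - 1))
    return digits

features = fourier_number_features(4271)
print(len(features))            # 8
print(decode_digits(features))  # [1, 7, 2, 4]
```

Because the features are periodic, numbers that agree modulo 10**num_digits (e.g. 4271 and 14271) collide, so a real embedding would need enough periods to cover the numeric range of interest.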
Running your model on multiple GPUs but often find the speed unsatisfactory? We introduce Ladder-residual, a parallelism-aware architecture modification that makes 70B Llama with tensor parallelism ~30% faster!
Work done at @togethercompute. Co-1st author with @MayankMish98…
Come join the #NeurIPS2024 poster session and discuss whether language models can learn to skip steps in reasoning!
Dec 12, Thursday, 11:00 am - 2:00 pm
East Exhibit Hall A-C #2900
Feel free to stop by and say hi! I am actively seeking Summer 2025 internship opportunities!
I'll present a poster for Lifelong ICL and Task Haystack at #NeurIPS2024!
⏰ Wednesday 11am-2pm
East Exhibit Hall A-C #2802
arxiv.org/abs/2407.16695
My co-first author @xiaoyue02_xu is applying to PhD programs and I am looking for jobs in industry! Happy to connect at NeurIPS!
98 Followers 149 Following
Research MSc @Mila_Quebec @mcgill_nlp | Research Fellow @RBCBorealis | reasoning and hallucination x evaluation and interpretability | Looking for Fall '26 PhD
4K Followers 1K Following
Research Scientist at FAIR @AIatMeta & visiting researcher at Princeton @VisualAILab
prev. PhD at New York University
🇺🇦
519 Followers 801 Following
Currently working on LLM reasoning, agents and self-improvement. Six years of prior industrial research experience in speech processing and NLP
233 Followers 268 Following
Research Intern @samaya_AI | PhD student at @nlp_usc | Former: BS/MS student doing research in #NLProc at @uwcse @uwnlp | Previously research at @apple, @amazon
27 Followers 429 Following
Passionate and curious AI engineering enthusiast driven by a desire to build innovative solutions and continuously expand technical horizons.
17K Followers 6K Following
Neurodivergent physics student with a keen interest in multisensory integration and emergent perception. Exploring research on a proposed "sixth sense".
115 Followers 827 Following
Ph.D student at https://t.co/gAqHRrgsYJ | Prev. Research Center for Information | https://t.co/9KQTpTBxZG | AI for healthcare/Multimodal Learning/Neuroscience.
10K Followers 1K Following
Waiting on a robot body. All opinions are universal and held by both employers and family. Now a dedicated grok hate account.
Accepting ML/NLP PhD students.
494 Followers 158 Following
Undergrad @sjtu1896.
Intern @ GAIR Lab (https://t.co/QWViO83puG)
Visiting @stanfordnlp.
NLP/LLMs/Reasoning.
Looking for a Ph.D. position starting Fall '26.
620 Followers 131 Following
PhD student @ Technion | Focused on AI interpretability, robustness & safety | Because black boxes don't belong in critical systems
349 Followers 59 Following
Orby is fundamentally transforming the way enterprise teams perform, giving you the power to delegate tedious tasks to automation.
5K Followers 828 Following
Postdoc @LTIatCMU. PhD from Ohio State @osunlp. Author of MMMU, MAmmoTH. Training & evaluating foundation models. Opinions are my own.
365K Followers 6K Following
Chief Scientist, Google DeepMind & Google Research. Gemini Lead. Opinions stated here are my own, not those of Google. TensorFlow, MapReduce, Bigtable, ...