A classic paper, a collab between @AIatMeta, @GoogleDeepMind, and @NVIDIAAIDev
Language models keep personal facts in a measurable amount of “storage”. This study shows how to count that storage—and when models swap memorization for real learning.
📡 The Question
Can we…
The freshest must-read research papers for you:
▪️ Diffusion LMs Know the Answer Before Decoding
▪️ Coordinated Autoscaling for Heterogeneous and Disaggregated LLM Inference
▪️ StepWiser
▪️ ThinkDial
▪️ Provable Benefits of In-Tool Learning for LLMs
▪️ Understanding…
The best fine-tuning guide you'll find on arXiv this year.
Covers:
> NLP basics
> PEFT/LoRA/QLoRA techniques
> Mixture of Experts
> Seven-stage fine-tuning pipeline
Overview of Self-Evolving Agents
There is a huge interest in moving from hand-crafted agentic systems to lifelong, adaptive agentic ecosystems.
What's the progress, and where are things headed?
Let's find out:
Retrieval-Augmented Reasoning with Lean Language Models
Great paper showing how to fuse RAG and reasoning into a single small-footprint language model.
Distillation works if done correctly.
Very exciting results!
Here are my notes:
Tencent just dropped China's version of Google Genie 3!
Yan is an incredible world model that generates 1080p worlds at 60fps (!) with no game engine, pure AI inference, at 0.11s latency and infinite video length. It's trained on ~150 days of video gameplay.
The specs are…
Absolutely golden resource: A Comprehensive Survey of Self-Evolving AI Agents
Self‑evolving agents are built to adapt themselves safely, not just run fixed scripts, guided by 3 laws: endure, excel, evolve.
The survey maps a 4‑stage shift,
MOP (Model Offline Pretraining) to…
Turn PDF files into clean, LLM-ready data!
Dolphin is an open source document parsing framework that converts PDFs into structured formats like Markdown, HTML, LaTeX, and JSON.
100% Open Source
Is Chain-of-Thought Reasoning of LLMs a Mirage?
... Our results reveal that CoT reasoning is a brittle mirage that vanishes when it is pushed beyond training distributions. This work offers a deeper understanding of why and when CoT reasoning fails, emphasizing the ongoing…
Hierarchical Reasoning Model
This is one of the most interesting ideas on reasoning I've read in the past couple of months.
It uses a recurrent architecture for impressive hierarchical reasoning.
Here are my notes:
New Anthropic research: Persona vectors.
Language models sometimes go haywire and slip into weird and unsettling personas. Why? In a new paper, we find “persona vectors”: neural activity patterns controlling traits like evil, sycophancy, or hallucination.
Log-linear attention — a new type of attention proposed by @MIT which is:
- as fast and efficient as linear attention
- as expressive as softmax
It uses a small but growing number of memory slots that increases logarithmically with the sequence length.
Here's how it works:
pku-epic.github.io/GraspVLA-web/
sim of grasp is ready for visual rendering
(also almost ready for physics, based on other recent works on dynamics simulation)
Introducing Meta Perception Language Model (PLM): an open & reproducible vision-language model tackling challenging visual tasks.
Learn more about how PLM can help the open source community build more capable computer vision systems.
Read the research paper, and download the…