NuRL: Pushing LLM Reasoning to the Next Level!
Hard problems? No problem! GRPO struggles with 0% pass-rate tasks, but NuRL nudges LLMs with self-generated hints → expands the learning zone.
Results across 6 benchmarks & 3 models:
+0.8–1.8% over GRPO on pass@1
New paper!
MENLO: From Preferences to Proficiency
We introduce a framework + dataset for evaluating and modeling native-like LLM response quality across 47 languages, inspired by audience design principles.
Paper: arxiv.org/abs/2509.26601
Data: huggingface.co/datasets/faceb…
Bringing new meaning to the word NeRD at #CoRL2025.
Neural Robot Dynamics (NeRD) bridges the sim-to-real gap with a hybrid of neural modeling and analytical simulation, unlocking zero-shot transfer and continuous learning loops for optimal robotics training.
Read more &…
What if a robot could reason through failures like "the board isn't clean... I need more force", and fix it autonomously? Excited to share "Tactile-VLA: unlocking VLMs' latent physical knowledge for contact-rich manipulation through tactile sensing" in jialeihuang.github.io/tactileVLA.git…
Don't sleep tonight: The great Andromeda galaxy will be visible to the naked eye tonight. This will be an extremely rare sight as the trillion-star galaxy rises to its highest point in the night sky.
BREAKING: Scientists reveal the interstellar object 3I/ATLAS, a Manhattan-sized comet suspected of possibly carrying alien technology, is far larger than originally believed.
why is no one talking about this? schmidt: china's AI strategy is not pursuing "crazy" AGI strategies like america, but applying AI to everyday things. this is because their capital markets lack depth, and compute is blocked. he warns this plus lack of open source are major…
« So while you have a chance, take a look up and enjoy the view.
Our ancestors could hardly have imagined that one day, one of the brightest objects in the night sky would have been conceived by the human mind and built by human hands. » earthsky.org/space/internat…
ARE: scaling up agent environments and evaluations
Everyone talks about RL envs so we built one we actually use. In the second half of AI, evals & envs are the bottleneck.
Today we OSS it all: Meta Agent Research Environment + GAIA-2 (code, demo, evals).
Links
That's right, we released our first iteration of JEPAs for LLMs: arxiv.org/abs/2509.14252…
And yes, the code is public! Learning by latent space prediction has revolutionized vision models, and it will revolutionize LLMs!
One small step for JEPAs, one giant leap for LLMs
!! Important news for anyone using `transformers` from main !!
We just cut the v4 branch: we will cherry-pick relevant commits from main (like new models) daily.
But main of `transformers` will start getting the v5 commits! The first and biggest one yet: github.com/huggingface/tr…
I got to try the @RealityLabs @ray_ban Display AI glasses and they were truly impressive. Definitely the iPod moment for AI wearables. The wristband worked like magic and zooming in on the camera felt like a superpower.
Serving a model at scale is hard. Serving it across three hardware platforms (AWS Trainium, NVIDIA GPUs, Google TPUs) while maintaining strict equivalence is a whole other level.
Makes you wonder if the hardware flexibility is truly worth the hit to development speed and…
331 Followers 3K Following 70% / 30%, or 27% for 7 days + PSP fees. €0.50 per install in 🇪🇺? Name's Tim or Sundar? Swing by the DoJ.
258K Followers 216K Following German #geographer and #demographer in #Melbourne. I curate #maps and #data that explain how the #world works. Obviously all opinions are my own...
10K Followers 287 Following Not a trader, I swear. Just obsessed with capturing lowest cost basis on falling knives, which entails lots of buying and selling. NOT INVESTMENT ADVICE.
65K Followers 476 Following Speculator & Investor since 2013 | Former Engineer & VC | Future Optimist | Here to help you make better investment decisions
43K Followers 2K Following Covering Congress and the defense industry for @breakingdefense. Military plane meme aficionado. Mother of @ImpalerCat. She/her
2K Followers 2K Following Director, Government & International Affairs, @Canadensys1 // Former Canadian Space Agency DG Space Exploration // Still exploring
439 Followers 563 Following Client Support at hightechgadgets
Level 2 Seller at https://t.co/iproGcpC04…,
PhD Fellow, Good at using #Ansys, #Excel, #Matlab, for #Research
5K Followers 6K Following Cur. - Security Cooperation Professional for ████
"dude in his mom's basement talking to my son over video games" -@coleman_di92842
47K Followers 1K Following AI Developer Experience @GoogleDeepMind | prev: Tech Lead at @huggingface, AWS ML Hero 🤗 Sharing my own views and AI News 🧑🏻‍💻 https://t.co/7IosdlNz22
495 Followers 96 Following As a #Space, #Science, and #Tech enthusiast, I'm lucky to live in an era nearing the singularity, with #AI breakthroughs and #MultiPlanetary possibilities.
24K Followers 3K Following In the beginning Bill Clinton gave him a green card. This has made a lot of people very angry and been widely regarded as a bad move • @twocentinc
3K Followers 436 Following “Those who seek for truth behind the phenomena are condemned to an expedition in search of nothingness – the phenomena themselves are the living!”
5K Followers 4K Following Theatre Director & Designer w/ a bkgrd in film, photo & tech. Live streaming geek & video producer for @NASASpaceflight. Coffee addict. BFA/MA/MFA. He/Him.
57K Followers 860 Following Figuring out AI @allen_ai, open models, RLHF, fine-tuning, etc
Contact via email.
Writes @interconnectsai
Wrote The RLHF Book
Mountain runner
649K Followers 35 Following We're an AI safety and research company that builds reliable, interpretable, and steerable AI systems. Talk to our AI assistant @claudeai on https://t.co/FhDI3KQh0n.