v̴̝̐i̖̍x̘̍t̵̙̖̆̅ @_rdm_8

I retweet content I think is important. 😊

Singapore

Joined May 2014

Tweets

772
Followers

27
Following

105
Likes

1K

Thinking Machines @thinkymachines

7 days ago

LoRA makes fine-tuning more accessible, but it's unclear how it compares to full fine-tuning. We find that the performance often matches closely---more often than you might expect. In our latest Connectionism post, we share our experimental results and recommendations for LoRA.…

79 543 3K 1.2M 2K

Ali Hatamizadeh @ahatamiz1

6 days ago

Are you ready for web-scale pre-training with RL? 🚀 🔥 New paper: RLP: Reinforcement Learning Pre‑training. We flip the usual recipe for reasoning LLMs: instead of saving RL for post‑training, we bring exploration into pretraining. Core idea: treat chain‑of‑thought as an…

22 111 694 93K 584

Simo Ryu @cloneofsimo

2 weeks ago

Damn, very interesting paper. After rapid loss reduction, we see deceleration that follows a "scaling law": this is because at these steps, gradients start to conflict with each other. Updates are 'fighting for model capacity' in some sense, and the larger the model, the less fighting there…

12 63 548 45K 446

Tim Dettmers @Tim_Dettmers

2 weeks ago

Looking closer, PyTorch also uses FP32, but here's the real reason why bnb Adam is better: we optimized for float numerics, order does matter! Computing sqrt(v) + eps*c2 then dividing avoids amplifying errors vs PyTorch's sqrt(v)/c2 + eps. Same math, better stability!

Tim Dettmers @Tim_Dettmers

2 weeks ago

8 7 165 52K 52

5 21 390 42K 205
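
The reordering described in the Tim Dettmers tweet above can be illustrated with a small sketch. It assumes `c2` stands for the square root of Adam's second-moment bias correction, `sqrt(1 - beta2**t)`, which the tweet does not spell out; the function names are hypothetical and this is neither PyTorch's nor bitsandbytes' actual code. It only shows that the two denominators describe the same update algebraically, since multiplying numerator and denominator of `m / (sqrt(v)/c2 + eps)` by `c2` gives `m*c2 / (sqrt(v) + eps*c2)`.

```python
import math


def step_divide_first(m, v, c2, eps):
    """Form the tweet attributes to PyTorch: denominator is sqrt(v)/c2 + eps."""
    return m / (math.sqrt(v) / c2 + eps)


def step_scale_eps(m, v, c2, eps):
    """Form the tweet attributes to bnb: denominator is sqrt(v) + eps*c2.

    The numerator is multiplied by c2 so the result is algebraically
    identical to step_divide_first (assumption: this is how the
    "then dividing" step in the tweet is meant).
    """
    return (m * c2) / (math.sqrt(v) + eps * c2)


# Illustrative values only (not taken from the tweet).
beta2, t, eps = 0.999, 10, 1e-8
c2 = math.sqrt(1.0 - beta2 ** t)  # assumed meaning of c2
m, v = 0.01, 1e-4

a = step_divide_first(m, v, c2, eps)
b = step_scale_eps(m, v, c2, eps)
difference = abs(a - b)  # any gap here comes from floating-point rounding
```

In exact arithmetic the two results are equal; in floating point they can differ in the last bits, and the gap would be larger in lower precision than the FP64 used by Python floats here. The sketch does not measure which ordering is more accurate.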

Hynek Kydlíček @HKydlicek

4 weeks ago

We are releasing 📄 FinePDFs: the largest PDF dataset spanning over half a billion documents! - Long context: Documents are 2x longer than web text - 3T tokens from high-demand domains like legal and science. - Heavily improves over SoTA when mixed with FW-EDU&DCLM web corpora.

24 120 716 191K 417

Rohan Paul @rohanpaul_ai

a month ago

jax-ml.github.io/scaling-book/g…

0 10 92 9K 171

edwin @edwinarbus

2 months ago

You can learn more about it here: cookbook.openai.com/examples/gpt-5…

1 18 160 13K 150

v̴̝̐i̖̍x̘̍t̵̙̖̆̅ @_rdm_8

2 months ago

GPT-5 Thinking is incredible! I asked it algo interview questions of the kind given to SSEs. These are not available on the internet; I made them up by adding more constraints or twisting familiar scenarios. More than solving the questions, the reasoning it shows gives me goose bumps!

1 0 1 55 0

v̴̝̐i̖̍x̘̍t̵̙̖̆̅ @_rdm_8

2 months ago

What it's like to Vibe Code youtube.com/shorts/ql56K3s…

0 0 0 15 0

Google DeepMind @GoogleDeepMind

2 months ago

What if you could not only watch a generated video, but explore it too? 🌐 Genie 3 is our groundbreaking world model that creates interactive, playable environments from a single text prompt. From photorealistic landscapes to fantasy realms, the possibilities are endless. 🧵

841 3K 14K 3.6M 4K

Eric Wallace @Eric_Wallace_

2 months ago

In releasing this paper and model, we hope that it can aid safety research and serve as useful guidance for other groups looking to release open-weight models. Paper: cdn.openai.com/pdf/231bf018-6… w/ @OliviaGWatkins2 @MilesKWang @kaicathyc @chrisk99999 and many others!

3 8 114 15K 23

Anthropic @AnthropicAI

2 months ago

New Anthropic research: Persona vectors. Language models sometimes go haywire and slip into weird and unsettling personas. Why? In a new paper, we find “persona vectors”—neural activity patterns controlling traits like evil, sycophancy, or hallucination.

231 938 6K 1.4M 4K

Tooliense @tooliense

3 months ago

Now you can just use an agent that can solve olympiad-level problems, completely FREE. This intelligence can also be used for coding, science, etc., any domain you want. We just open-sourced our agent system Crux. We don't require a subscription or any payment. Just…

2 2 5 308 2

Owain Evans @OwainEvans_UK

3 months ago

New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵

292 1K 9K 1.9M 5K

Bowen Baker @bobabowen

3 months ago

Modern reasoning models think in plain English. Monitoring their thoughts could be a powerful, yet fragile, tool for overseeing future AI systems. I and researchers across many organizations think we should work to evaluate, preserve, and even improve CoT monitorability.

56 160 832 739K 537

Kimi.ai @Kimi_Moonshot

3 months ago

🚀 Hello, Kimi K2! Open-Source Agentic Model! 🔹 1T total / 32B active MoE model 🔹 SOTA on SWE Bench Verified, Tau2 & AceBench among open models 🔹Strong in coding and agentic tasks 🐤 Multimodal & thought-mode not supported for now With Kimi K2, advanced agentic intelligence…

282 1K 7K 2.6M 3K

Buitengebieden @buitengebieden

3 months ago

Sharing is caring.. 😊

237 3K 24K 1.2M 838

Loubna Ben Allal @ COLM @LoubnaBenAllal1

3 months ago

Introducing SmolLM3: a strong, smol reasoner! > SoTA 3B model > dual mode reasoning (think/no_think) > long context, up to 128k > multilingual: en, fr, es, de, it, pt > fully open source (data, code, recipes) huggingface.co/blog/smollm3