JAYANTH @_jayanth_mohan_

AI undergrad | Researcher
Joined December 2016

Tweets: 252 | Followers: 9 | Following: 248 | Likes: 75

TuringPost @TheTuringPost

5 months ago

4 advanced attention mechanisms you should know:
• Slim attention: 8× less memory, 5× faster generation by storing only K from KV pairs and recomputing V.
• XAttention: 13.5× speedup on long sequences via "looking" at the sum of values along diagonal lines in the attention…

7 183 922 56K 793


ℏεsam @Hesamation

5 months ago

the best researchers from Meta, Yale, Stanford, Google DeepMind, and Microsoft laid out all we know about Agents in a 264-page paper [book], here are some of their key findings:

93 1K 9K 1.3M 14K


Sebastian Raschka @rasbt

5 months ago

Coded Llama 3.2 model from scratch and shared it on the HF Hub. Why? I think 1B & 3B models are great for experimentation, and I wanted to share a clean, readable implementation for learning & research: huggingface.co/rasbt/llama-3.…

31 281 2K 96K 1K


Bindu Reddy @bindureddy

9 months ago

QwQ is a fantastic reasoner and is 10x cheaper than the o1 line. We will be combining it with o1-mini and o1-preview as part of our route LLM. AGI will be an ensemble system that combines the best LLMs to maximize performance, speed and cost.

30 36 253 26K 104


Sumanth @Sumanth_077

10 months ago

Stanford CS229: Building Large Language Models. This 1.5-hour lecture provides a concise overview of building a ChatGPT-like model, covering both pretraining (language modeling) and post-training (SFT/RLHF). youtu.be/9vM4p9NN0Ts?si…

4 311 2K 112K 2K


Yam Peleg @Yampeleg

10 months ago

E-V-E-R-Y-T-H-I-N-G is open source 🔥🔥

2 1 6 935 3


Yujin Tang @yujintang99

10 months ago

Fantastic Survey! Autoregressive Models in Vision.

Jinfa Huang @vhjf36495872

10 months ago

Fantastic Survey! Autoregressive Models in Vision.

0 8 21 3K 11


1 2 17 2K 6

Alex Cheema - e/acc @alexocheema

10 months ago

M4 Mac AI Coding Cluster. Uses @exolabs to run LLMs (here Qwen 2.5 Coder 32B at 18 tok/sec) distributed across 4 M4 Mac Minis (Thunderbolt 5 80Gbps) and a MacBook Pro M4 Max. Local alternative to @cursor_ai (benchmark comparison soon).

109 364 4K 513K 2K


The AI Timeline @TheAITimeline

10 months ago

🚨This week's top AI/ML research papers:
- Mixture-of-Transformers
- BitNet a4.8
- LoRA vs Full Fine-tuning: An Illusion of Equivalence
- Mixtures of In-Context Learners
- Emergence of Hidden Capabilities
- DimensionX
- The Surprising Effectiveness of Test-Time Training for…

5 121 986 104K 899


Rohan Paul @rohanpaul_ai

10 months ago

Nice collection of LLM papers, blogs, and projects, focussing on OpenAI o1 and reasoning techniques. What it offers:
📌 Curates papers, blogs, talks, and Twitter discussions about OpenAI's o1 and LLM reasoning
📌 Tracks frontier developments in LLM reasoning capabilities and…

3 105 468 30K 494


Sebastian Raschka @rasbt

10 months ago

If you are looking for something to read/study this weekend, I added lots of LLM-related bonus from-scratch coding resources over the last few months (from implementing Llama 3.2 to preference tuning with DPO): github.com/rasbt/LLMs-fro… I hope you find them useful!

29 385 2K 102K 2K


Akshay 🚀 @akshay_pachaar

11 months ago

Microsoft just changed the game! 🔥 They've open-sourced bitnet.cpp: a blazing-fast 1-bit LLM inference framework that runs directly on CPUs. Why is this a game-changer❓ You can now run 100B parameter models on local devices with up to 6x speed improvements and 82% less…

165 1K 7K 787K 8K


Yam Peleg @Yampeleg

11 months ago

You see the length of this prompt? This is what you should have in your instruct dataset if you want to compete with the big players.

Vaibhav (VB) Srivastav @reach_vb

11 months ago

You see the length of this prompt? This is what you should have in your instruct dataset if you want to compete with the big players. https://t.co/bUSThgrRf3

20 139 1K 979K 2K


47 316 4K 869K 8K


Andrej Karpathy @karpathy

a year ago

The model card has some more interesting info too: github.com/meta-llama/lla… Note that Llama 3 8B is actually somewhere in the territory of Llama 2 70B, depending on where you look. This might seem confusing at first but note that the former was trained for 15T tokens, while the…

31 108 1K 197K 389

Hugh Zhang @hughbzhang

a year ago

Data contamination is a huge problem for LLM evals right now. At Scale, we created a new test set for GSM8k *from scratch* to measure overfitting and found evidence that some models (most notably Mistral and Phi) do substantially worse on this new test set compared to GSM8k.

35 223 1K 291K 381


Andrej Karpathy @karpathy

a year ago

Nice new read on tokenization! You've heard about the SolidGoldMagikarp token, which breaks GPT-2 because it was present in the training set of the Tokenizer, but not the LLM later. This paper digs in in a lot more depth and detail, on a lot more models, discovering a less…

Sander Land @magikarp_tokens

a year ago

16 154 894 555K 710


48 349 3K 533K 2K

Yam Peleg @Yampeleg

a year ago

Big: First BitNet reproduction shows consistent results!

Nous Research @NousResearch

a year ago

Big: First BitNet reproduction shows consistent results!

24 155 918 227K 340


1 2 37 5K 5

Rohan Paul @rohanpaul_ai

a year ago

llmlingua: this great lib from Microsoft can compress your prompt massively. 📌 Down to 20% of the prompt's original length (5x reduction), leading to massively reduced cost and latency. 🔥 Speeds up LLM inference and enhances the LLM's perception of key information, compress…

2 38 160 8K 113


Lior⚡ @LiorOnAI

a year ago

This sets the ground for AGI. Sakana AI just released a new method to combine the 500,000 open-source models to build new ones. Evolutionary Model Merge uses evolutionary techniques to automatically create new foundation models with the desired capabilities. "We find that our…

23 136 522 95K 421


Parul Pandey @pandeyparul

2 years ago

Thank you, @Thom_Wolf for sharing your slides from the recent lecture at ELLIS Winter School. Despite its modest title, "A Little Guide to Building Large Language Models in 2024," the presentation is anything but 'little'—offering a deep dive into the intricacies of the workflow…