NeetCode @neetcode1, Twitter Profile

NeetCode @neetcode1

2 months ago

At this point, how much do people actually care about benchmarks? Calling it now, Grok 4 won't actually be the best model, it's just the classic hype cycle. Starting to see a lot more people catch on.

31 22 693 42K 28

Art Vandelay @ArtTheVandelay

2 months ago

@neetcode1 X Monetization is a hell of a drug.

2 0 6 547 0

dare @dariusemrani

2 months ago

@neetcode1 Most benchmarks are BS. I’m only excited by the ARC score.

2 1 9 10K 15

0 0 4 2K 0

Shreyans Bhansali @makersfuel

2 months ago

@neetcode1 Benchmarks are the trailer. Real-world use is the movie.

1 0 1 470 0

maverick @vaggelis98_

2 months ago

@neetcode1 It is just like IQ tests. If you start solving 10, 20, 30 of them, then you aren't testing your IQ; you are optimizing your reasoning for IQ tests, and therefore the result becomes less and less trustworthy.

0 0 1 80 0

Dhananjay Kajla @kajla_dhananjay

2 months ago

@neetcode1 x.com/kajla_dhananja…

0 0 0 159 0

Vladimir Tchuiev @VTchuiev

2 months ago

@neetcode1 Are people catching on? Yesterday this entire feed was filled with Grok 4 hype... As expected, it's not AGI but more of a bloated mess

0 0 0 43 0

Bread @BreadPirateRob

2 months ago

@neetcode1 model providers don't disclose quantization level and regularly change it, so the model you actually get from ChatGPT, Claude, Grok, etc. isn't the model that is pegged to the benchmark

0 0 0 256 0

Eric Roby @codingwithroby

2 months ago

@neetcode1 for sure

0 0 0 255 0

Suraj Gupta @SGRamesh23

2 months ago

@neetcode1 It's true, sometimes the hype can overshadow the actual performance.

0 0 0 19 0

Dino 🇦🇷 @dinocres1

2 months ago

@neetcode1 I ignore both benchmarks and initial Twitter over/under hype. Useful models surface to the top organically

0 0 0 291 0

Johannes Tscharn @JohannesTscharn

2 months ago

@neetcode1 Yeah, it’s sad they mostly showed benchmark results and graphs that most people neither really understand nor care about…

0 0 0 1K 0

Hatem @KaousNadirHatem

2 months ago

@neetcode1 I think benchmarks still matter, just not to end-users. They're crucial for the developers and businesses who have to choose which foundational model to build on top of.

0 0 0 118 0

nick @thecsguy

2 months ago

@neetcode1 no one should. they are meaningless numbers

0 0 0 12 0

Ruthuvikas Ravikumar @ruthuvikas

2 months ago

@neetcode1 LMs are getting saturated. These hype claims are just to maintain the stock value.

0 0 9 1K 0

Aditya @fate1ess

2 months ago

@neetcode1 I refuse to believe it's actually smarter than claude at coding.

1 0 4 758 0

AKS 🇮🇳 @thoughtsofayush

2 months ago

@neetcode1 By that logic, solving leetcode questions as a ‘benchmark’ should not be someone's full-time job.

0 0 3 257 0

Asesh @algorithmsarm64

2 months ago

@neetcode1 They started late and are leading now. Things will only get better. Never bet against Elon!!

0 0 2 322 0

another boring guy @gemb0_0

2 months ago

@neetcode1 With every new Gemini release it tops the charts but after trying it, it doesn't feel like that's the best model at all so for me the benchmarks are just a hype machine to keep the party going

0 0 1 356 0

k @rfxkairu

2 months ago

@neetcode1 many such cases. grok 3 was hyped and leading some benchmarks, just for no one caring other than musk bootlickers. time and time again, anthropic and openai get dethroned in benchmarks just for them to still have the best models when it comes to actually using them

0 0 1 370 0

we don't know @userinfo25

2 months ago

@neetcode1 You're dumb bro, you also said AI was all hype at the start. We all know how much better and more useful it is now

1 0 1 108 0

rufw @rufw91

2 months ago

@neetcode1 Lol. Spoken like a true hater

1 0 1 320 0

Lite @lite_745

2 months ago

@neetcode1 Which one is the best for free APIs ?

0 0 0 8 0

Henry Sibanda @sibanda_henry

2 months ago

@neetcode1 Everyone caught on ages ago when OpenAI overdid it with strawberry. People now just test out of curiosity, but the excitement has dropped. A tweet of just a 🍓 used to drive people crazy 😂

0 0 0 15 0

Amit @actual_amit

2 months ago

@neetcode1 Yeah, overpromise and underdeliver

0 0 0 437 0

zenitsu_apprentice @zenitsu_aprntc

2 months ago

@neetcode1 do you think we'll see like another checkpoint, like the reasoning models, where the models get a serious update/change

0 0 0 517 0

Derek Johnson @djcarday

2 months ago

@neetcode1 whyyy so much AI talk? AI bro now :(

0 0 0 145 0

Aditya @codeStumps

2 months ago

@neetcode1 Some benchmarks are manufactured by the authors themselves. We'll soon know about this. Hope it’s not like the llama-gate🤷🏻‍♂️

0 0 0 189 0

Harmit @harmitwt

2 months ago

@neetcode1 Fully agree!! My tweet after I watched yesterday’s presentation

1 0 0 16 0

Ibn Haleema al Kashmiri @ibn_haleema

2 months ago

@neetcode1 Yup, for me Claude and deepseek are better code writers than Grok.

0 0 0 25 0

Rayan Krishnan @RayanKrishnan

2 months ago

You called it, didn't do as well on held-out benchmarks. It's about having the right benchmarks, not just self-reported performance. x.com/_valsai/status…

2 2 10 1K 1

0 0 2 48 0