A recent work from @iddo claimed GPT4 can score 100% on MIT's EECS curriculum with the right prompting. My friends and I were excited to read the analysis behind such a feat, but after digging deeper, what we found left us surprised and disappointed. dub.sh/gptsucksatmit 🧵
The released test set on GitHub is chock-full of impossible-to-solve problems. Lots of questions refer to non-existent diagrams or are missing contextual information. So how did GPT "solve" them? (1/4)
@sauhaarda @iddo If they were really confident in the results, they would open source at least version 3. Something is being swept under the rug. I can't believe anyone who doesn't make an effort to be transparent; this just feels like conclusions being forced on us.
@sauhaarda @iddo Super interesting analysis, more work like this is needed, thanks for the effort in this.
@sauhaarda @iddo Thank you for calling this out! I keep saying that GPT-4 is very, very good at sounding like a confident human, which is enough to fool 50% of the people 100% of the time. It's also very good at repeating something it has read before, although there it may sometimes misquote.
@sauhaarda @iddo students 1, prof 0 great investigative work
@sauhaarda @iddo Thank you for doing the digging, probably 90% of people who read the original headline won't see the criticism, but it's important to get out there. Validation using GPT-4 is just such a massive red flag, can't believe they thought it was ok
@sauhaarda @iddo That kind of academic malpractice usually ends a career. Do the same rules not apply to machine learning papers? Jesus Christ.
@sauhaarda @mark_riedl @iddo Okay, so the few-shot setting is not good. But didn't GPT-4 zero-shot still get 90% right?
@sauhaarda @iddo I'm completely unsurprised. It performs absolutely terribly on mathematical questions. I have plenty of examples (including for GPT-4, which is no better), although not neatly organized in one place.
@sauhaarda @iddo Sir, you do know you can choose to not be flower-nutria on notion? 😂
@sauhaarda @iddo A lot of fuzzy evaluation has been happening, especially with open-source models, where people simply fine-tune on data that overlaps with the evaluation benchmarks and then claim wild results.
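The contamination this reply describes (fine-tuning data overlapping the eval set) can be checked with a simple heuristic. The sketch below is purely illustrative and not from the thread: it flags eval documents that share any word 8-gram with the training corpus, one common rough signal of train/test leakage. All function names and the n-gram size are my own assumptions, not anything the thread or the paper specifies.

```python
# Hypothetical sketch: a rough n-gram overlap check for benchmark contamination.
# Flags an eval document if it shares at least one word n-gram with the
# fine-tuning corpus. Real contamination audits are more involved (normalization,
# fuzzy matching, scale), but this shows the basic idea.

def ngrams(text: str, n: int = 8) -> set:
    """Return the set of word n-grams in a text (lowercased)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def contamination_rate(train_docs: list, eval_docs: list, n: int = 8) -> float:
    """Fraction of eval documents sharing at least one n-gram with training data."""
    train_grams = set()
    for doc in train_docs:
        train_grams |= ngrams(doc, n)
    flagged = sum(1 for doc in eval_docs if ngrams(doc, n) & train_grams)
    return flagged / len(eval_docs) if eval_docs else 0.0
```

A high rate would suggest the benchmark score reflects memorization rather than capability, which is exactly the failure mode the reply is pointing at.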
@sauhaarda @iddo Does it mean the exams are not relevant for evaluation of human potential? :)
@sauhaarda @iddo Thanks so much for looking into it. I asked them how they prompted and got no response. My daughter uses Khanmigo, which is supposed to use GPT-4. It gets my daughter's AoPS math problems (counting, probability, algebra) wrong all the time.
@sauhaarda @iddo But isn't ChatGPT getting its answers from the internet? And if there are several answers to one question on the net, GPT is unable to find the right one. That's why: a 60% score. Get back to your books, guys.