• raunakdoesdev Profile Picture

    Raunak Chowdhuri @raunakdoesdev

    2 years ago

    A recent work from @iddo claimed GPT4 can score 100% on MIT's EECS curriculum with the right prompting. My friends and I were excited to read the analysis behind such a feat, but after digging deeper, what we found left us surprised and disappointed. dub.sh/gptsucksatmit 🧵

    53 799 3K 3.5M 1K
  • raunakdoesdev Profile Picture

    Raunak Chowdhuri @raunakdoesdev

    2 years ago

    The released test set on GitHub is chock-full of impossible-to-solve problems. There are lots of questions referring to non-existent diagrams and missing contextual information. So how did GPT solve it? (1/4)

    3 27 465 92K 11
  • katmhuang Profile Picture

    -webkat-huang @katmhuang

    2 years ago

    @sauhaarda @iddo great writeup!

    1 0 14 28K 0
  • canalCCore2 Profile Picture

    caio temer @canalCCore2

    2 years ago

    @sauhaarda @iddo If they were really good, they would open source at least version 3. There's something under the rug. I can't believe anyone who doesn't make an effort to be transparent. It just tries to shove something down our throats with an uneven and overpowering force.

    0 0 13 5K 1
  • OfficialLoganK Profile Picture

    Logan Kilpatrick @OfficialLoganK

    2 years ago

    @sauhaarda @iddo Super interesting analysis, more work like this is needed, thanks for the effort in this.

    0 0 10 4K 0
  • jwatte Profile Picture

    Very Human Robot @jwatte

    2 years ago

    @sauhaarda @iddo Thank you for calling this out! I keep saying that GPT-4 is very very good at sounding like a confident human, which is enough to fool 50% of the people 100% of the time. It's also very good at repeating something it has read before, although there, it may sometimes mis-quote.

    0 0 6 3K 0
  • acfou Profile Picture

    Dr. Fou - FouAnalytics @acfou

    2 years ago

    @sauhaarda @iddo students 1, prof 0 great investigative work

    0 0 0 537 0
  • amebagpt Profile Picture

    AmebaGPT @amebagpt

    2 years ago

    @sauhaarda @iddo Thank you for doing the digging, probably 90% of people who read the original headline won't see the criticism, but it's important to get out there. Validation using GPT-4 is just such a massive red flag, can't believe they thought it was ok

    0 0 21 5K 0
  • joelteply Profile Picture

    Joel Tepliid @joelteply

    2 years ago

    @sauhaarda @iddo That kind of academic malpractice usually ends a career. Do the same rules not apply to machine learning papers? Jesus Christ.

    1 1 18 3K 1
  • Reviewer2Ai Profile Picture

    lightyagami @Reviewer2Ai

    2 years ago

    @sauhaarda @mark_riedl @iddo Okay, so the fewshot setting is not good. But GPT-4 zeroshot got 90% right?

    4 0 19 33K 0
  • AlanVRK Profile Picture

    Alan Stacey @AlanVRK

    2 years ago

    @sauhaarda @iddo I'm completely unsurprised. It performs absolutely terribly at mathematical questions. I have plenty of examples (including GPT-4 which is not better) although not neatly organized in one place.

    2 1 5 2K 1
  • NLPurr Profile Picture

    NLPurr @NLPurr

    2 years ago

    @sauhaarda @iddo Sir, you do know you can choose to not be flower-nutria on notion? 😂

    1 0 7 16K 1
  • sparsh_17 Profile Picture

    Sparsh Agrawal @sparsh_17

    2 years ago

    @sauhaarda @iddo A lot of fuzzy evaluation has been happening, especially in the open-source domain, where they simply fine-tune on data consistent with the evaluation benchmarks and then claim wild results

    0 0 5 6K 1
  • sumerki8 Profile Picture

    Masha Liberman @sumerki8

    2 years ago

    @sauhaarda @iddo Does it mean the exams are not relevant for evaluation of human potential? :)

    2 0 2 6K 0
  • 69urbanmeyer Profile Picture

    No LeTrade Market @69urbanmeyer

    2 years ago

    @sauhaarda @iddo

    0 0 2 351 0
  • m_a_x_w_u Profile Picture

    Max @m_a_x_w_u

    2 years ago

    @sauhaarda @iddo Thanks so much for looking into it. I asked them how they prompted and got no response. My daughter uses Khanmigo, which is supposed to use GPT4. It got my daughter’s AOPS math problems wrong all the time, such as counting, probability and algebra.

    0 0 2 1K 0
  • LimayeVaibhavi Profile Picture

    Vaibhavi Limaye @LimayeVaibhavi

    2 years ago

    @sauhaarda @iddo But isn't ChatGPT getting its answers from the internet? And if there are several answers to one question on the net... GPT is unable to find the right one. That is why... 60% score. Get back to your books guys.

    0 0 1 1K 0
  • timmyj1023 Profile Picture

    Tim Johnson @timmyj1023

    2 years ago

    @sauhaarda @iddo @SaveToNotion #thread

    1 0 1 6K 0
  • yo_ean Profile Picture

    Jan E @yo_ean

    2 years ago

    @sauhaarda @iddo @FrankWunderli13 FYI Re: MIT GPT paper

    0 0 1 404 0
  • niknoltex Profile Picture

    𝕏 NiNo @niknoltex

    2 years ago

    @sauhaarda @iddo @YannickScholich it’s all about the prompting 🔥

    0 0 1 922 0