• Peter Jansen (@peterjansen-ai.bsky.social) @peterjansen_ai

    a year ago

    Can language models be used as world simulators? In our ACL 2024 paper, we show -- not really. GPT-4 is only ~60% accurate at simulating state changes based on common-sense tasks, like boiling water. Preprint: arxiv.org/pdf/2406.06485 @allen_ai @MSFTResearch @aclmeeting

    22 182 747 409K 587
  • Peter Jansen (@peterjansen-ai.bsky.social) @peterjansen_ai

    a year ago

    This is follow-on work to our paper that asks "Can LLMs generate code for world simulators?" Our EMNLP paper showed -- sort of. Our best technique generated runnable simulations 57% of the time, but half of what they allow doesn't make sense. Paper: arxiv.org/abs/2305.14879 /1

    Following this paper, the big question was: Can we just use LLMs directly as simulators, without them generating code as an intermediate step? And if so, how good are they at this task?

    1 2 32 5K 6
  • Teknium (e/λ) @Teknium1

    a year ago

    @peterjansen_ai @allen_ai @MSFTResearch @aclmeeting Doesn't this show that, in all cases except the changes driven by the environment, GPT-4 is about as good on average as a human?

    1 0 25 4K 4
  • Teknium (e/λ) @Teknium1

    a year ago

    @peterjansen_ai @allen_ai @MSFTResearch @aclmeeting @teortaxesTex awaiting your review

    1 0 3 2K 1
  • Ronen Tamari @rtk254

    a year ago

    @peterjansen_ai @allen_ai @MSFTResearch @aclmeeting Cool work! We explored a different approach to using LLMs as world simulators in our Breakpoint Transformers paper x.com/ai2_aristo/sta… We similarly found that simulation abilities were not reliable, especially in OOD settings

    0 0 2 122 0
  • Shao-Hua Sun @shaohua0116

    a year ago

    @peterjansen_ai @allen_ai @MSFTResearch @aclmeeting Congrats on this inspiring work!

    0 0 0 185 0
  • BensenHsu @BensenHsu

    a year ago

    The findings suggest that while LLMs show promise, they are still unreliable as direct text-based world simulators, especially when it comes to capturing environment-driven transitions and transitions that require complex reasoning. The authors highlight the need for further innovations to improve LLMs' world-modeling capabilities. Full paper: openread.academy/en/paper/readi…

    1 2 32 4K 9
  • Eran Hirsch @hirscheran

    a year ago

    @peterjansen_ai @allen_ai @MSFTResearch @aclmeeting I really like this work that deeply dives into language models as world models. You might also find our work interesting, which provides more such analyses on classic planning domains: arxiv.org/pdf/2402.11489

    0 1 8 1K 5
  • John Warner @SwampFox

    a year ago

    @peterjansen_ai @allen_ai @MSFTResearch @aclmeeting Not surprising results from general-purpose LLMs. Can they be better world models when augmented with proprietary and domain-specific applications via a RAG approach? One would expect so.

    0 0 1 312 0
  • Razoyo @RazoyoDev

    a year ago

    @peterjansen_ai @allen_ai @MSFTResearch @aclmeeting 60%... so, less than a toddler

    1 0 0 714 0
  • Avinash @avipateldak

    a year ago

    @peterjansen_ai @allen_ai @MSFTResearch @aclmeeting To me, the question itself already tells a lot about what we have achieved.

    0 0 0 495 0
  • Saquib Mehmood @SaquibOptimusAI

    a year ago

    @peterjansen_ai @allen_ai @MSFTResearch @aclmeeting Of course not. Their worldview is entirely based on training and alignment.

    0 0 0 470 0
  • Franco Calabrese @Franco_Calabres

    a year ago

    @peterjansen_ai @allen_ai @MSFTResearch @aclmeeting Very nice. It would be interesting to see a human-vs-LLM performance comparison where the humans don't have the refined cognitive skills typical of high-level AI researchers.

    0 0 0 368 0
  • Mauricio @Mauricio_asz

    a year ago

    @peterjansen_ai @allen_ai @MSFTResearch @aclmeeting @AkitaOnRails

    0 0 0 58 0
  • JonnyMadFox🏴 @JonnyMadFox

    a year ago

    @peterjansen_ai @allen_ai @MSFTResearch @aclmeeting Still can't handle Nethack👏👏

    0 0 0 48 0
  • Sina Shahandeh @SinaShahandeh

    a year ago

    @peterjansen_ai @allen_ai @MSFTResearch @aclmeeting This was tested on the text-based GPT-4 model. I wonder whether we would get a substantial improvement with GPT-4o, given its multimodality. World simulators require observational knowledge that is not necessarily accessible in text format, even if the questions are given in text.

    0 0 0 180 0
  • xx @xwithwx

    a year ago

    @peterjansen_ai @allen_ai @MSFTResearch @aclmeeting read this

    0 0 0 1 0
  • randal @rando_137

    a year ago

    @peterjansen_ai @allen_ai @MSFTResearch @aclmeeting Here’s another paper worth digging into around benchmarks. arxiv.org/pdf/2406.04744

    1 0 0 35 0
  • AI Deeply @AiDeeply

    a year ago

    @peterjansen_ai @allen_ai @MSFTResearch @aclmeeting "Sparks of AGI". (Not.) Snark aside: work like this is important for systematically demonstrating what many of us feel in everyday use.

    0 0 0 38 0