Shijie Chen @ShijieChen98

PhD student @osunlp

Ohio, USA

Joined April 2018

Tweets

69
Followers

217
Following

217
Likes

34

Yu Su (hiring postdoc) @ysu_nlp

6 days ago

Computer Use: Modern Moravec's Paradox A new blog post arguing why computer-use agents may be the biggest opportunity and challenge for AGI. tinyurl.com/computer-use-a… Table of Contents > Moravec’s Paradox > Moravec's Paradox in 2025 > Computer use may be the biggest opportunity…

9 62 184 36K 98

Boyuan Zheng@ICML @boyuan__zheng

a month ago

Remember “Son of Anton” from the Silicon Valley show(@SiliconHBO)? The experimental AI that “efficiently” orders 4,000 lbs of meat while looking for a cheap burger and “fixes” a bug by deleting all the code? It’s starting to look a lot like reality. Even 18 months ago, my own…

Scale AI @scale_AI

a month ago

7 22 87 30K 19

0 28 67 7K 18

Yu Su (hiring postdoc) @ysu_nlp

2 months ago

🔎Agentic search like Deep Research is fundamentally changing web search, but it also brings an evaluation crisis⚠️ Introducing Mind2Web 2: Evaluating Agentic Search with Agents-as-a-Judge - 130 tasks (each requiring avg. 100+ webpages) from 1,000+ hours of expert labor -…

3 50 221 40K 133

Yifei Li @YifeiLiPKU

3 months ago

📢 Introducing AutoSDT, a fully automatic pipeline that collects data-driven scientific coding tasks at scale! We use AutoSDT to collect AutoSDT-5K, enabling open co-scientist models that rival GPT-4o on ScienceAgentBench! Thread below ⬇️ (1/n)

4 25 71 11K 21

Yu Su (hiring postdoc) @ysu_nlp

3 months ago

📈 Scaling may be hitting a wall in the digital world, but it's only beginning in the biological world! We trained a foundation model on 214M images of ~1M species (50% of named species on Earth 🐨🐠🌻🦠) and found emergent properties capturing hidden regularities in nature. 🧵

5 57 268 22K 153

Botao Yu @BotaoYu24

3 months ago

🔬 Introducing ChemMCP, the first MCP-compatible toolkit for empowering AI models with advanced chemistry capabilities! In recent years, we’ve seen rising interest in tool-using AI agents across domains. Particularly in scientific domains like chemistry, LLMs alone still fall…

3 30 66 9K 19

Shijie Chen @ShijieChen98

3 months ago

Check out InsightAgent (ACL'25 main), our latest work on accelerating systematic reviews from taking months to just hours with interactive AI agents! While full automation is handy, human expertise is still a must in many high-stakes domains. Different from the regular…

Rui Qiu @RuiQiu18

3 months ago

1 16 22 4K 5

0 1 2 398 0

Zeyi Liao @LiaoZeyi

3 months ago

⁉️Can you really trust Computer-Use Agents (CUAs) to control your computer⁉️ Not yet, @AnthropicAI Opus 4 shows an alarming 48% Attack Success Rate against realistic internet injection❗️ Introducing RedTeamCUA: realistic, interactive, and controlled sandbox environments for…

3 33 82 23K 24

Boyuan Zheng@ICML @boyuan__zheng

5 months ago

🔧What if your web agent could abstract its experience into programmatic skills—and improve itself autonomously? 🌟 Introducing SkillWeaver: a framework to enable self-improvement through autonomous exploration and constructing an ever-growing library of programmatic skills. 🧠…

4 25 67 12K 33

Boshi Wang @BoshiWang2

5 months ago

LLMs exhibit the Reversal Curse, a basic generalization failure where they struggle to learn reversible factual associations (e.g., "A is B" -> "B is A"). But why? Our new work uncovers that it's a symptom of the long-standing binding problem in AI, and shows that a model design…

25 128 868 133K 895

Boyuan Zheng@ICML @boyuan__zheng

5 months ago

🚀 Excited to co-organize the Workshop on Computer Use Agents (CUA) at #ICML2025 in Vancouver! This workshop takes a comprehensive look at computer use agents—covering learning algorithms, orchestration, interfaces, safety, benchmarking, applications, and more. We’re also…

ComputerUseAgents Workshop @workshopcua

5 months ago

1 15 30 17K 7

0 13 25 3K 2

Yu Su (hiring postdoc) @ysu_nlp

6 months ago

🔥2025 is the year of agents, but are we there yet?🤔 🤯 "An Illusion of Progress? Assessing the Current State of Web Agents" –– our new study shows that frontier web agents may be far less competent (up to 59%) than previously reported! Why were benchmark numbers inflated? -…

11 69 233 35K 123

Bernal Jiménez @bernaaaljg

6 months ago

Introducing ✨HippoRAG 2 ✨ 📣 📣 “From RAG to Memory: Non-Parametric Continual Learning for Large Language Models” HippoRAG 2 is a memory framework for LLMs that elevates our brain-inspired HippoRAG system to new levels of performance and robustness. 🔓 Unlocks Memory…

3 45 133 23K 95

Sam Stevens @samstevens6860

6 months ago

What's actually different between CLIP and DINOv2? CLIP knows what "Brazil" looks like: Rio's skyline, sidewalk patterns, and soccer jerseys. We mapped 24,576 visual features in vision models using sparse autoencoders, revealing surprising differences in what they understand.

2 53 289 32K 206

Ziru Chen @RonZiruChen

7 months ago

🚀Our ScienceAgentBench is covered by @Nature News! With the help of @ShijieChen98 and @YifeiLiPKU, we sampled 20 tasks from ScienceAgentBench to conduct a head-to-head comparison of OpenAI o1 (2024-12-17) and DeepSeek R1. 🔹Performance: Given three attempts, R1 can solve 7 out…

nature @Nature

7 months ago

25 168 504 93K 134

2 16 53 13K 15

Ziru Chen @RonZiruChen

8 months ago

🎉ScienceAgentBench is accepted at #ICLR2025! 🚀 Ready to step beyond ML R&D? Test your agents on real-world, data-driven R&D tasks across diverse scientific disciplines. 🔬 👇 Resources and previous posts below:

Ziru Chen @RonZiruChen

11 months ago

5 39 121 51K 57

1 13 35 5K 9

Shijie Chen @ShijieChen98

7 months ago

Thrilled to announce that our work, In-context Re-ranking, is accepted to #ICLR2025! TL;DR: By simply aggregating attention weights, we turn LLMs into powerful and efficient re-rankers generating a single token. More details below 👇:

Shijie Chen @ShijieChen98

11 months ago

1 32 90 18K 52

0 4 15 2K 3

Ziru Chen @RonZiruChen

8 months ago

🚀ScienceAgentBench evaluation is now containerized! Inspired by SWE-Bench, we leverage Docker for task isolation, enabling multi-threaded execution and slashing evaluation time to under 30 minutes. Plus, evaluate your agents with just one bash command! Great work done by…

1 8 40 11K 14

Boyu Gou @BoyuGouNLP

9 months ago

With recent advancements like Claude 3.5 Computer Use and Gemini 2.0, the field of GUI Agents is rapidly evolving. 🚀 Excited to introduce GUI Agent Paper List, your go-to repo for the latest in GUI Agent research! 🌟 ✨ Key Features: - 170+ Papers grouped by environments,…

2 18 63 14K 28

Xiang Yue @xiangyue96

9 months ago

✈️Flying to #NeurIPS2024 tmr! Excited to reconnect with old friends and meet new ones. I co-authored 6 papers at NeurIPS👇. I'm on the faculty job market this year. My work focuses on advancing the reasoning abilities of LLMs across modalities and contexts. Ping me for a chat☕