Which LM is better at agentic coding?
We have a bunch of useful academic benchmarks like SWE-Bench, but we don't have a good comparison of agentic coding LMs *in the wild*.
To solve this, we released PR Arena: github.com/neulab/pr-arena
Introducing ⚔️PR Arena⚔️ - free AI coding agents to fix real GitHub issues.
Claude Sonnet 4 vs Gemini 2.5 Pro… Who writes better pull requests?
👉 Install here: github.com/apps/openhands…
Powered by @allhands_ai
Having appropriate tests makes a world of difference for agent-driven development.
If your agent can write a test to localize a bug or exercise a new feature, the implementation that follows is much more solid.
OpenHands+GPT-5 is now 🥇 on the SWT-Bench testing leaderboard!
We built OpenHands in the open (~60K ⭐️ on GitHub).
Now we’re giving back to the OSS ecosystem.
Announcing the OpenHands Cloud OSS Credit Program → $100–$500 credits for maintainers.
👉 Learn how to apply!
Nothing more frustrating than seeing "private scaffold" on public benchmark results
I love that model providers like Qwen and Mistral are now reporting their results specifically using OpenHands as the scaffold--feels like we're becoming a standard here
x.com/Alibaba_Qwen/s…
>>> Qwen3-Coder is here! ✅
We’re releasing Qwen3-Coder-480B-A35B-Instruct, our most powerful open agentic code model to date. This 480B-parameter Mixture-of-Experts model (35B active) natively supports 256K context and scales to 1M context with extrapolation. It achieves…
OpenHands is so general-purpose that I now think of leveraging it with workflow-driven prompting. Also stating constraints works well for me.
Examples:
• Examine the existing architecture, read docs for Y, plan how to implement X, then do it
→ Instead of: "Implement feature…
Introducing Devstral Small and Medium 2507! This latest update offers improved performance and cost efficiency, perfectly suited for coding agents and software engineering tasks.
What will software development look like in 2026?
With coding agents rapidly improving, dev roles may look quite different. My current workflow has changed a lot:
- Work in GitHub, not IDEs
- Agents in parallel
- Write English, not code
- More code review
Thoughts + a video👇
PSA for engineering leadership exploring software agent solutions 🚨
This post nails the difference between agentic and agentless approaches — and why it actually matters for real software tasks, beyond SWE-Bench scores!
Some users click with code agents. Others struggle.
Why? Agents are flexible and creative - just like their users! It's confusing!
Agents should understand, educate, and adapt to users. Even personalize.
If the agent isn’t willing to grow, the user likely won’t either.
What if we could have *trustworthy* agents that don't just write code, but also do research, understand multimodal content, and perform many practically useful tasks?
Today at OpenHands, we released a new agent that gets SOTA or competitive performance on 8 diverse tasks.
I believe “execution” and “evaluation” are two major challenges to adoption of code agents from a user perspective. Users must learn how to leverage the agent effectively, and how to evaluate its work (asap).
This could determine whether good agents also deliver great experiences.