I am humbled and grateful to receive two grants from Open Philanthropy @open_phil to advance the safety of AI systems, co-led with my colleague @ysu_nlp. I'm also honored to be the first at @OhioState to receive Open Philanthropy funding.
Most credit goes to the amazing students…
Computer Use: Modern Moravec's Paradox
A new blog post arguing why computer-use agents may be the biggest opportunity and challenge for AGI.
tinyurl.com/computer-use-a…
Table of Contents
> Moravec’s Paradox
> Moravec's Paradox in 2025
> Computer use may be the biggest opportunity…
🧪 Chemists spend many hours planning and replanning synthetic routes for a target molecule to avoid dangerous reactants and intermediates.☠️🚫
🤔 What if an AI agent could plan around them automatically—better and faster than human experts?
🔬 Constrained retrosynthesis…
🎉 Excited to share that our paper EvoSchema: Towards Text-to-SQL Robustness Against Schema Evolution was accepted at VLDB 2025! 🚀
📢 Reminder: join us at VLDB 2025 in London!
🗓️ Sept 2 (Tue), 10:45 AM – 12:15 PM
📍 Room Wordsworth 4F
📄 vldb.org/pvldb/vol18/p3… #VLDB2025 #LLMs
Excited to receive the NSF CAREER Award!
Grateful for all the support and encouragement I've received in the 6 years of faculty life so far, especially for my extremely supportive family and for the amazing students @osunlp I have had the privilege to work with!!
Excited to be giving a keynote at the NeurIPS version of the Imageomics workshop! 🎉 It's also not too late to submit your own work to the workshop! (Aug. 22nd deadline, details below 👇)
Remember “Son of Anton” from the Silicon Valley show (@SiliconHBO)? The experimental AI that “efficiently” orders 4,000 lbs of meat while looking for a cheap burger and “fixes” a bug by deleting all the code?
It’s starting to look a lot like reality.
Even 18 months ago, my own… https://t.co/XsrYkqEIw0
Safety is one of the biggest blockers for computer use agents: how can I trust an agent won’t accidentally do something consequential without my permission?
We collect and release the first large-scale dataset for detecting consequential actions on the web, and train the best… https://t.co/IK4kRlmqcE
I'm excited to bring the Imageomics workshop to NeurIPS 2025! Consider submitting your work on ai4ecology, ai4conservation and general ai4science--if you're using images to learn something about the natural world, chances are it's a good fit for the imageomics workshop!
🚨 Postdoc Hiring:
I am looking for a postdoc to work on rigorously evaluating and advancing the capabilities and safety of computer-use agents (CUAs), co-advised with @ysu_nlp @osunlp. We welcome strong applicants with experience in CUAs, long-horizon reasoning/planning,…
We're already using AI search systems every day for more and more complex tasks, but how good are they really? Challenge: evaluation is hard with no fixed ground truth! In Mind2Web 2, we use agents to evaluate agents. Really excited! Thanks to everyone who made this possible!
🔎 Agentic search like Deep Research is fundamentally changing web search, but it also brings an evaluation crisis ⚠️
Introducing Mind2Web 2: Evaluating Agentic Search with Agents-as-a-Judge
- 130 tasks (each requiring avg. 100+ webpages) from 1,000+ hours of expert labor
-…
📢 Introducing AutoSDT, a fully automatic pipeline that collects data-driven scientific coding tasks at scale!
We use AutoSDT to collect AutoSDT-5K, enabling open co-scientist models that rival GPT-4o on ScienceAgentBench!
Thread below ⬇️ (1/n)
It’s so exciting to see BioCLIP 2 demonstrate a biologically meaningful embedding space while being trained only to distinguish species. Can’t wait to see more applications of BioCLIP 2 in solving real-world problems.
I’m attending #CVPR2025 in Nashville. Happy to chat about it!
🔬 Introducing ChemMCP, the first MCP-compatible toolkit for empowering AI models with advanced chemistry capabilities!
In recent years, we’ve seen rising interest in tool-using AI agents across domains. Particularly in scientific domains like chemistry, LLMs alone still fall…
66 Followers 674 Following
PhD Candidate at @ICepfl 👩🏻‍💻 Working on multi-modal AI reasoning models in scientific domains | ex DeepMind Intern https://t.co/trZEyfX4Js
548 Followers 836 Following
#BlackLivesMatter | Postdoc @MIT_CSAIL | sometimes I make computers do cool stuff, but mainly I just break things | she / her / hers
315 Followers 3K Following
📎 Learning & Research: Deep Learning, Computational Protein Design, Protein Language Models.
📎 PhD Student at Drexel University.
📎 Becoming an avid reader.
11K Followers 1K Following
I like tokens! I lead the OLMo data team at @allen_ai w/ @kylelostat. Open source is fun 🤖☕️🍕🏳️‍🌈 Opinions are sampled from my own stochastic parrot
163K Followers 166 Following
Co-founder of Thinking Machines Lab @thinkymachines; Ex-VP, AI Safety & robotics, applied research @OpenAI; Author of Lil'Log