📢 Introducing VisCoder – fine-tuned language models for Python-based visualization code generation and feedback-driven self-debugging.
Existing LLMs struggle to generate reliable plotting code: outputs often raise exceptions, produce blank visuals, or fail to reflect the…
Introducing VerlTool - a unified and easy-to-extend tool agent training framework based on verl.
Recently, there's been a growing trend toward training tool agents with reinforcement learning algorithms like GRPO and PPO. Representative works include SearchR1, ToRL, ReTool, and…
🚀 New Paper: Pixel Reasoner 🧠🖼️
How can Vision-Language Models (VLMs) perform chain-of-thought reasoning within the image itself?
We introduce Pixel Reasoner, the first open-source framework that enables VLMs to “think in pixel space” through curiosity-driven reinforcement…
🧠📽️ New benchmark release: VideoEval-Pro!
Long Video Understanding (LVU) is critical for building truly intelligent multimodal systems — think surveillance analysis, instructional video QA, or summarizing hour-long meetings.
But here's the problem👇
🧩 Nearly all existing LVU…
🎬 Automated filmmaking is the future. You need dialogue, expressive talking heads, synchronized body motion, and multi-character interactions.
🚀 Today, in collaboration with @AIatMeta, we’re excited to introduce MoCha: Towards Movie-Grade Talking Character Synthesis
🔊…
🚀Thrilled to introduce ☕️MoCha: Towards Movie-Grade Talking Character Synthesis
Please unmute to hear the demo audio.
✨We defined a novel task: Talking Characters, which aims to generate character animations directly from Natural Language and Speech input.
✨We propose…
Excited to share what I've been working on lately: ABC - A multimodal embedding model trained for embedding specific aspects of an image. ABC is perfect for visual embedding tasks that require a little more control over the embedding.
Details on the training pipeline 👇
🚨 New Paper Alert! 🚨
Thrilled to announce VAMBA: a powerful hybrid Mamba-Transformer architecture designed specifically for hour-long video understanding tasks! VAMBA can efficiently process more than 1000 frames on a single GPU!
🎯 Why do we need hour-long video models?… https://t.co/7XmbwN5BYR