Weiming Ren @wmren993

CS PhD student @UWaterloo @UWCheritonCS cs.uwaterloo.ca/~w2ren

Joined November 2023

Tweets: 57
Followers: 62
Following: 56
Likes: 25

Yuansheng Ni @YuanshengNi

3 months ago

📢 Introducing VisCoder – fine-tuned language models for Python-based visualization code generation and feedback-driven self-debugging. Existing LLMs struggle to generate reliable plotting code: outputs often raise exceptions, produce blank visuals, or fail to reflect the…

7 15 34 10K 11

Dongfu Jiang @DongfuJiang

3 months ago

Introducing VerlTool - a unified and easy-to-extend tool agent training framework based on verl. Recently, there's been a growing trend toward training tool agents with reinforcement learning algorithms like GRPO and PPO. Representative works include SearchR1, ToRL, ReTool, and…

5 72 378 73K 260

Wenhu Chen @WenhuChen

4 months ago

🚀 New Paper: Pixel Reasoner 🧠🖼️ How can Vision-Language Models (VLMs) perform chain-of-thought reasoning within the image itself? We introduce Pixel Reasoner, the first open-source framework that enables VLMs to “think in pixel space” through curiosity-driven reinforcement…

10 66 393 82K 322

Wenhu Chen @WenhuChen

4 months ago

🧠📽️ New benchmark release: VideoEval-Pro! Long Video Understanding (LVU) is critical for building truly intelligent multimodal systems — think surveillance analysis, instructional video QA, or summarizing hour-long meetings. But here's the problem👇 🧩 Nearly all existing LVU…

Weiming Ren @wmren993

4 months ago

1 5 10 5K 2

3 7 23 4K 8

Wenhu Chen @WenhuChen

5 months ago

🎬 Automated filmmaking is the future — You need dialogue, expressive talking heads, synchronized body motion, and multi-character interactions. 🚀 Today, in collaboration with @AIatMeta, we’re excited to introduce MoCha: Towards Movie-Grade Talking Character Synthesis 🔊…

3 21 87 10K 35

Cong Wei @CongWei1230

5 months ago

🚀Thrilled to introduce ☕️MoCha: Towards Movie-Grade Talking Character Synthesis Please unmute to hear the demo audio. ✨We defined a novel task: Talking Characters, which aims to generate character animations directly from Natural Language and Speech input. ✨We propose…

18 59 220 37K 106

Benjamin Schneider @BenRSchneider

6 months ago

Excited to share what I've been working on lately: ABC - A multimodal embedding model trained for embedding specific aspects of an image. ABC is perfect for visual embedding tasks that require a little more control over the embedding. Details on the training pipeline 👇

5 5 9 3K 1

Wenhu Chen @WenhuChen

6 months ago

🚨 New Paper Alert! 🚨 Thrilled to announce VAMBA: a powerful hybrid Mamba-Transformer architecture designed specifically for hour-long video understanding tasks! VAMBA can receive more than 1000 frames on a single GPU card efficiently! 🎯 Why do we need hour-long video models?…