MLE Perception @ Lucid Motors | 3D Perception Researcher @ Bosch Center for AI | CMU Robotics
sparsh913.github.io/sparshgarg/
Newark, CA
Joined October 2023
Rugged by design. Elevated by nature.
The #LucidGravityX concept redefines what a trail-ready adventure vehicle could be.
Read more about our new bold concept: bit.ly/46Yu886
We’ve all seen humanoid robots doing backflips and dance routines for years.
But if you ask them to climb a few stairs in the real world, they stumble!
We took our robot on a walk around town to environments that it hadn’t seen before. Here’s how it works🧵⬇️
AI that truly understands the physical world should not be limited by robot type or tasks.
We tackle robotics in its full generality @SkildAI.
The goal is to build a continually improving, omni-bodied brain that can control any hardware for any task.
TRI's latest Large Behavior Model (LBM) paper landed on arxiv last night! Check out our project website: toyotaresearchinstitute.github.io/lbm1/
One of our main goals for this paper was to put out a very careful and thorough study on the topic to help people understand the state of the…
The neural network objective function is a very complicated objective function. It's very non convex, and there are no mathematical guarantees whatsoever about its success. And so if you were to speak to somebody who studies optimization from a theoretical point of view, they…
Excited to share that TokenVerse won Best Paper Award at SIGGRAPH 2025! 🎉
TokenVerse enables personalization of complex visual concepts, from objects and materials to poses and lighting, each can be extracted from a single image and be recomposed into a coherent result. 👇
Log-linear attention — a new type of attention proposed by @MIT which is:
- as fast and efficient as linear attention
- as expressive as softmax
It uses a small but growing number of memory slots that increases logarithmically with the sequence length.
Here's how it works:
I'm excited to share our new work Diffusion as Shader (DaS), a versatile controllable video generation method for various tasks: object manipulation, camera control, mesh-to-video, and motion transfer.
Project page: igl-hkust.github.io/das/
Github: github.com/IGL-HKUST/Diff…
We move our eyes actively—driven by survival and efficiency—but we still don’t fully understand how. That makes supervised learning hard.
In our new work, we explore how to train VLMs to reason visually using RL. ViGoRL offers a glimpse into how models like o3 might be trained.
Do AI robots see the world like we do?
I dove head first into latent space to uncover the attention maps that show how my robot sees and understands the world.
Video, meet audio. 🎥🤝🔊
With Veo 3, our new state-of-the-art generative video model, you can add soundtracks to clips you make.
Create talking characters, include sound effects, and more while developing videos in a range of cinematic styles. 🧵
Excited to introduce our new Veo 2 capabilities!
Now with reference powered video generation (including style!), camera controls, outpainting, object add/removal & many more:
deepmind.google/models/veo/#ca…
Also presenting Flow, our new AI filmmaking tool.
labs.google/flow
I'm investing up to 250k first checks in teams building:
- robotics, drones, space
- crypto
- applied ai/ml
- ar/vr
- manufacturing, logistics
DMs always open. Tell me what you're building!
🤖Why is robot manipulation still an open challenge?
This video shows a kitting task -- packing multiple items into a single product. No robot today can do this autonomously.
Big challenges = big opportunities for research and industry. #robotics #manipulation #automation
[1/6] Our #CVPR2025 paper “DiffusionSfM” extends our RayDiffusion framework — inferring both geometry and cameras via diffusing pixelwise ray origins and endpoints.
1/ Despite having access to rich 3D inputs, embodied agents still rely on 2D VLMs—due to the lack of large-scale 3D data and pre-trained 3D encoders.
We introduce UniVLG, a unified 2D-3D VLM that leverages 2D scale to improve 3D scene understanding.
univlg.github.io
Adam (the optimizer) got the Test of Time Award at ICLR 2025. The real test of time award happened a long time ago, when people stopped citing Kingma and Ba every time they used it. Congratulations to the authors!
1/ Happy to share UniDisc - Unified Multimodal Discrete Diffusion – We train a 1.5 billion parameter transformer model from scratch on 250 million image/caption pairs using a **discrete diffusion objective**. Our model has all the benefits of diffusion models but now in…
30K Followers, 8K Following: Data Maven with a Dash of Espresso ☕️ | Turning Numbers into Narratives | Senior Customer Insights Director | Tweets fueled by caffeine and curiosity
88 Followers, 471 Following: German/AI/Robotics/Physics/Xpert/SF/CA/USA. SF AI Studio Lead/Architect partnering with OpenAI/xAI/Google/Nvidia e/acc Follow to learn AI. Disclaimer below.
3K Followers, 424 Following: @runwayml fiddling with video models and 3d -- inventor of Gaussian Splatting -- a climbing gym owner back at home -- previously @Google @Adobe @Inria @Arm
2K Followers, 612 Following: AI-like human interested in human-like AI; working on thesis (stealth mode); research scientist at DeepMind 🇬🇧; previously 🇩🇪, 🇭🇺, 🇷🇴
336K Followers, 71 Following: The party that actually represents America!
Community created and owned. No official affiliation.
News, updates, and commentary.
75K Followers, 13K Following: Newsletter exploring AI&ML - AI 101, Agentic Workflow, Business insights. From ML history to AI trends. Led by @kseniase_ Know what you are talking about👇🏼
2K Followers, 840 Following: Quantum Physicist. (https://t.co/RSW9CYfZvo)
Currently Building AI-Native Physics Engine.
Building @qemlabs . prev: Quantum Computing, HEP @Emory_Physics
9K Followers, 81 Following: The Hong Kong University of Science and Technology is a world-class international research university excelling in science, technology, business, and more.
2K Followers, 5 Following: SpAItial is pioneering spatial foundation models (SFMs), a groundbreaking AI paradigm that generates virtual environments that behave like the real world.
247K Followers, 670 Following: Official account of the University of California, Berkeley. Home of the @CalAthletics Golden Bears. 🐻
#BerkeleyNews #GoBears
28K Followers, 1K Following: Research at @GoogleDeepMind. Controllable World Simulators (GNNs, Structured World Models, Neural Assets). Veo Team (Ingredients to Video Co-Lead)
6K Followers, 5 Following: The fastest immigration solution for the startup and technology industry 🇺🇸 We help the best get a US work visa seamlessly, so you can get back to building