🚨 Chinese AI company SenseTime just revealed SenseNova 5.5, an AI model that it claims beats GPT-4o across key metrics.
Plus, big developments from Apple, YouTube, KLING, Neuralink, and Google DeepMind.
Here's everything going on in AI right now:
MobileLLM: nice paper from @AIatMeta about running sub-billion LLMs on smartphones and other edge devices.
TL;DR: more depth, not width; shared matrices for token->embedding and embedding->token; shared weights between multiple transformer blocks.
Paper: arxiv.org/abs/2402.14905
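A rough back-of-the-envelope sketch of why the sharing tricks in the TL;DR matter at sub-billion scale. The vocabulary size, width, and block counts below are made-up illustration values, not numbers from the paper, and the helper names are hypothetical.

```python
def embedding_params(vocab_size: int, dim: int, tied: bool) -> int:
    """Parameters spent on token->embedding plus embedding->token."""
    table = vocab_size * dim
    # With sharing, one matrix serves both directions.
    return table if tied else 2 * table


def effective_depth(unique_blocks: int, repeats: int) -> int:
    """Layers executed when each stored block is reused `repeats` times."""
    return unique_blocks * repeats


# Hypothetical small-model configuration (assumption, for illustration only).
VOCAB, DIM = 32_000, 512
saved = embedding_params(VOCAB, DIM, tied=False) - embedding_params(VOCAB, DIM, tied=True)
print(f"parameters saved by sharing the embedding matrix: {saved:,}")
print(f"layers executed with 15 stored blocks used twice: {effective_depth(15, 2)}")
```

In a model with only a few hundred million parameters, a saving of this size is a visible fraction of the total budget, which is presumably why the trick pays off more on small models than on large ones.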
💥 If SDXL was trained with LLM as a text encoder, what would happen? 🧪
Kolors is the answer 🎨 - Kwai trained (from scratch!) an SDXL-arch model with the GLM-4 LLM as the text encoder, and it's fantastic!
▶️ Demo huggingface.co/spaces/gokaygo…
📁 Model huggingface.co/Kwai-Kolors/Ko…
New guide to using CUDA on @modal_labs just dropped.
It began its life as a document called "I am fucking done not understanding the CUDA stack", and after readelf-ing CUDA binaries, RTFMing the driver docs, & writing homebrew kernels, I'm excited to share it with the world!
Holy smokes. @modal_labs is now my preferred method to run GPU workloads using the NVIDIA CUDA toolkit.
I was tinkering with a GPU implementation of Conway's game of life using convolutions, and had it running in no time after following @charles_irl's guide. https://t.co/ZOfupPgE2Y
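For anyone curious what "game of life using convolutions" means, here is a minimal CPU-only sketch in plain Python. It is not the code from the tweet or the guide; a GPU version would presumably replace the nested loops with a library convolution. The wrap-around edges are an assumption made to keep the example short.

```python
# 3x3 kernel that sums the eight neighbours and ignores the centre cell.
KERNEL = [[1, 1, 1],
          [1, 0, 1],
          [1, 1, 1]]


def convolve2d_wrap(grid, kernel):
    """Apply a 3x3 kernel to every cell, wrapping around at the edges."""
    rows, cols = len(grid), len(grid[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    total += kernel[dr + 1][dc + 1] * grid[(r + dr) % rows][(c + dc) % cols]
            out[r][c] = total
    return out


def life_step(grid):
    """One generation: a cell lives with 3 neighbours, or 2 if already alive."""
    counts = convolve2d_wrap(grid, KERNEL)
    return [
        [1 if (n == 3 or (alive and n == 2)) else 0 for alive, n in zip(grid_row, count_row)]
        for grid_row, count_row in zip(grid, counts)
    ]


# A "blinker" flips between a horizontal and a vertical bar.
blinker = [[0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 1, 1, 1, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0]]
for row in life_step(blinker):
    print("".join("#" if cell else "." for cell in row))
```

The whole rule set reduces to one neighbour-count convolution plus an elementwise comparison, which is what makes the problem a good fit for GPUs.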
Here’s a proof of concept showing how data calculated in Python can be used in Blender. #Blender can process the received data with custom geometry nodes, custom shaders and full 3D interactivity. This minimal example uses @Networkx in @ProjectJupyter. Learn more in this thread🧵
Shoutout to the team that built artificialanalysis.ai. Really neat site that benchmarks the speed of different LLM API providers to help developers pick which models to use. This nicely complements the LMSYS Chatbot Arena, Hugging Face open LLM leaderboards and Stanford's HELM…
Another big change with AI agents, multi-agent systems and AI apps in general is how we are shifting from strongly typed apps to "fuzzy" apps.
Ofc there are now great ways to get structured data out of LLMs, but it's funny how it allows for different kinds of apps
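A tiny sketch of the "structured data out of LLMs" side of this, assuming the model has been asked to reply with a JSON object. The `Candidate` type, its fields, and the sample reply are hypothetical; no real LLM API is called here.

```python
import json
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    score: float


def parse_candidate(raw: str) -> Candidate:
    """Turn a fuzzy text reply into a typed object, or raise ValueError."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    name, score = data.get("name"), data.get("score")
    if not isinstance(name, str):
        raise ValueError("'name' must be a string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise ValueError("'score' must be a number")
    return Candidate(name=name, score=float(score))


# Hypothetical model reply (assumption, for illustration only).
reply = '{"name": "Ada", "score": 0.9}'
print(parse_candidate(reply))
```

The validation step is where the "fuzzy" part of the app hands off to the strongly typed part: anything that fails can be retried or routed to a fallback.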
8-bit and 4-bit Adam implementations are now available in torchao thanks to @gaunernst github.com/pytorch/ao/tre… They reduce your optimizer state by 4x and 8x respectively relative to fp32. These were written in pure PyTorch and then torch.compiled to achieve performance competitive…
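The 4x and 8x figures follow from simple arithmetic: Adam keeps two state values per parameter (first and second moment estimates), so the state size scales with the bits used per value. A sketch, with a hypothetical helper name and an arbitrary 1B-parameter model; it ignores whatever extra metadata a quantized implementation stores alongside the values.

```python
def adam_state_bytes(num_params: int, bits_per_value: int) -> int:
    """Bytes for Adam's two moment buffers at a given precision."""
    values = 2 * num_params  # exp_avg and exp_avg_sq
    return values * bits_per_value // 8


NUM_PARAMS = 1_000_000_000  # arbitrary example size (assumption)
fp32 = adam_state_bytes(NUM_PARAMS, 32)
for bits in (32, 8, 4):
    size = adam_state_bytes(NUM_PARAMS, bits)
    print(f"{bits:>2}-bit state: {size / 1e9:.1f} GB ({fp32 // size}x smaller than fp32)")
```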
🤖🔍🧑‍💻 AI Agents find me candidates!
Agents using RLHF find candidates, score them, and put together templates I can use!
Showcasing the new crewai train feature
Longer than my usual, hope you like it 🤞
Retweets are super appreciated 🙏
lmk if this is one worth sharing the code for
📚 @jerryjliu0 on "The Future of Knowledge Assistants":
🧠 Exploring the development of knowledge assistants, covering document processing, tagging & extraction, knowledge search & QA (RAG), knowledge base sourcing, workflow automation, and more.
🔑 Key points:
- 🧩 RAG…
Wrote quite a lengthy blog - "Reinforcement Learning from Human Feedback (RLHF) in Practice: A Deep Dive" 👨‍🔧
( 🔗link in 1st comment)
Covering the following topics
📌 The fundamental building blocks and flow of RLHF with its 3-phase process: a) Supervised fine-tuning (SFT) >…
I trained GPT-2 (124M) with @aaron_defazio's Schedule-Free optimizer on @karpathy's nanoGPT:
- Settings: AdamW with learning rate=0.0018 (same as x.com/Yuchenj_UW/sta…), warmup_steps=700; Schedule-Free AdamW with default learning rate=0.0025, warmup_steps=700
- Observations:…
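For readers who have not seen the schedule-free idea, here is a toy sketch of its simpler SGD form on a 1-D quadratic. It is not the AdamW variant used in the run above, it omits warmup, and the learning rate, beta, and step count are arbitrary illustration values rather than the settings quoted in the tweet.

```python
def schedule_free_sgd(grad, w0, lr=0.1, beta=0.9, steps=3000):
    """Schedule-free SGD sketch: no learning-rate decay schedule is used.

    z is the plain SGD iterate, x is the running average of the z iterates
    (the point you would evaluate), and y is where gradients are taken.
    """
    z = w0
    x = w0
    for t in range(steps):
        y = (1 - beta) * z + beta * x   # interpolate between z and x
        z = z - lr * grad(y)            # constant-step gradient update
        c = 1.0 / (t + 1)               # uniform averaging weight
        x = (1 - c) * x + c * z
    return x


def grad_quadratic(w):
    """Gradient of f(w) = (w - 3)^2, which is minimised at w = 3."""
    return 2.0 * (w - 3.0)


print(schedule_free_sgd(grad_quadratic, w0=0.0))
```

The averaging over iterates plays the role that a decaying schedule would normally play, which is why no total step count has to be fixed in advance.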
DeepSeek-v2-Coder is really so impressive.
This blog did great work checking 180+ LLMs on code-writing quality.
There are only 3 models (Anthropic Claude 3 Opus, DeepSeek-v2-Coder, GPT-4o) that had 100% compilable Java code, while no model had 100% for Go.
"following plot…
A new version of Claude Engineer is out! 🔥
📝 Whole new diff file editing with an improved search and edit function
🌈 Color-coded diffs.
👨‍🏫 The system prompt has been updated with more detailed instructions.
💬 Conversation history management has been improved.
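To make "search and edit" plus "color-coded diffs" concrete, here is a small sketch using Python's standard difflib. This is not Claude Engineer's actual implementation; the function names and the require-exactly-one-match rule are assumptions chosen for illustration.

```python
import difflib

GREEN, RED, RESET = "\033[32m", "\033[31m", "\033[0m"


def search_and_replace(text: str, search: str, replace: str) -> str:
    """Replace `search` with `replace`, refusing ambiguous or missing matches."""
    if text.count(search) != 1:
        raise ValueError("search text must match exactly once")
    return text.replace(search, replace)


def colored_diff(old: str, new: str) -> str:
    """Unified diff with additions in green and removals in red (ANSI codes)."""
    lines = difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile="before", tofile="after", lineterm="",
    )
    out = []
    for i, line in enumerate(lines):
        if i < 2:                       # the '---' / '+++' file header lines
            out.append(line)
        elif line.startswith("+"):
            out.append(GREEN + line + RESET)
        elif line.startswith("-"):
            out.append(RED + line + RESET)
        else:
            out.append(line)
    return "\n".join(out)


old_text = "a\nb\nc"
new_text = search_and_replace(old_text, "b", "B")
print(colored_diff(old_text, new_text))
```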
GraphRAG Ollama: 100% Local Setup, Keeping your Data Private
🔍 Integrate @ollama & @LMStudioAI
📚 Convert data to knowledge graph
🖥️ Run locally for privacy
⚙️ Configure models easily
📈 Enhance AI capabilities
Subscribe: youtube.com/@MervinPraison
The Top ML Papers of the Week (July 1 - July 7):
- APIGen
- CriticGPT
- Agentless
- LLM See, LLM Do
- Scaling Synthetic Data Creation
- Searching for Best Practices in RAG
...
The "Multi-token Prediction" paper (April 2024) from @AIatMeta, which is behind the Chameleon family of models, is such an innovative idea.
👨‍🔧 Original Problem it solves
Most LLMs have a simple training objective: predicting the next word. While this approach is simple and scalable,…
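A toy sketch of the difference in training targets, since the rest of the post is cut off here. Ordinary training pairs each prefix with the single next token; a multi-token objective pairs it with the next n tokens. The helper names and the word-level "tokens" are illustrative assumptions, and this shows only the data side, not the model heads.

```python
def next_token_pairs(tokens):
    """(prefix, next token) pairs used by the standard objective."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def multi_token_pairs(tokens, n):
    """(prefix, next n tokens) pairs; prefixes without n future tokens are dropped."""
    return [(tokens[:i], tokens[i:i + n]) for i in range(1, len(tokens) - n + 1)]


sentence = ["the", "cat", "sat", "on", "mat"]
for prefix, target in multi_token_pairs(sentence, 2):
    print(prefix, "->", target)
```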
[Transformer] by Hand ✍️
Make Your Own 🛠️👉 by-hand.ai/sp/tfmr
Over the past few months, I've collaborated with several AI educators to customize my AI by Hand exercises. I am glad that my materials are being used and appreciated in many classrooms around the world!…