RetoMaton is a neuro-symbolic framework that adds a symbolic memory layer, built as a weighted finite automaton, on top of a frozen LLM to guide retrieval in a structured and interpretable way.
The result is reasoning that is more stable, transparent, and reliable.…
A classic paper - "An Introduction to Autoencoders"
covers the mathematics and the fundamental concepts of autoencoders.
Autoencoders function by encoding data into a smaller representation through a neural network and then reconstructing it. This approach is often used for…
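The encode-then-reconstruct loop described above can be sketched in a few lines. This is a minimal illustrative toy (my own, not from the paper): a linear encoder compresses 8-D data to a 2-D bottleneck and a linear decoder reconstructs it, trained by gradient descent on the reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data that truly lives on a 2-D subspace of R^8.
latent = rng.normal(size=(200, 2))
mix = rng.normal(size=(2, 8))
X = latent @ mix                             # shape (200, 8)

W_enc = rng.normal(scale=0.1, size=(8, 2))   # encoder weights
W_dec = rng.normal(scale=0.1, size=(2, 8))   # decoder weights
lr = 0.01

def loss(X, W_enc, W_dec):
    recon = X @ W_enc @ W_dec                # encode, then decode
    return np.mean((recon - X) ** 2)

initial = loss(X, W_enc, W_dec)
for _ in range(500):
    Z = X @ W_enc                            # encode: (200, 2) bottleneck
    recon = Z @ W_dec                        # decode: (200, 8) reconstruction
    err = recon - X
    # Gradients of the squared reconstruction error.
    grad_dec = Z.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

final = loss(X, W_enc, W_dec)
print(final < initial)  # training reduced the reconstruction error
```

Because the data really is 2-D, the 2-unit bottleneck loses almost nothing; shrink the bottleneck below the data's intrinsic dimension and the reconstruction error stays high.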
A nice and compact 101-page Matrix Calculus book on arXiv.
introduces the extension of differential calculus to functions on more general vector spaces.
focuses on practical computational applications, such as large-scale optimization and machine learning, where derivatives…
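A quick numerical check of the book's core move (my own example, not from the book): the gradient of f(x) = xᵀAx on a vector space is (A + Aᵀ)x, which central finite differences confirm.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
x = rng.normal(size=4)

f = lambda v: v @ A @ v
analytic = (A + A.T) @ x                 # gradient from matrix calculus

# Central finite differences approximate the same gradient.
eps = 1e-6
numeric = np.array([
    (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
    for e in np.eye(4)
])

print(np.allclose(analytic, numeric, atol=1e-4))  # True
```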
The paper proposes Token Order Prediction (TOP), an auxiliary training signal that improves next-token modeling across many tasks.
Instead of predicting exact future tokens as Multi-Token Prediction does, it ranks vocabulary items by how soon they will appear within a window.
This ranking…
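The rank-by-soonness target can be sketched as follows. This is my simplified reconstruction of the idea (function names and scoring are mine, not the paper's): for each position, score every vocabulary item by how soon it next appears inside a lookahead window, sooner meaning higher.

```python
def top_targets(tokens, vocab_size, window):
    """For each position t, score item v by how soon v next appears
    within the window (sooner = higher), 0 if it does not appear."""
    targets = []
    for t in range(len(tokens)):
        scores = [0] * vocab_size
        for d, tok in enumerate(tokens[t + 1 : t + 1 + window], start=1):
            if scores[tok] == 0:          # keep only the soonest occurrence
                scores[tok] = window - d + 1
        targets.append(scores)
    return targets

seq = [2, 0, 1, 0, 3]
t0 = top_targets(seq, vocab_size=4, window=3)[0]
# After position 0, within window 3: token 0 at distance 1 (score 3),
# token 1 at distance 2 (score 2); token 3 falls outside the window.
print(t0)  # [3, 2, 0, 0]
```

In training, a ranking loss would push the model's scores toward these targets alongside the usual next-token loss.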
Great paper from Sakana AI. Introduces Model Merging of Natural Niches (M2N2)
An evolutionary merging method that fuses skills by evolving boundaries, preserving diversity, and pairing complementary parents.
It drops fixed layer groups, uses random split points that grow over…
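The split-point crossover can be sketched in miniature. This is my simplification of the idea, not the paper's code: flatten two parents' parameters, pick a random boundary, and take the prefix from one parent and the suffix from the other.

```python
import numpy as np

def split_merge(parent_a, parent_b, rng):
    """Merge two flattened parameter vectors at a random split point."""
    split = rng.integers(1, len(parent_a))   # random crossover boundary
    return np.concatenate([parent_a[:split], parent_b[split:]])

rng = np.random.default_rng(7)
a = np.zeros(10)                             # stand-in for parent A's weights
b = np.ones(10)                              # stand-in for parent B's weights
child = split_merge(a, b, rng)
print(child)  # a prefix of zeros followed by a suffix of ones
```

An evolutionary loop would score such children on the target tasks and keep diverse, complementary survivors as future parents.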
A new attention mechanism, Dynamic Sparse Attention (DSA), just dropped. The authors' evaluation is extensive and shows strong performance. With DSA, the model generates its own attention mask, and an efficient kernel saves computation. Cool work!
🔗arxiv.org/pdf/2508.02124
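The core idea of a model-generated sparse mask can be sketched with plain numpy. This is my simplification, not the paper's kernel: keep only each query's top-k attention scores, mask out the rest, then softmax over the survivors.

```python
import numpy as np

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))          # 4 queries, head dim 8
k = rng.normal(size=(6, 8))          # 6 keys
v = rng.normal(size=(6, 8))          # 6 values

scores = q @ k.T / np.sqrt(8)        # (4, 6) attention logits
topk = 2

# The "generated mask": True for each query's top-k keys.
kth = np.sort(scores, axis=1)[:, -topk][:, None]
mask = scores >= kth

# Masked entries get -inf, so they vanish after the softmax.
masked = np.where(mask, scores, -np.inf)
weights = np.exp(masked - masked.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
out = weights @ v                    # each query mixes only k values
```

A real kernel would skip the masked-out key/value work entirely instead of computing and discarding it, which is where the savings come from.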
attention sinks may be a bias in causal transformers.
as some of you know, i've been writing a long blogpost on attention and its properties as a message-passing operation on graphs. while doing so, i figured i might have found an explanation for why attention sinks may be an…
LLM embedding spaces quietly compress many meanings into a small shared space that mirrors human judgments.
A 3-D subspace captures about 50% of the semantic variance.
Psychology shows people judge words along a few axes, like how positive, how powerful, and how active.
The authors…
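The measurement behind that 50% figure can be sketched with PCA on synthetic data (my reconstruction of the method, not the authors' code): project embeddings onto their top principal components and ask how much variance the top 3-D subspace explains.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic "embeddings": 3 strong semantic directions plus noise in 64-D.
strong = rng.normal(size=(500, 3)) @ rng.normal(scale=2.0, size=(3, 64))
X = strong + rng.normal(scale=0.5, size=(500, 64))
X -= X.mean(axis=0)                       # center before PCA

# PCA via SVD: squared singular values give variance per component.
_, s, _ = np.linalg.svd(X, full_matrices=False)
explained = (s[:3] ** 2).sum() / (s ** 2).sum()
print(round(explained, 2))  # fraction of variance in the top 3-D subspace
```

On this toy data the top 3 components dominate by construction; the paper's point is that real LLM embeddings show a similar low-dimensional concentration, around half the variance in 3 dimensions.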
I just came across this excellent blog post about the recent results we published about the capabilities of the new type of AI we are building (no deep learning!) at the @1000brainsproj 🤖 🧠 : gregrobison.medium.com/hands-on-intel…
"Ninio's extinction illusion"
There are twelve dots at the grid intersections, but only a few are visible at any one time.
Ninio, J. & Stevens, K. A. (2000). Variations on the Hermann grid: an extinction illusion. Perception, 29, 1209-1217.
👨🔧 GitHub: CoreNN. A database for querying billions of vectors and embeddings in sublinear time on commodity machines.
Queries 1 billion Reddit comment embeddings in 15 ms from a 4.8 TB on-disk index.
- Uses cheap flash storage, not expensive DRAM, costing 40x–100x less.
- Scales from 1…
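The flash-not-DRAM idea can be illustrated in miniature (my toy sketch, not CoreNN's code or query algorithm): keep the vectors in a flat file on disk, memory-map them, and query without loading the index into RAM. A real system layers a sublinear index on top instead of this brute-force scan.

```python
import numpy as np, tempfile, os

rng = np.random.default_rng(5)
dim, n = 16, 1000
vectors = rng.normal(size=(n, dim)).astype(np.float32)

# Write the vectors as a flat binary file: the on-disk "index".
path = os.path.join(tempfile.mkdtemp(), "index.f32")
vectors.tofile(path)

# Query against the memory-mapped file; the OS pages rows in as needed,
# so resident memory stays small regardless of index size.
index = np.memmap(path, dtype=np.float32, mode="r", shape=(n, dim))
query = vectors[42]
dists = np.linalg.norm(index - query, axis=1)
nearest = int(np.argmin(dists))
print(nearest)  # 42 — a vector's nearest neighbor is itself
```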
At MIT, I learned about RNNs in my NLP class with Prof. Michael Collins. He built a model from my keystrokes to predict who I was. To me, it felt like a magic box. Years later, when I had to teach RNNs, I forced myself to go inside the box. ⬇️ Download: byhand.ai/rnn…
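What's inside the "magic box" is small: a vanilla RNN applies one update, h_t = tanh(W_x x_t + W_h h_{t-1} + b), at every step with the same weights. A minimal sketch (my own, not from the linked material):

```python
import numpy as np

rng = np.random.default_rng(2)
dim_in, dim_h = 3, 5
W_x = rng.normal(scale=0.3, size=(dim_h, dim_in))  # input-to-hidden
W_h = rng.normal(scale=0.3, size=(dim_h, dim_h))   # hidden-to-hidden
b = np.zeros(dim_h)

def rnn_forward(xs):
    """Unroll one vanilla RNN cell over a sequence of input vectors."""
    h = np.zeros(dim_h)                  # initial hidden state
    states = []
    for x in xs:                         # same weights reused every step
        h = np.tanh(W_x @ x + W_h @ h + b)
        states.append(h)
    return states

xs = rng.normal(size=(4, dim_in))        # a length-4 input sequence
states = rnn_forward(xs)
print(len(states), states[-1].shape)     # 4 (5,)
```

The hidden state h is the model's entire memory of the keystrokes seen so far, which is why a few lines like these can fingerprint a typist.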
One of the best resources on Reinforcement Learning ❤️
A classic 216-page overview covering the fundamentals of RL.
The paper maps RL clearly, showing how agents learn good behavior from rewards and experience.
The agent observes a situation, chooses an action, receives…
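That observe-act-reward loop can be sketched with the simplest RL problem, a multi-armed bandit (my toy, not from the book): an epsilon-greedy agent pulls arms, receives rewards, and updates its value estimates from experience.

```python
import numpy as np

rng = np.random.default_rng(4)
true_means = np.array([0.1, 0.5, 0.9])   # arm 2 is actually best
Q = np.zeros(3)                          # estimated value per action
counts = np.zeros(3)
eps = 0.1

for _ in range(2000):
    # Choose: explore with probability eps, otherwise exploit current best.
    a = rng.integers(3) if rng.random() < eps else int(np.argmax(Q))
    reward = rng.normal(loc=true_means[a], scale=0.1)
    counts[a] += 1
    Q[a] += (reward - Q[a]) / counts[a]  # incremental mean of rewards

best = int(np.argmax(Q))
print(best)  # the agent learns from experience that arm 2 is best
```

Full RL adds states and delayed rewards on top of this loop, but the learn-from-reward core is the same.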
A classic 129-page paper/book from 2024.
Frames entropy as missing information, then shows how physics quantifies and uses it.
It starts from a concrete puzzle: hydrogen gas at room conditions carries about 23 bits of unknown information per molecule.
Shannon entropy measures…
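The "entropy as missing information" definition is easy to compute directly: H = -Σ p log₂ p, in bits. A minimal sketch (my example values, not the book's):

```python
import numpy as np

def shannon_bits(p):
    """Shannon entropy of a probability distribution, in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                         # 0 * log 0 = 0 by convention
    return float(-(p * np.log2(p)).sum())

# A fair coin carries exactly 1 bit of missing information...
print(shannon_bits([0.5, 0.5]))          # 1.0
# ...and a uniform choice among 8 options carries 3 bits.
print(shannon_bits([1 / 8] * 8))         # 3.0
```

The book's hydrogen figure works the same way: count the microstates consistent with what you know about a molecule, and the log of that count is the missing information.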