# On the "hallucination problem"
I always struggle a bit when I'm asked about the "hallucination problem" in LLMs. Because, in some sense, hallucination is all LLMs do. They are dream machines.
We direct their dreams with prompts. The prompts start the dream, and based on the…
One of the things we do in the beginning of the evals course is show students real system prompts. Most are mind boggled & ask why these prompts are so long, how did OpenAI/Gemini/Claude/Manus/etc arrive at these prompts, etc. It would be awesome to have an answer. Currently we…
I hate it when people just read the titles of papers and think they understand the results.
The "Illusion of Thinking" paper does 𝘯𝘰𝘵 say LLMs don't reason. It says current “large reasoning models” (LRMs) 𝘥𝘰 reason—just not with 100% accuracy, and not on very hard…
Biggest Langfuse update yet: We're open sourcing ALL product features under the MIT license!
✅ LLM-as-a-Judge Evaluations
✅ Annotation Queues
✅ Prompt Experiments
✅ Playground
✅ And more...
We wrote a bit about why we are making this change on our blog 👇
Calling c the "speed of light" completely misses the point. Rather, c is the "spacetime exchange rate": how many units of space you can exchange for one unit of time.
In actuality, everything travels at the "speed of light", just not necessarily through space alone... (1/4)
excited to finally share on arxiv what we've known for a while now:
All Embedding Models Learn The Same Thing
embeddings from different models are SO similar that we can map between them based on structure alone. without *any* paired data
feels like magic, but it's real:🧵
Here's the full workshop handout plus annotated slides from "Building software on top of Large Language Models", a three hour tutorial I presented yesterday at PyCon US #PyConUS simonwillison.net/2025/May/15/bu…
This is a really good question. In my experience, domain experts' reluctance to write prompts boils down to
- not knowing how to write a good prompt in the first place (it's not necessarily as simple as instructing a human expert/coworker to do the task)
- not having the… https://t.co/b7XTvxijNR
A feature I would love to see from every single hosted API vendor is some kind of special case where if you prompt "what model ID are you?" it replies with a definitely-not-hallucinated stable version identifier
(If model vendors are going to start switching date-based aliases…
The most significant events during my working lifetime were October 11 1993 (beta of Mosaic web browser for Mac released) and November 30th 2022 (ChatGPT released).
One thing the GPT-4o personality issue demonstrates is that treating AI like every other online product by maximizing for engagement & likeability will have unintended consequences that could cause real problems, both for the usefulness of the models & for the people using them,
@rubyrangerr I think you are misunderstanding what this tech demo actually is, but I will engage with what I think your gripe is — AI tooling trivializing the skillsets of programmers, artists, and designers.
My first games involved hand assembling machine code and turning graph paper…
Many people, myself included, didn't try to build a product around a language model because during the time you would work on a business-specific dataset, a larger generalist model will be released that will be as good for your business tasks as your smaller specialized model.…
Today I tested 6 prompt management systems to store prompts in one place and update them without changing my product's source code. Only 1/6 functioned smoothly and didn't have bugs on my journey. Kudos to the @langfuse team it is looking promising!
I really wish @OpenAI would give the new image generation feature in GPT-4o a usable name
Are we really expected to say "I made this using GPT-4o image generation"?
(That's also pretty unclear given that GPT-4o in ChatGPT used to be able to generate images using DALL-E instead)
1K Followers 1K Following
Applied AI Consultant.
Ph.D in AI and OR. AI Engineer Focused on LLMs, RAG, search, and building AI-powered software. Sharing my real-world experiences.