Introducing olmOCR, our open-source tool to extract clean plain text from PDFs!
Built for scale, olmOCR handles many document types with high throughput. Run it on your own GPU for free: at over 3000 tokens/s, that's equivalent to $190 per million pages, or 1/32 the cost of GPT-4o!
kicking off 2025 with our OLMo 2 tech report while payin homage to the sequelest of sequels 🫡
🚗 2 OLMo 2 Furious 🔥 is everythin we learned since OLMo 1, with deep dives into:
🚖 stable pretrain
🚔 lr anneal 🤝 data curricula 🤝 soups
🚘 tulu post-train
🚜 compute infra
👇🧵
Releasing OLMoE - the first good Mixture-of-Experts LLM that's 100% open-source
- 1B active, 7B total params, trained for 5T tokens
- Best small LLM & matches more costly ones like Gemma, Llama
- Open Model/Data/Code/Logs + lots of analysis & experiments
📜arxiv.org/abs/2409.02060
🧵1/9
Congrats to our team for winning two paper awards at #ACL2024!
OLMo won the Best Theme Paper award, and Dolma won a Best Resource Paper award!
All the credit goes to the whole team for the massive group effort 🎉🎉
A GitHub flaw lets attackers upload executables that appear to be hosted on a company's official repo, such as Microsoft's—without the repo owner knowing anything about it.
The following URLs, for example, make it seem like these ZIPs are present on Microsoft's source code repo:…
Announcing our latest addition to the OLMo family, OLMo 1.7! 🎉 Our team's efforts to improve data quality, training procedures, and model architecture have led to a leap in performance. See how OLMo 1.7 stacks up against its peers and peek into the technical details on the blog:…