A big thank you to the Unitxt team, collaborators, and our community for an incredible 2024! Together, we pushed boundaries in AI evaluation and set new standards for the field.
Read our End-of-Year Summary here: unitxt.ai/en/latest/blog… #unitxt #llmevaluation
Another cool benchmarking paper published yesterday. In "JuStRank: Benchmarking LLM Judges for System Ranking", researchers from @IBMResearch introduced JuStRank, the first large-scale benchmark for evaluating LLM judges for ranking target systems: arxiv.org/abs/2412.09569
Prompting LMs is not enough to quantify their linguistic competence! Meet the Holmes🔎 benchmark at #EMNLP2024 #TACL or 👉🧵
💠Meta-study of current literature
💠Coverage of LMs and phenomena
💠Analysis of LM size, architecture, and instruction tuning
holmes-benchmark.github.io
RAG Developer Attention! 🔔 Docling is a new library from @IBM that efficiently parses PDF, DOCX, and PPTX and exports them to Markdown and JSON. It supports advanced PDF understanding and seamless integration with @llama_index and @LangChainAI.
TL;DR:
🗂️ Parses numerous…
Can lowercase really make or break an AI’s answer? 🤔 And what happens when an LLM ‘sees’ old TV-style white noise? 📺
We put these quirks to the test using Unitxt, showcasing the powerful, flexible testing it enables.
Read on: unitxt.ai/en/latest/blog… #AI #MachineLearning #LLM
Scaling laws predict 🦣 large models by training 🦟 small ones, cool, right?
Fortunately, they are not that complicated or costly
at least they don't have to be
We have collected 400+ models
fitted 1000+ scaling laws
and created 1 guide
for cheap & more reliable scaling law fitting:
It's been a great collaboration journey with my wonderful co-authors: @AndreasWaldis, @yotamperlitz, @LChoshen, and @IGurevych. Please check it out and let us know if you'd like to see any additional functions or analyses added to the benchmark.
1/ Into Image Captioning? Don’t miss this!
Struggling to keep up with the influx of new metrics but still see the same 5 (BLEU, METEOR, ROUGE, CIDEr, SPICE) leading?
Read our recent Captioning evaluation survey!
arxiv.org/abs/2408.04909
w/ @GabiStanovsky @AbendOmri @leafrermann