Yotam Perlitz 👾 @yotamperlitz

Research Scientist at @ibmresearch, Practicing #NLProc, #RL. Opinions are my own.
perlitz.github.io
Joined January 2015

Tweets

95
Followers

88
Following

159
Likes

50

Sebastian Gehrmann @sebgehr

7 months ago

What a GEM!

A big thank you to the Unitxt team, collaborators, and our community for an incredible 2024! Together, we pushed boundaries in AI evaluation and set new standards for the field. Read our End-of-Year Summary here: unitxt.ai/en/latest/blog… #unitxt #llmevaluation

LayerLens @layerlens_ai

9 months ago

Another cool benchmarking paper published yesterday. In "JuStRank: Benchmarking LLM Judges for System Ranking", researchers from @IBMResearch introduced JuStRank, the first large-scale benchmark for evaluating LLM judges for ranking target systems: arxiv.org/abs/2412.09569

UKP Lab @UKPLab

10 months ago

Prompting LMs is not enough to quantify their linguistic competence! Meet the Holmes🔎 benchmark at #EMNLP2024 #TACL or 👉🧵
💠Meta-study of current literature
💠Coverage of LMs and phenomena
💠Analysis of LM size, architecture, and instruction tuning
holmes-benchmark.github.io

Philipp Schmid @_philschmid

10 months ago

RAG Developer Attention! 🔔 Docling is a new library from @IBM that efficiently parses PDF, DOCX, and PPTX and exports them to Markdown and JSON. It supports advanced PDF understanding and seamless integration with @llama_index and @LangChainAI. TL;DR: 🗂️ Parses numerous…

Elron Bandel @ElronBandel

10 months ago

Can lowercase really make or break an AI’s answer? 🤔 And what happens when an LLM ‘sees’ old TV-style white noise? 📺 We put these quirks to the test using Unitxt, showcasing the powerful, flexible testing it enables. Read on: unitxt.ai/en/latest/blog… #AI #MachineLearning #LLM

Leshem (Legend) Choshen 🤖🤗 @ACL @LChoshen

11 months ago

Scaling laws predict🦣large models by training🦟small ones, cool right? Fortunately, they are not that complicated or costly, or at least they don't have to be. We have collected 400+ models, fitted 1000+ scaling laws, and created 1 guide for cheap & more reliable scaling law fitting:

Yufang Hou @yufanghou

12 months ago

It's been a great collaboration journey with my wonderful co-authors: @AndreasWaldis, @yotamperlitz, @LChoshen, and @IGurevych. Please check it out and let us know if you'd like to see any additional functions or analyses added to the benchmark.

Yotam Perlitz 👾 @yotamperlitz

12 months ago

Get your benchmark game on: huggingface.co/spaces/ibm/ben…

Uri Berger @uriberger88

12 months ago

1/ Into Image Captioning? Don’t miss this! Struggling to keep up with the influx of new metrics but still see the same 5 (BLEU, METEOR, ROUGE, CIDEr, SPICE) leading? Read our recent Captioning evaluation survey! arxiv.org/abs/2408.04909 w/ @GabiStanovsky @AbendOmri @leafrermann >