Shipping reliable AI-powered apps isn't just about model performance – it's about delivering consistent value to users. That's why with LLM evals, response quality, task completion rates, and user satisfaction often matter more than raw benchmark numbers. Love how @braintrustdata makes multimodal evals seamless. I particularly enjoyed their latest findings on evaluating Gemini models for vision: Gemini models use significantly fewer tokens per image than the GPT models, with GPT-4o using 3.5x as many tokens per image. aidevmode.com/blog/braintrus…
Here are the eval findings: braintrust.dev/blog/gemini. Also found this useful: braintrust.dev/blog/after-eva… (h/t @ornelladotcom & @albertzhang36)