Just had an aha moment w/ GEPA @DSPyOSS
- Gemini 2.5 Flash-Lite (GPT 4.1 reflection LM)
- 3 signatures in one compound module
- 12 training examples
- 10 test examples
- 32 minutes of optimizer runtime
- $0.90 total cost

Results:
- Baseline: 68.2%
- GEPA-Optimized: 95.3% 🤯
Just crazy. For less than a dollar of compute and half an hour of wall-clock time, I get prompts that would have literally taken me days to tune by hand. Sure, the instructions it writes are verbose, but I hear through the grapevine that a new `instruction_proposer` parameter will soon be released for `dspy.GEPA`. This will (hopefully) help make them more concise. What times we live in! Superb work by @LakshyAAAgrawal and everyone else contributing to @DSPyOSS. 🚀
@tech_optimist @DSPyOSS You wait till you see what I got cooking with GEPA ;)
@tech_optimist @DSPyOSS with 10 test examples, how did you get decimal point level metrics?
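A fair question. With 10 test examples scored pass/fail, accuracy can only move in 10% steps, so a figure like 95.3% implies a fractional per-example metric (e.g. fields correct per example). A minimal sketch of the arithmetic, with made-up scores for illustration:

```python
# 10 binary-scored examples: accuracy can only be a multiple of 0.1.
binary_scores = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
print(sum(binary_scores) / len(binary_scores))  # 0.9

# A fractional per-example metric (hypothetical scores) can land on
# in-between values such as 0.953 over the same 10 examples.
fractional_scores = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.88, 0.65]
print(round(sum(fractional_scores) / len(fractional_scores), 3))  # 0.953
```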
@tech_optimist @DSPyOSS Wow, beautiful prompt created with GEPA!
@tech_optimist @DSPyOSS here’s a different take. could have been skill issue on my part x.com/eugeneyalt/sta…
@tech_optimist @DSPyOSS Thank you so much for trying out GEPA! We always have more things in the pipeline. Will announce soon!
@tech_optimist @DSPyOSS It seems like dspy is mostly good at training classifiers for static target labels. That's what almost every example is about. It's such a small use case for ML these days
Regarding the super-lengthy prompt, I would like to highlight that the initial rounds of GEPA end up carrying a lot of information from the first few examples it sees, but as the optimization progresses, GEPA creates generalized rules and the prompts start getting shorter. I expect that more examples and longer optimization runs will lead to shorter prompts, even without the new feature we are landing soon, which should further address this: x.com/tech_optimist/…
@tech_optimist @DSPyOSS A very recent PR allows multimodal content in the GEPA reflection feedback loop. Perhaps an idea for a different tutorial ;) My little OCR application had results similar to yours, gogo DSPy
@tech_optimist @DSPyOSS That is quite the jump for such a low cost and small dataset. Do you think GEPA-style optimization could become the norm for rapidly fine-tuning models on niche tasks?
Reflective optimization is the real game-changer. GEPA and DSPy, leveraging Gemini 2.5 Flash-Lite with a GPT 4.1 reflection LM, prove that intelligent self-correction at $0.90 beats brute-force scaling. This 68% to 95% jump shows cost-efficient models can achieve state-of-the-art performance with the right optimization framework.
@tech_optimist @DSPyOSS With just 10 examples how do you think about generalization? How would you catch overfitting?
@tech_optimist @DSPyOSS Wow, this is huge: 95.3% from just 12 training + 10 test examples, and under a dollar of cost. GEPA really shows the power of smarter optimization over brute-force scaling. Excited to see how this shifts the way we think about efficiency in LMs.
@tech_optimist @DSPyOSS Love those results. I really like the idea of using DSPy to optimize my prompts. Do you combine these prompts with any orchestration frameworks, or do it all in DSPy?
@tech_optimist @DSPyOSS Holy shit the length of that last prompt haha
@tech_optimist @DSPyOSS What was the dataset and did you generate it via expensive model or just created by hand?
@tech_optimist @DSPyOSS @grok help me understand this
@tech_optimist @DSPyOSS Congrats. You have just rediscovered overfitting.
@tech_optimist @DSPyOSS This is the first time I've seen something like this, thanks for sharing.