01 / GEPA launch blog scope
Banking77 prompt evolution
A weak, generic classifier prompt evolves into a label-specific decision policy for 77 banking intents.
| role / candidate | mini | train | heldout | lift |
|---|---|---|---|---|
| best | 0.607 | 0.625 | 0.729 | +0.050 |
| #2 | 0.643 | 0.589 | 0.700 | +0.021 |
| seed | - | 0.571 | 0.679 | +0.000 |
| #5 | 0.464 | 0.482 | 0.586 | -0.093 |
| #4 | 0.536 | 0.536 | 0.586 | -0.093 |
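
The lift values in the table appear to be each candidate's held-out score minus the seed prompt's held-out score (0.679); the arithmetic matches every row. A minimal sketch that reproduces the column under that assumption, with the helper name `lift_over_seed` chosen here purely for illustration:

```python
# Held-out scores copied from the table above.
heldout = {"best": 0.729, "#2": 0.700, "seed": 0.679, "#5": 0.586, "#4": 0.586}


def lift_over_seed(scores, baseline="seed"):
    """Return each candidate's held-out score minus the baseline's score.

    Assumption: lift is a plain difference against the seed prompt,
    rounded to three decimals to match the table's precision.
    """
    base = scores[baseline]
    return {name: round(score - base, 3) for name, score in scores.items()}


lifts = lift_over_seed(heldout)
for name, delta in lifts.items():
    # Signed formatting mirrors the table's +0.050 / -0.093 style.
    print(f"{name:>5}  {delta:+.3f}")
```

Read this way, only two of the four evolved candidates beat the seed on held-out data, which is why the per-seed frontier view matters alongside the aggregate score.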
Pareto frontier coverage
GEPA keeps candidates that cover different hard train seeds. Orange squares mark the seeds a candidate newly added to the frontier, and each square corresponds to one verifier result.
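
The coverage idea can be illustrated with a small sketch. This is a simplification and not GEPA's actual implementation: it assumes a binary pass/fail verifier per train seed, and the function name `new_frontier_seeds`, the candidate names, and the toy results are all hypothetical.

```python
def new_frontier_seeds(candidates):
    """Return the train seeds each candidate is the first to solve.

    candidates: list of (name, {seed_id: passed}) in discovery order.
    Assumption: a seed counts as covered once any candidate passes it.
    """
    covered = set()
    added = {}
    for name, results in candidates:
        solved = {seed for seed, passed in results.items() if passed}
        # Seeds solved here that no earlier candidate had solved.
        added[name] = sorted(solved - covered)
        covered |= solved
    return added


# Toy verifier results (made up for illustration), four train seeds.
toy = [
    ("seed", {0: True, 1: False, 2: False, 3: False}),
    ("cand_a", {0: True, 1: True, 2: False, 3: False}),
    ("cand_b", {0: False, 1: True, 2: False, 3: False}),
    ("cand_c", {0: False, 1: False, 2: True, 3: False}),
]

added = new_frontier_seeds(toy)
# Keep only candidates that extend coverage to at least one new seed.
kept = [name for name, seeds in added.items() if seeds]
print(added)
print(kept)
```

In this toy run `cand_c` passes only one seed in total, yet it stays in the pool because it is the only candidate that solves seed 2, while `cand_b` is dropped because everything it solves was already covered.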

