
GEPA

Parallel, agentic, verifier-aware.

parallelism: FlashEvolve-style fanout. Candidate × seed jobs run on cache-aware rollout workers.
proposer: Codex + workspace. Edits the prompt/program surface from traces and direct parents.
task shape: agentic systems. Pipelines, ReAct, coding agents, and multi-agent loops.
signal: verifier-aware. Model output, verifier result, and prompt diff in one packet.
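A minimal sketch of cache-aware candidate × seed fanout, assuming a synchronous `rollout(prompt, seed)` function that returns a reward. A thread pool stands in for FlashEvolve-style workers. `fan_out`, `prompt_hash`, and the `(prompt hash, seed)` cache key are hypothetical names, not GEPA's scheduler.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

CacheKey = Tuple[str, int]  # hypothetical key: (prompt hash, seed id)


def prompt_hash(prompt: str) -> str:
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:12]


def fan_out(
    candidates: List[str],
    seeds: List[int],
    rollout: Callable[[str, int], float],
    cache: Dict[CacheKey, float],
    max_workers: int = 8,
) -> Dict[CacheKey, float]:
    """Score every candidate x seed pair, running only pairs missing from the cache."""
    # One job per distinct (prompt, seed) pair that has no cached reward yet.
    misses: Dict[CacheKey, Tuple[str, int]] = {}
    for prompt in candidates:
        for seed in seeds:
            key = (prompt_hash(prompt), seed)
            if key not in cache:
                misses[key] = (prompt, seed)
    # Fan the misses out across worker threads; pool.map preserves input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        rewards = list(pool.map(lambda job: rollout(*job), misses.values()))
    cache.update(zip(misses.keys(), rewards))
    return {(prompt_hash(p), s): cache[(prompt_hash(p), s)] for p in candidates for s in seeds}
```

Calling `fan_out` again with one extra candidate only runs that candidate's seeds; every other pair is a cache hit.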
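> run model: gepa
P0 = seed_from_program(workspace)                # initial candidate P0, built from the program in the workspace
frontier = evaluate(P0, train, rollout_cache)    # score P0 on every train seed, reusing cached rollouts

while budget remains:
  parents = sample(frontier)                                    # sample parents from the Pareto frontier
  candidates = proposer.reflect(workspace, parents, failures)   # Codex proposes mutations and merges from failures
  survivors = gate(minibatch → full_train, candidates)          # staged gate: minibatch first, then full train
  frontier = pareto_update(frontier, survivors)                 # keep candidates that win distinct train seeds

best = heldout_select(frontier)                  # choose the final candidate on heldout seeds
write_evidence_packet(best)                      # diff, usage, and traces for the best candidate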
surface: prompt + context
evaluation: train seeds, heldout selector
state: candidate frontier
artifact: best prompts
01 initialize Codex proposer
02 deploy rollouts
03 terminate and verify
04 update Pareto frontier
05 dissect traces and side information
06 propose mutations and merges
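The run loop and steps above can be made concrete with a toy, self-contained Python version. Everything here is an assumption for illustration: rewards in [0, 1], a per-seed Pareto rule (keep any candidate that ties for the best nonzero reward on some train seed), a minibatch gate that requires a child to beat its best parent before full train evaluation, and hypothetical `toy_rollout` / `toy_propose` functions standing in for the task container and the Codex proposer.

```python
import random
from typing import Callable, Dict, List

Scores = Dict[int, float]              # train seed id -> reward for one candidate
Rollout = Callable[[str, int], float]  # (prompt, seed) -> reward in [0, 1]


def evaluate(prompt: str, seeds: List[int], rollout: Rollout) -> Scores:
    return {s: rollout(prompt, s) for s in seeds}


def pareto_update(pool: Dict[str, Scores]) -> Dict[str, Scores]:
    """Keep every candidate that ties for the best nonzero reward on some seed."""
    seeds = list(next(iter(pool.values())))
    keep = set()
    for s in seeds:
        top = max(scores[s] for scores in pool.values())
        if top > 0:
            keep.update(p for p, scores in pool.items() if scores[s] == top)
    if not keep:  # nothing solved yet: fall back to the best total scorer
        keep.add(max(pool, key=lambda p: sum(pool[p].values())))
    return {p: pool[p] for p in keep}


def run_gepa(seed_prompt: str, train: List[int], heldout: List[int],
             rollout: Rollout, propose: Callable[[List[str]], List[str]],
             rounds: int, minibatch_size: int, rng: random.Random) -> str:
    frontier = {seed_prompt: evaluate(seed_prompt, train, rollout)}
    for _ in range(rounds):
        parents = rng.sample(sorted(frontier), k=min(2, len(frontier)))
        mini = rng.sample(train, k=minibatch_size)
        parent_best = max(sum(frontier[p][s] for s in mini) for p in parents)
        for child in propose(parents):
            if child in frontier:
                continue
            # Gate 1: the child must beat its best parent on a cheap minibatch.
            if sum(rollout(child, s) for s in mini) <= parent_best:
                continue
            # Gate 2: full train evaluation, then recompute the Pareto frontier.
            pool = dict(frontier)
            pool[child] = evaluate(child, train, rollout)
            frontier = pareto_update(pool)
    # Select the final candidate on heldout seeds never used during search.
    return max(frontier, key=lambda p: sum(evaluate(p, heldout, rollout).values()))


# Usage with a hypothetical toy task and proposer.
rng = random.Random(0)


def toy_rollout(prompt: str, seed: int) -> float:
    # Seed k counts as solved when the prompt contains the rule token "r{k % 10}".
    return 1.0 if f"r{seed % 10}" in prompt.split() else 0.0


def toy_propose(parents: List[str]) -> List[str]:
    # Stand-in for the Codex proposer: one mutation plus, when possible, one merge.
    children = [parents[0] + f" r{rng.randrange(10)}"]
    if len(parents) == 2:
        merged = set(parents[0].split()) | set(parents[1].split())
        children.append(" ".join(sorted(merged)))
    return children


best = run_gepa("classify", list(range(10)), list(range(10, 30)),
                toy_rollout, toy_propose, rounds=40, minibatch_size=5, rng=rng)
```

Keeping per-seed winners rather than only the single best mean score preserves candidates that solve different hard seeds, which later merges can combine.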

genetic evolutionary prompt optimization

How GEPA runs

GEPA drives a task container over HTTP: it reads the mutable program at GET /program, fans out POST /rollout calls through the rollout queue for candidate × seed evaluation, and updates the frontier with reward and trace evidence. The workspace and frontier store hold run evidence and per-seed wins, and a coding-agent proposer edits the prompt surface based on failures.
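A minimal sketch of both sides of this protocol using only the Python standard library. The JSON shapes (`{"program", "seed"}` in, `{"reward", "trace"}` out), the two-seed table, its labels, and the keyword `policy` stand-in are hypothetical; GEPA's real wire format is not specified here.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical mutable program served at GET /program.
PROGRAM = {"prompt": "Classify the user's banking intent. Return the best matching label."}
# Hypothetical train seeds: seed id -> (user message, gold label).
SEEDS = {0: ("I lost my card", "card_lost"), 1: ("Where is my transfer?", "transfer_pending")}


def policy(prompt: str, message: str) -> str:
    # Stand-in for the LLM policy call; a real container would send `prompt` to a model.
    return "card_lost" if "card" in message else "transfer_pending"


class TaskHandler(BaseHTTPRequestHandler):
    def _send_json(self, status: int, body: dict) -> None:
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self) -> None:
        if self.path == "/program":
            self._send_json(200, PROGRAM)
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self) -> None:
        if self.path != "/rollout":
            self._send_json(404, {"error": "not found"})
            return
        request = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        message, gold = SEEDS[request["seed"]]
        output = policy(request["program"]["prompt"], message)
        # Verifier: exact label match; the trace travels back with the reward.
        reward = 1.0 if output == gold else 0.0
        trace = {"input": message, "output": output, "gold": gold}
        self._send_json(200, {"reward": reward, "trace": trace})

    def log_message(self, format: str, *args) -> None:  # silence per-request logging
        pass


def post_rollout(base_url: str, program: dict, seed: int) -> dict:
    body = json.dumps({"program": program, "seed": seed}).encode("utf-8")
    req = urllib.request.Request(base_url + "/rollout", data=body, method="POST",
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Start the container on a free local port, then act as the optimizer client.
server = ThreadingHTTPServer(("127.0.0.1", 0), TaskHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = f"http://127.0.0.1:{server.server_address[1]}"
with urllib.request.urlopen(base_url + "/program") as resp:
    program = json.loads(resp.read())
results = [post_rollout(base_url, program, seed) for seed in SEEDS]
server.shutdown()
```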

[Figure: synth gepa runtime, in three columns (container, optimizer runtime, operator console). Components: container (GET /program, POST /rollout); policy + verifier (policy call, verifier score, reward + trace); rollout queue (candidate × seed jobs, staged rollout batches, cache hits / misses); frontier store (per-seed wins, parents, heldout); workspace (workspace.sqlite with cached rollouts, traces, parents, prompt diffs); codex proposer (tool calls over workspace + queue); evidence packet (best candidate diff, usage, traces); run console (events, usage, candidate scores, live frontier, event feed).]
01 / task container

GEPA works with any app that can run end to end behind a POST /rollout request.

02 / proposer pipeline

A Codex instance reads trace data, actionable side information, and verification results, then mutates Pareto frontier candidates or merges several of them into one.

03 / workspace and frontier store

All results and derived evidence are stored in local databases and the filesystem, which keeps the runtime simple and makes runs easy for humans to debug.
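A minimal sketch of a local store in this spirit, using Python's built-in `sqlite3`. The tables, columns, and helper functions are hypothetical and are not the actual schema of GEPA's workspace database; the sketch covers three lookups mentioned on this page: cached rollouts, per-seed wins, and failure traces for the proposer.

```python
import json
import sqlite3
from typing import List, Optional, Tuple

# In-memory for the example; a real run would point at a file on disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE candidates (id TEXT PRIMARY KEY, parent_id TEXT, prompt TEXT NOT NULL);
CREATE TABLE rollouts (
    candidate_id TEXT NOT NULL REFERENCES candidates(id),
    seed INTEGER NOT NULL,
    reward REAL NOT NULL,
    trace TEXT NOT NULL,  -- JSON-encoded trace
    PRIMARY KEY (candidate_id, seed)
);
""")


def add_candidate(cid: str, parent_id: Optional[str], prompt: str) -> None:
    conn.execute("INSERT INTO candidates VALUES (?, ?, ?)", (cid, parent_id, prompt))


def record_rollout(cid: str, seed: int, reward: float, trace: dict) -> None:
    # INSERT OR IGNORE: a (candidate, seed) pair that is already cached is never overwritten.
    conn.execute("INSERT OR IGNORE INTO rollouts VALUES (?, ?, ?, ?)",
                 (cid, seed, reward, json.dumps(trace)))


def cached_reward(cid: str, seed: int) -> Optional[float]:
    row = conn.execute("SELECT reward FROM rollouts WHERE candidate_id = ? AND seed = ?",
                       (cid, seed)).fetchone()
    return None if row is None else row[0]


def failures(cid: str) -> List[Tuple[int, dict]]:
    # The failing traces a proposer would read when reflecting on this parent.
    rows = conn.execute("SELECT seed, trace FROM rollouts WHERE candidate_id = ? AND reward < 1 "
                        "ORDER BY seed", (cid,))
    return [(seed, json.loads(trace)) for seed, trace in rows]


def wins(cid: str) -> int:
    return conn.execute("SELECT COUNT(*) FROM rollouts WHERE candidate_id = ? AND reward >= 1",
                        (cid,)).fetchone()[0]


add_candidate("P0", None, "Classify the user's banking intent. Return the best matching label.")
record_rollout("P0", 0, 1.0, {"output": "card_lost"})
record_rollout("P0", 1, 0.0, {"output": "card_lost", "gold": "transfer_pending"})
record_rollout("P0", 1, 1.0, {"output": "ignored"})  # already cached, so this is a no-op
```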

* public sample run with truncated evidence

01 / GEPA launch blog scope

Banking77 prompt evolution

A weak, generic classifier prompt evolves into a label-specific decision policy for 77 banking intents.

before: Classify the user's banking intent. Return the best matching label.
after: Resolve exact banking intent by preferring specific product/action labels, handling transfers vs card payments separately, and using borderline rules before returning one label.
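The feature list above describes the proposer's signal as model output, verifier result, and prompt diff in one packet. A minimal sketch of assembling such a packet for this before/after pair with Python's `difflib`; the packet fields, the candidate id `P1`, and the labels are hypothetical.

```python
import difflib

BEFORE = "Classify the user's banking intent. Return the best matching label."
AFTER = ("Resolve exact banking intent by preferring specific product/action labels, "
         "handling transfers vs card payments separately, and using borderline rules "
         "before returning one label.")


def prompt_diff(old: str, new: str) -> str:
    # One sentence per line so the diff shows which instructions changed.
    old_lines = old.replace(". ", ".\n").splitlines()
    new_lines = new.replace(". ", ".\n").splitlines()
    return "\n".join(difflib.unified_diff(old_lines, new_lines, "before", "after", lineterm=""))


def evidence_packet(candidate_id: str, old: str, new: str, output: str, gold: str) -> dict:
    # Model output, verifier result, and prompt diff travel together in one record.
    return {
        "candidate": candidate_id,
        "model_output": output,
        "verifier": {"gold": gold, "correct": output == gold},
        "prompt_diff": prompt_diff(old, new),
    }


packet = evidence_packet("P1", BEFORE, AFTER, output="card_payment", gold="transfer")
print(packet["prompt_diff"])
```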
frontier 49/56 · best train 0.625 (+0.054) · best heldout 0.729 (+0.050) · rollouts 1204
[Figure: candidate index vs. frontier coverage and score (orange = coverage, white = train, gray = heldout; score axis 0.50 to 0.95).]
index  candidate          new seeds  frontier  train  heldout  usage
01     gepa_abaa64cbe339  +32        32/56     0.571  0.679    0.110M
02     gepa_766c85812fa5  +4         36/56     -      -        0.128M
03     gepa_db02a5afa12c  +3         39/56     0.482  0.586    0.270M
04     gepa_10b563483280  +5         44/56     0.536  0.586    0.405M
05     gepa_337d4f765e7e  +0         44/56     -      -        0.422M
06     gepa_874c96fbe2f7  +1         45/56     -      -        0.439M
07     gepa_9ee8822bd977  +0         45/56     -      -        0.456M
08     gepa_c48def569989  +2         47/56     0.589  0.700    0.596M
09     gepa_7442eae959a4  +2         49/56     0.625  0.729    0.731M
candidates · policy meta-llama/llama-3.1-8b-instruct · proposer gpt-5.4-mini · 56 train / 140 heldout · 1204 rollouts

role   mini   train  heldout  lift
best   0.607  0.625  0.729    +0.050
#2     0.643  0.589  0.700    +0.021
seed   -      0.571  0.679    +0.000
#5     0.464  0.482  0.586    -0.093
#4     0.536  0.536  0.586    -0.093
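The lift column is consistent with a simple definition: a candidate's heldout score minus the seed's heldout score (0.679). A short Python check under that assumption:

```python
SEED_HELDOUT = 0.679  # heldout score of the seed row

heldout = {"best": 0.729, "#2": 0.700, "seed": 0.679, "#5": 0.586, "#4": 0.586}
# Lift is read here as heldout minus the seed's heldout, rounded to the table's 3 decimals.
lift = {role: round(score - SEED_HELDOUT, 3) for role, score in heldout.items()}

# The same subtraction on train reproduces the "+0.054" shown for best train.
train_lift = round(0.625 - 0.571, 3)
```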

pareto frontier coverage

GEPA keeps candidates that cover different hard train seeds. Orange squares mark the new seeds each candidate added to the frontier.

49/56 train seeds covered

candidate          new seeds  wins  covered
gepa_abaa64cbe339  +32        32    32/56
gepa_766c85812fa5  +4         16    36/56
gepa_db02a5afa12c  +3         27    39/56
gepa_10b563483280  +5         30    44/56
gepa_337d4f765e7e  +0         15    44/56
gepa_874c96fbe2f7  +1         19    45/56
gepa_9ee8822bd977  +0         10    45/56
gepa_c48def569989  +2         33    47/56
gepa_7442eae959a4  +2         35    49/56

selected: gepa_7442eae959a4 (mini 0.607, train 0.625, heldout 0.729) adds 2 new frontier seeds and wins 35 train seeds; 49/56 covered.

legend: solid = new frontier seed · gray = already-covered win · black = miss / no rollout for that seed
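A minimal sketch of the bookkeeping behind this coverage view, assuming each candidate's wins are the set of train seed ids it solved: new seeds are wins that no earlier candidate had, and coverage is their running union. The `coverage` function and the toy ids are hypothetical; the last lines only re-check this run's published numbers.

```python
from itertools import accumulate
from typing import List, Set, Tuple


def coverage(history: List[Tuple[str, Set[int]]]) -> List[Tuple[str, int, int, int]]:
    """For each candidate, in order: (id, new seeds added, wins, cumulative seeds covered)."""
    covered: Set[int] = set()
    rows = []
    for cid, won in history:
        new = won - covered   # seeds no earlier candidate had won
        covered |= won        # coverage is the running union of wins
        rows.append((cid, len(new), len(won), len(covered)))
    return rows


toy = coverage([("P0", {0, 1, 2}), ("P1", {1, 2, 3}), ("P2", {0, 3})])

# This run's "+N seeds" column accumulates to its covered column.
added = [32, 4, 3, 5, 0, 1, 0, 2, 2]
frontier_sizes = list(accumulate(added))

# The reported train scores line up with wins / 56, e.g. 35 / 56 for the best candidate.
best_train = round(35 / 56, 3)
```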