GEPA — Genetic Evolutionary Prompt Optimization


How GEPA runs

GEPA drives a task container over HTTP: it reads the mutable program from GET /program, fans out POST /rollout calls through the rollout queue to evaluate every candidate × seed pair, and updates the frontier with the resulting rewards and trace evidence. The workspace and frontier store keep the run evidence and per-seed wins, and a coding-agent proposer edits the prompt surface based on observed failures.
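A minimal sketch of that loop, assuming POST /rollout accepts a JSON body with `candidate` and `seed` fields and returns JSON containing a `reward`. The payload shape and the helper names (`fetch_program`, `http_rollout`, `evaluate`, `update_frontier`) are hypothetical, not GEPA's actual API, and a thread pool stands in for the rollout queue. The rollout function is injected so the fan-out and frontier update can be exercised without a live container.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical contract: rollout(candidate_id, seed) -> {"reward": float, "trace": ...}
RolloutFn = Callable[[str, int], Dict]
Frontier = Dict[int, Tuple[float, Set[str]]]  # seed -> (best reward, candidates at best)


def fetch_program(base_url: str) -> Dict:
    """GET /program: read the mutable program surface from the container."""
    with urllib.request.urlopen(f"{base_url}/program") as resp:
        return json.load(resp)


def http_rollout(base_url: str) -> RolloutFn:
    """Build a rollout function that POSTs one (candidate, seed) job to /rollout."""
    def call(candidate: str, seed: int) -> Dict:
        body = json.dumps({"candidate": candidate, "seed": seed}).encode()
        req = urllib.request.Request(
            f"{base_url}/rollout", data=body, method="POST",
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return call


def evaluate(candidates: List[str], seeds: List[int], rollout: RolloutFn,
             workers: int = 8) -> Dict[Tuple[str, int], float]:
    """Fan out one rollout per candidate x seed pair and collect the rewards."""
    pairs = list(product(candidates, seeds))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda pair: rollout(*pair), pairs))
    return {pair: res["reward"] for pair, res in zip(pairs, results)}


def update_frontier(frontier: Frontier, rewards: Dict[Tuple[str, int], float]) -> None:
    """Per seed, keep the best reward seen so far and every candidate that reaches it."""
    for (cand, seed), reward in rewards.items():
        best, holders = frontier.get(seed, (float("-inf"), set()))
        if reward > best:
            frontier[seed] = (reward, {cand})
        elif reward == best:
            frontier[seed] = (best, holders | {cand})
```

Against a live container this would be wired up as `evaluate(ids, seeds, http_rollout(base_url))`, where `base_url` is wherever the container listens.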

01

task container

GEPA works with any app that can be run end to end behind a POST /rollout endpoint.
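A toy task container, sketched with Python's standard `http.server`. Everything here is illustrative: the `PROGRAM` dict, the two-example `DATASET`, the keyword rule in `toy_app` (standing in for a real LLM call), and the request/response fields (`seed`, `program`, `reward`, `trace`) are assumptions, not a documented GEPA schema.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mutable program surface served at GET /program.
PROGRAM = {"system_prompt": "Classify the user's banking intent. Return the best matching label."}

# Hypothetical toy data; a real container would load its own train/heldout split.
DATASET = [
    ("I think my card was stolen", "lost_or_stolen_card"),
    ("My friend never got the money I sent", "transfer_not_received_by_recipient"),
]


def toy_app(prompt: str, text: str) -> str:
    """Stand-in for the real app (for example an LLM call that uses `prompt`)."""
    return "lost_or_stolen_card" if "card" in text.lower() else "transfer_not_received_by_recipient"


def run_rollout(payload: dict) -> dict:
    """Run the app end to end on one seed and score the prediction against the label."""
    program = payload.get("program", PROGRAM)
    text, label = DATASET[payload.get("seed", 0) % len(DATASET)]
    pred = toy_app(program["system_prompt"], text)
    return {"reward": float(pred == label),
            "trace": {"input": text, "prediction": pred, "label": label}}


class Handler(BaseHTTPRequestHandler):
    def _send(self, obj: dict, status: int = 200) -> None:
        body = json.dumps(obj).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        if self.path == "/program":
            self._send(PROGRAM)
        else:
            self._send({"error": "not found"}, 404)

    def do_POST(self) -> None:
        if self.path != "/rollout":
            self._send({"error": "not found"}, 404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        self._send(run_rollout(payload))


def serve(port: int = 8000) -> None:
    """Block and serve the container on localhost."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

`run_rollout` is kept as a plain function so the scoring logic can be tested without HTTP; calling `serve()` exposes it on localhost.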

02

proposer pipeline

A Codex instance reads trace data, actionable side information, and verification results, then either mutates a Pareto-frontier candidate or merges several of them.
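One plausible way to choose what the proposer works on, assuming parents are sampled from the frontier in proportion to their per-seed wins and that a merge simply takes two distinct frontier candidates. `pick_parents` and `merge_prob` are hypothetical names, and the Codex call that actually rewrites the prompt is out of scope here.

```python
import random
from typing import Dict, Tuple


def pick_parents(wins: Dict[str, int], rng: random.Random,
                 merge_prob: float = 0.2) -> Tuple[str, ...]:
    """Return one parent to mutate, or two distinct parents to merge.

    Only candidates that win at least one seed (i.e. sit on the frontier) are
    eligible, and each is sampled with probability proportional to its wins.
    """
    eligible = sorted(c for c, w in wins.items() if w > 0)
    if not eligible:
        raise ValueError("empty frontier")
    weights = [wins[c] for c in eligible]
    first = rng.choices(eligible, weights=weights)[0]
    if len(eligible) < 2 or rng.random() >= merge_prob:
        return (first,)  # mutate a single frontier candidate
    rest = [c for c in eligible if c != first]
    second = rng.choices(rest, weights=[wins[c] for c in rest])[0]
    return (first, second)  # merge two frontier candidates
```

Weighting by wins favours candidates that cover many seeds while still giving niche candidates, which may hold the only solution to a hard seed, a chance to be picked.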

03

workspace and frontier store

All results and derived evidence are stored in local databases and on the filesystem, which keeps the runtime simple and lets humans debug runs easily.
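A sketch of such a local store using the standard-library `sqlite3` module. The `rollouts` table, its columns, and keeping full traces as files referenced by `trace_path` are assumptions for illustration, not the layout GEPA actually uses.

```python
import sqlite3
from typing import Dict, Optional

# Hypothetical schema: one row per (candidate, seed, split) rollout.
SCHEMA = """
CREATE TABLE IF NOT EXISTS rollouts (
    candidate  TEXT NOT NULL,
    seed       INTEGER NOT NULL,
    split      TEXT NOT NULL,  -- 'train' or 'heldout'
    reward     REAL NOT NULL,
    trace_path TEXT,           -- full trace kept as a file on disk
    PRIMARY KEY (candidate, seed, split)
);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record(conn: sqlite3.Connection, candidate: str, seed: int, split: str,
           reward: float, trace_path: Optional[str] = None) -> None:
    """Insert or overwrite one rollout result."""
    conn.execute("INSERT OR REPLACE INTO rollouts VALUES (?, ?, ?, ?, ?)",
                 (candidate, seed, split, reward, trace_path))
    conn.commit()


def mean_reward(conn: sqlite3.Connection, candidate: str, split: str) -> Optional[float]:
    """Average reward for a candidate on a split (None if it has no rollouts)."""
    row = conn.execute("SELECT AVG(reward) FROM rollouts WHERE candidate = ? AND split = ?",
                       (candidate, split)).fetchone()
    return row[0]


def per_seed_best(conn: sqlite3.Connection) -> Dict[int, float]:
    """Best train reward per seed: the raw material for a frontier view."""
    return dict(conn.execute(
        "SELECT seed, MAX(reward) FROM rollouts WHERE split = 'train' GROUP BY seed"
    ).fetchall())
```

Because the database is an ordinary SQLite file, a human can open it with any SQLite client and query the run directly.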

* public sample run with truncated evidence

01 / GEPA launch blog scope

Banking77 prompt evolution

A weak, generic classifier prompt evolves into a label-specific decision policy for the 77 banking intents.

before: Classify the user's banking intent. Return the best matching label.

after: Resolve exact banking intent by preferring specific product/action labels, handling transfers vs card payments separately, and using borderline rules before returning one label.


frontier: 49/56
best train: 0.625 (+0.054)
best heldout: 0.729 (+0.050)
rollouts: 1204

Figure: candidate index vs frontier and score (orange: coverage; white: train; gray: heldout)

#	candidate	new seeds	frontier	train	heldout	cumulative
01	gepa_abaa64cbe339	+32	32/56	0.571	0.679	0.110M
02	gepa_766c85812fa5	+4	36/56	-	-	0.128M
03	gepa_db02a5afa12c	+3	39/56	0.482	0.586	0.270M
04	gepa_10b563483280	+5	44/56	0.536	0.586	0.405M
05	gepa_337d4f765e7e	+0	44/56	-	-	0.422M
06	gepa_874c96fbe2f7	+1	45/56	-	-	0.439M
07	gepa_9ee8822bd977	+0	45/56	-	-	0.456M
08	gepa_c48def569989	+2	47/56	0.589	0.700	0.596M
09	gepa_7442eae959a4	+2	49/56	0.625	0.729	0.731M

candidates · policy meta-llama/llama-3.1-8b-instruct · proposer gpt-5.4-mini · 56 train / 140 heldout · 1204 rollouts

role	mini	train	heldout	lift (heldout vs seed)
best	0.607	0.625	0.729	+0.050
#2	0.643	0.589	0.700	+0.021
seed	-	0.571	0.679	+0.000
#5	0.464	0.482	0.586	-0.093
#4	0.536	0.536	0.586	-0.093
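The lift column is consistent with each candidate's heldout score minus the seed prompt's heldout score (0.679), and the +0.054 train delta in the summary is the same subtraction on train (0.625 - 0.571). A quick check using the values from the table, where `lift` is a hypothetical helper:

```python
# Values copied from the table above; the seed prompt is the baseline.
SEED_TRAIN, SEED_HELDOUT = 0.571, 0.679
HELDOUT = {"best": 0.729, "#2": 0.700, "seed": 0.679, "#5": 0.586, "#4": 0.586}


def lift(score: float, baseline: float) -> float:
    """Absolute improvement over the baseline, rounded to the table's 3 decimals."""
    return round(score - baseline, 3)


lifts = {role: lift(h, SEED_HELDOUT) for role, h in HELDOUT.items()}
train_delta = lift(0.625, SEED_TRAIN)  # best train vs seed train
print(lifts, train_delta)
```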

Pareto frontier coverage

GEPA keeps candidates that cover different hard train seeds. Orange squares mark the new seeds each candidate added to the frontier.
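A sketch of how these coverage numbers could be derived from a candidate × seed score matrix. It assumes a seed counts as covered once any candidate scores above zero on it, that a candidate wins a seed when it ties the best score across the whole run, and that a missing rollout scores zero; `coverage_report` is a hypothetical helper, not part of GEPA.

```python
from typing import Dict, List, Set, Tuple


def coverage_report(order: List[str], scores: Dict[str, Dict[int, float]],
                    n_seeds: int) -> List[Tuple[str, int, int, str]]:
    """Per candidate, in creation order: (id, new seeds, wins, cumulative coverage).

    Seeds are assumed to be numbered 0..n_seeds-1; a missing entry means no rollout.
    """
    # Best score per seed across every candidate in the run (missing = 0.0).
    best = {s: max(scores[c].get(s, 0.0) for c in order) for s in range(n_seeds)}
    covered: Set[int] = set()
    report = []
    for c in order:
        solved = {s for s, r in scores[c].items() if r > 0}
        new = solved - covered  # seeds no earlier candidate had covered
        covered |= solved
        wins = sum(1 for s, r in scores[c].items() if r > 0 and r == best[s])
        report.append((c, len(new), wins, f"{len(covered)}/{n_seeds}"))
    return report
```

Under these assumptions a candidate can add zero new seeds yet still post many wins, which matches rows such as gepa_337d4f765e7e (+0 seeds, 15 wins).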

49/56 train seeds covered

candidate	new seeds	wins	covered
gepa_abaa64cbe339	+32	32	32/56
gepa_766c85812fa5	+4	16	36/56
gepa_db02a5afa12c	+3	27	39/56
gepa_10b563483280	+5	30	44/56
gepa_337d4f765e7e	+0	15	44/56
gepa_874c96fbe2f7	+1	19	45/56
gepa_9ee8822bd977	+0	10	45/56
gepa_c48def569989	+2	33	47/56
gepa_7442eae959a4	+2	35	49/56

gepa_7442eae959a4 (best) · mini 0.607 · train 0.625 · heldout 0.729 · adds 2 new frontier seeds; wins 35 train seeds

Figure: train-seed coverage grid, 49/56 covered (solid: new frontier seed; gray: already-covered win; black: miss or no rollout for that seed)
