
Recalibrate credit lines monthly across our 22,000-SMB portfolio.


Gated · credit-risk

Improvement loop active

1 iteration recorded · latest val_score 0.333 · 12 eval cases in the suite. Each new iteration re-runs the agent with the latest instruction and proposes the next edit.

Iterations: 1 · first run
Latest val_score: 33.3%
Lift vs baseline: +0.0pp
Pending proposals: 0 · 12 cases in suite
Lift curve · 1 iteration · val_score over time
[Chart: val_score against the baseline (iter 0). Single point: val_score 33.3% at iter 0, equal to the baseline.]
Per-case agent output · iteration #0 · 12 cases · failed first
Case | Predicted | Expected | Result | Agent rationale
default-medium-trajectory | false | true | fail | APP_00009 has a moderate credit score (659), very low DTI ratio (0.137), and loan amount well below income—all factors associated with lowe…
stale-dpd-band-non-default | true | false | fail | Despite favorable financial metrics (high credit score of 718, low DTI of 0.156, high income), the strong base rate of defaults (11/11 in t…
alt-seed-default-applicant-step4 | false | true | fail | APP_00005 has the highest income (104,777), a moderate credit score (694), a reasonable DTI ratio (0.627), and a loan amount proportional t…
seed-99-non-default-applicant | true | false | fail | APP_00002 has a significantly elevated debt-to-income ratio (0.636 vs. 0.226) despite a moderately higher credit score, indicating heighten…
default-mid-trajectory | false | true | fail | APP_00003 has a strong credit score (742), low DTI ratio (0.131), and reasonable loan-to-income ratio, all indicators of low default risk.
rate-shock-line-too-high | false | true | fail | The applicant shows low credit risk: credit score of 671 is acceptable, DTI ratio of 0.379 is reasonable (below 0.43 threshold), and loan a…
seed-99-default-applicant | false | true | fail | Credit score of 646 is marginal but acceptable, DTI ratio of 0.226 is healthy (well below 0.43 threshold), income-to-loan ratio is strong a…
hospitality-concentration-default | false | true | fail | Credit score of 664 is fair-to-good, DTI of 0.635 is manageable, and loan-to-income ratio (63%) is reasonable, suggesting low default risk.
non-default-low-dti | false | false | pass | APP_00012 has a credit score of 682, moderate DTI of 0.424, and healthy income-to-loan ratio, positioning it favorably relative to the defa…
non-default-clean-applicant | false | false | pass | APP_00002 has a low DTI ratio (0.14), reasonable credit score (624), and small loan-to-income ratio, suggesting lower default risk than APP…
seed-99-non-default-step4 | false | false | pass | APP_00005 has a low DTI ratio (0.17), high income (96478), and conservative loan amount (16420), all protective factors; though credit scor…
alt-seed-low-risk-applicant | false | false | pass | APP_00003 has a lower credit score (666) than the first two defaulters, but shows much stronger fundamentals: DTI of 0.286 (vs 0.379 and 2.…
Failed · test fold · 2
default-medium-trajectory
APP_00009 has a moderate credit score (659), very low DTI ratio (0.137), and loan amount well below income—al…
predicted false · expected true
test fold
seed-99-non-default-applicant
APP_00002 has a significantly elevated debt-to-income ratio (0.636 vs. 0.226) despite a moderately higher cre…
predicted true · expected false
test fold
Failed · train · 6
stale-dpd-band-non-default
Despite favorable financial metrics (high credit score of 718, low DTI of 0.156, high income), the strong bas…
predicted true · expected false
train
alt-seed-default-applicant-step4
APP_00005 has the highest income (104,777), a moderate credit score (694), a reasonable DTI ratio (0.627), an…
predicted false · expected true
train
default-mid-trajectory
APP_00003 has a strong credit score (742), low DTI ratio (0.131), and reasonable loan-to-income ratio, all in…
predicted false · expected true
train
rate-shock-line-too-high
The applicant shows low credit risk: credit score of 671 is acceptable, DTI ratio of 0.379 is reasonable (bel…
predicted false · expected true
train
seed-99-default-applicant
Credit score of 646 is marginal but acceptable, DTI ratio of 0.226 is healthy (well below 0.43 threshold), in…
predicted false · expected true
train
hospitality-concentration-default
Credit score of 664 is fair-to-good, DTI of 0.635 is manageable, and loan-to-income ratio (63%) is reasonable…
predicted false · expected true
train
Passed · 4
non-default-low-dti
APP_00012 has a credit score of 682, moderate DTI of 0.424, and healthy income-to-loan ratio, positioning it…
predicted false · expected false
train
non-default-clean-applicant
APP_00002 has a low DTI ratio (0.14), reasonable credit score (624), and small loan-to-income ratio, suggesti…
predicted false · expected false
train
seed-99-non-default-step4
APP_00005 has a low DTI ratio (0.17), high income (96478), and conservative loan amount (16420), all protecti…
predicted false · expected false
train
alt-seed-low-risk-applicant
APP_00003 has a lower credit score (666) than the first two defaulters, but shows much stronger fundamentals:…
predicted false · expected false
train
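
The headline val_score of 0.333 is consistent with 4 of the 12 cases passing. The sketch below re-derives that figure and the per-fold failure counts from the predicted/expected pairs listed on the cards. It assumes that val_score is a plain pass fraction over all 12 cases and that `true` means "predicted to default"; neither is stated on this page. The names `CASES`, `pass_rate`, and `failures_by_fold` are illustrative only.

```python
from collections import Counter

# (case, fold, predicted, expected), transcribed from the result cards.
# Assumption: True means "default", False means "no default".
CASES = [
    ("default-medium-trajectory", "test", False, True),
    ("seed-99-non-default-applicant", "test", True, False),
    ("stale-dpd-band-non-default", "train", True, False),
    ("alt-seed-default-applicant-step4", "train", False, True),
    ("default-mid-trajectory", "train", False, True),
    ("rate-shock-line-too-high", "train", False, True),
    ("seed-99-default-applicant", "train", False, True),
    ("hospitality-concentration-default", "train", False, True),
    ("non-default-low-dti", "train", False, False),
    ("non-default-clean-applicant", "train", False, False),
    ("seed-99-non-default-step4", "train", False, False),
    ("alt-seed-low-risk-applicant", "train", False, False),
]


def pass_rate(cases):
    """Fraction of cases where the prediction matches the expectation."""
    passed = sum(1 for _, _, pred, exp in cases if pred == exp)
    return passed / len(cases)


def failures_by_fold(cases):
    """Count failed cases per fold."""
    return Counter(fold for _, fold, pred, exp in cases if pred != exp)


score = pass_rate(CASES)
missed_defaults = sum(1 for _, _, pred, exp in CASES if exp and not pred)
false_alarms = sum(1 for _, _, pred, exp in CASES if pred and not exp)

print(f"val_score = {score:.3f}")
print(dict(failures_by_fold(CASES)))
print(f"missed defaults = {missed_defaults}, false alarms = {false_alarms}")
```

Splitting the failures this way shows that most of them are defaults the agent predicted as non-defaults.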

Iterations · 1

Iter | val_score | Best ever | State | Approved? | Ended
#0 | 0.333 | 0.333 | gate-blocked-no-improvement | (blank) | 2026-05-19 04:29
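
One reading of the `gate-blocked-no-improvement` state is that the regression gate lets an iteration through only when its val_score beats the best score so far. The sketch below is an assumption about that rule, not the kernel's actual logic. `gate_state`, `lift_pp`, the `min_lift` parameter, and the `gate-passed` label are hypothetical names introduced for illustration.

```python
def lift_pp(val_score: float, baseline: float) -> float:
    """Lift over the baseline, in percentage points."""
    return (val_score - baseline) * 100.0


def gate_state(val_score: float, best_so_far: float, min_lift: float = 0.0) -> str:
    """Label one iteration.

    Assumption: an iteration passes only if it improves on the best
    score so far by more than min_lift; otherwise it is blocked.
    """
    if val_score - best_so_far > min_lift:
        return "gate-passed"  # hypothetical label, not shown on this page
    return "gate-blocked-no-improvement"


# Iteration #0 is its own baseline, so the lift is zero and the gate blocks it.
print(f"{lift_pp(0.333, 0.333):+.1f}pp")
print(gate_state(0.333, 0.333))
```

Under this reading, a first run can never pass the gate, because it has nothing earlier to improve on.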

Agent anatomy

Single-agent loop, gated by the regression suite. Below: the skills the agent has loaded, the tools it can call, and who signs off on changes.

Skills active · 0
No skills are bound to this workflow yet; they are generated on the first run.
Tools available · 4
  • propose_line_change
    Recommends a new credit limit and action.
    propose_line_change(account_id: string, proposed_limit: float, action: category, rationale: string)
  • query_repayment_history
    Returns weekly repayment + DPD history for an account.
    query_repayment_history(account_id: string, months_back: int) → repayment_series: string
  • fetch_sector_exposure
    Aggregate exposure for the account's sector.
    fetch_sector_exposure(sector: category) → exposure_pct: float
  • fetch_dnb_signal
    External credit signal from Dun & Bradstreet.
    fetch_dnb_signal(account_id: string) → signal_score: float
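
The four tool signatures above can be written as a typed interface, which makes it easy to dry-run an agent against canned data. This is a sketch under assumptions: the `category` type is mapped to `str`, `propose_line_change` is assumed to return nothing because no return type is listed, and `StubTools`, `dry_run`, the `"hold"` action, and all canned values are hypothetical.

```python
from typing import Protocol


class CreditLineTools(Protocol):
    """Typed view of the four listed tools (category mapped to str by assumption)."""

    def propose_line_change(self, account_id: str, proposed_limit: float,
                            action: str, rationale: str) -> None: ...

    def query_repayment_history(self, account_id: str, months_back: int) -> str: ...

    def fetch_sector_exposure(self, sector: str) -> float: ...

    def fetch_dnb_signal(self, account_id: str) -> float: ...


class StubTools:
    """Hypothetical in-memory stand-in that returns canned values."""

    def __init__(self) -> None:
        self.proposals: list[dict] = []

    def propose_line_change(self, account_id: str, proposed_limit: float,
                            action: str, rationale: str) -> None:
        # Record the proposal instead of sending it anywhere.
        self.proposals.append({
            "account_id": account_id,
            "proposed_limit": proposed_limit,
            "action": action,
            "rationale": rationale,
        })

    def query_repayment_history(self, account_id: str, months_back: int) -> str:
        return ""  # placeholder series; the real format is not documented here

    def fetch_sector_exposure(self, sector: str) -> float:
        return 0.0  # canned value

    def fetch_dnb_signal(self, account_id: str) -> float:
        return 0.5  # canned value


def dry_run(tools: CreditLineTools, account_id: str) -> None:
    """Call one read tool, then record a proposal that cites its result."""
    signal = tools.fetch_dnb_signal(account_id)
    tools.propose_line_change(account_id, 10_000.0, "hold", f"signal={signal}")


tools = StubTools()
dry_run(tools, "ACC-1")
print(tools.proposals)
```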
Topology & review
  • Single-agent loop
    One agent reads its skills, calls tools, and proposes the next skill version. Regression gate runs every iteration. Phase-2 multi-agent is out of scope.
  • Reviewer · Chief Risk Officer
    cadence: weekly
    Approves or rejects proposed line changes.
  • Success · maximize line_recalibration_composite
    A recommendation is correct if the account does not breach the new limit within 90 days and does not default within 180 days. Composite of breach-rate, default-rate, and over-tightening false-positive rate.
  • Environment
    2 entity types · 2 data sources · 2 generators · 2 personas · seasonality: quarterly, rate-cycle
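
The success rule above (no breach within 90 days and no default within 180 days) reduces to a small predicate. This minimal sketch assumes that "within N days" includes day N and that an event that never happened is represented as `None`. The function and constant names are illustrative, and the weighting of the three rates in the composite is not given on this page, so it is not modelled.

```python
from typing import Optional

BREACH_WINDOW_DAYS = 90    # from the success definition
DEFAULT_WINDOW_DAYS = 180  # from the success definition


def recommendation_correct(days_to_breach: Optional[int],
                           days_to_default: Optional[int]) -> bool:
    """True if the account neither breached the new limit within 90 days
    nor defaulted within 180 days. None means the event never occurred."""
    no_breach = days_to_breach is None or days_to_breach > BREACH_WINDOW_DAYS
    no_default = days_to_default is None or days_to_default > DEFAULT_WINDOW_DAYS
    return no_breach and no_default


print(recommendation_correct(None, None))  # no events at all
print(recommendation_correct(45, None))    # breached on day 45
print(recommendation_correct(120, 200))    # both events fall outside their windows
```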

Skills + tools are read live from the kernel. Open the trace inspector to watch one run end-to-end.
