Recalibrate credit lines monthly across our 22,000-SMB portfolio.

Gated · credit-risk


Improvement loop active

1 iteration recorded · latest val_score 0.333 · 12 eval cases in the suite. Each new iteration re-runs the agent with the latest instruction and proposes the next edit.

Iterations · 1 (first run)

Latest val_score · 33.3%

Lift vs baseline · +0.0pp

Pending proposals

· 12 cases in suite

Lift curve

[Chart: val_score over time, 1 iteration; series: val_score, Baseline]

Case	Predicted default	Expected default	Result	Agent rationale
default-medium-trajectory	✗	✓	fail	APP_00009 has a moderate credit score (659), very low DTI ratio (0.137), and loan amount well below income—all factors associated with lowe…
stale-dpd-band-non-default	✓	✗	fail	Despite favorable financial metrics (high credit score of 718, low DTI of 0.156, high income), the strong base rate of defaults (11/11 in t…
alt-seed-default-applicant-step4	✗	✓	fail	APP_00005 has the highest income (104,777), a moderate credit score (694), a reasonable DTI ratio (0.627), and a loan amount proportional t…
seed-99-non-default-applicant	✓	✗	fail	APP_00002 has a significantly elevated debt-to-income ratio (0.636 vs. 0.226) despite a moderately higher credit score, indicating heighten…
default-mid-trajectory	✗	✓	fail	APP_00003 has a strong credit score (742), low DTI ratio (0.131), and reasonable loan-to-income ratio, all indicators of low default risk.
rate-shock-line-too-high	✗	✓	fail	The applicant shows low credit risk: credit score of 671 is acceptable, DTI ratio of 0.379 is reasonable (below 0.43 threshold), and loan a…
seed-99-default-applicant	✗	✓	fail	Credit score of 646 is marginal but acceptable, DTI ratio of 0.226 is healthy (well below 0.43 threshold), income-to-loan ratio is strong a…
hospitality-concentration-default	✗	✓	fail	Credit score of 664 is fair-to-good, DTI of 0.635 is manageable, and loan-to-income ratio (63%) is reasonable, suggesting low default risk.
non-default-low-dti	✗	✗	pass	APP_00012 has a credit score of 682, moderate DTI of 0.424, and healthy income-to-loan ratio, positioning it favorably relative to the defa…
non-default-clean-applicant	✗	✗	pass	APP_00002 has a low DTI ratio (0.14), reasonable credit score (624), and small loan-to-income ratio, suggesting lower default risk than APP…
seed-99-non-default-step4	✗	✗	pass	APP_00005 has a low DTI ratio (0.17), high income (96478), and conservative loan amount (16420), all protective factors; though credit scor…
alt-seed-low-risk-applicant	✗	✗	pass	APP_00003 has a lower credit score (666) than the first two defaulters, but shows much stronger fundamentals: DTI of 0.286 (vs 0.379 and 2.…
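
The table above can be summarized as a confusion matrix. The sketch below is a hedged illustration, not the platform's scoring code: it assumes ✓ means "default" and that val_score is the plain fraction of passing cases (which matches 4 of 12 = 0.333, though the page does not define the metric). The names `CASES`, `confusion`, and `pass_rate` are hypothetical.

```python
from collections import Counter

# (case, predicted_default, expected_default), transcribed from the table above.
CASES = [
    ("default-medium-trajectory", False, True),
    ("stale-dpd-band-non-default", True, False),
    ("alt-seed-default-applicant-step4", False, True),
    ("seed-99-non-default-applicant", True, False),
    ("default-mid-trajectory", False, True),
    ("rate-shock-line-too-high", False, True),
    ("seed-99-default-applicant", False, True),
    ("hospitality-concentration-default", False, True),
    ("non-default-low-dti", False, False),
    ("non-default-clean-applicant", False, False),
    ("seed-99-non-default-step4", False, False),
    ("alt-seed-low-risk-applicant", False, False),
]


def confusion(cases):
    """Count true/false positives and negatives, treating 'default' as positive."""
    counts = Counter(tp=0, fp=0, fn=0, tn=0)
    for _, predicted, expected in cases:
        key = ("t" if predicted == expected else "f") + ("p" if predicted else "n")
        counts[key] += 1
    return counts


def pass_rate(cases):
    """Fraction of cases where the prediction matches the expectation (assumed val_score)."""
    return sum(predicted == expected for _, predicted, expected in cases) / len(cases)


if __name__ == "__main__":
    print(dict(confusion(CASES)), round(pass_rate(CASES), 3))
```

On these 12 rows the counts are 0 true positives, 2 false positives, 6 false negatives, and 4 true negatives: the agent flagged none of the six expected defaults, and both of its "default" predictions were wrong.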

Failed · test fold

default-medium-trajectory

APP_00009 has a moderate credit score (659), very low DTI ratio (0.137), and loan amount well below income—al…

predicted false · expected true

test fold

seed-99-non-default-applicant

APP_00002 has a significantly elevated debt-to-income ratio (0.636 vs. 0.226) despite a moderately higher cre…

predicted true · expected false

test fold

Failed · train

stale-dpd-band-non-default

Despite favorable financial metrics (high credit score of 718, low DTI of 0.156, high income), the strong bas…

predicted true · expected false

train

alt-seed-default-applicant-step4

APP_00005 has the highest income (104,777), a moderate credit score (694), a reasonable DTI ratio (0.627), an…

predicted false · expected true

train

default-mid-trajectory

APP_00003 has a strong credit score (742), low DTI ratio (0.131), and reasonable loan-to-income ratio, all in…

predicted false · expected true

train

rate-shock-line-too-high

The applicant shows low credit risk: credit score of 671 is acceptable, DTI ratio of 0.379 is reasonable (bel…

predicted false · expected true

train

seed-99-default-applicant

Credit score of 646 is marginal but acceptable, DTI ratio of 0.226 is healthy (well below 0.43 threshold), in…

predicted false · expected true

train

hospitality-concentration-default

Credit score of 664 is fair-to-good, DTI of 0.635 is manageable, and loan-to-income ratio (63%) is reasonable…

predicted false · expected true

train

Passed

non-default-low-dti

APP_00012 has a credit score of 682, moderate DTI of 0.424, and healthy income-to-loan ratio, positioning it…

predicted false · expected false

train

non-default-clean-applicant

APP_00002 has a low DTI ratio (0.14), reasonable credit score (624), and small loan-to-income ratio, suggesti…

predicted false · expected false

train

seed-99-non-default-step4

APP_00005 has a low DTI ratio (0.17), high income (96478), and conservative loan amount (16420), all protecti…

predicted false · expected false

train

alt-seed-low-risk-applicant

APP_00003 has a lower credit score (666) than the first two defaulters, but shows much stronger fundamentals:…

predicted false · expected false

train
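
The fold labels in the listing above allow a per-fold breakdown. This is a minimal sketch under the same assumption that the score is the fraction of passing cases; `RESULTS` and `pass_rate_by_fold` are hypothetical names, and the fold assignments are transcribed from the listing.

```python
from collections import defaultdict

# (case, fold, passed), transcribed from the fold listing above.
RESULTS = [
    ("default-medium-trajectory", "test", False),
    ("seed-99-non-default-applicant", "test", False),
    ("stale-dpd-band-non-default", "train", False),
    ("alt-seed-default-applicant-step4", "train", False),
    ("default-mid-trajectory", "train", False),
    ("rate-shock-line-too-high", "train", False),
    ("seed-99-default-applicant", "train", False),
    ("hospitality-concentration-default", "train", False),
    ("non-default-low-dti", "train", True),
    ("non-default-clean-applicant", "train", True),
    ("seed-99-non-default-step4", "train", True),
    ("alt-seed-low-risk-applicant", "train", True),
]


def pass_rate_by_fold(results):
    """Return {fold: passed / total} for each fold present in the results."""
    totals = defaultdict(lambda: [0, 0])  # fold -> [passed, total]
    for _, fold, passed in results:
        totals[fold][0] += int(passed)
        totals[fold][1] += 1
    return {fold: passed / total for fold, (passed, total) in totals.items()}


if __name__ == "__main__":
    print(pass_rate_by_fold(RESULTS))
```

With the cases as listed, the train fold passes 4 of 10 and the test fold passes 0 of 2, so the overall 0.333 comes entirely from train cases.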

Iterations · 1

Iter	val_score	Best ever	State	Approved?	Ended
#0	0.333	0.333	gate-blocked-no-improvement		2026-05-19 04:29
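
The page does not state the gate's rule. As a hedged sketch only, the state `gate-blocked-no-improvement` and the +0.0pp lift are consistent with a gate that blocks any iteration whose val_score fails to exceed the previous best. The function names and the `"passed"` state below are assumptions, not the platform's API.

```python
def lift_pp(val_score: float, baseline: float) -> float:
    """Lift over the baseline in percentage points, rounded to one decimal."""
    return round((val_score - baseline) * 100, 1)


def gate_state(val_score: float, best_before: float) -> str:
    """Hypothetical gate: require a strict improvement over the best prior score."""
    if val_score > best_before:
        return "passed"  # assumed name for the non-blocked state
    return "gate-blocked-no-improvement"


if __name__ == "__main__":
    # Iteration #0: the first run is its own baseline, so there is no lift.
    print(lift_pp(0.333, 0.333), gate_state(0.333, 0.333))
```

Under this reading, a first run can never clear the gate on its own, because it defines the baseline it is compared against.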

Agent anatomy

Single-agent loop, gated by the regression suite. Below: the skills the agent has loaded, the tools it can call, and who signs off on changes.

Skills active · 0

No skills bound to this workflow yet — generated on first run.

Tools available · 4

propose_line_change
Recommends a new credit limit and action.
propose_line_change(account_id: string, proposed_limit: float, action: category, rationale: string)
query_repayment_history
Returns weekly repayment + DPD history for an account.
query_repayment_history(account_id: string, months_back: int) → repayment_series: string
fetch_sector_exposure
Aggregate exposure for the account's sector.
fetch_sector_exposure(sector: category) → exposure_pct: float
fetch_dnb_signal
External credit signal from Dun & Bradstreet.
fetch_dnb_signal(account_id: string) → signal_score: float
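
For readers wiring these tools into a test harness, the four signatures above could be mirrored as a typed Python interface. This is a sketch under assumptions: `category` is mapped to `str`, `propose_line_change` is assumed to return nothing (no return type is listed), and `CreditLineTools` and `FakeTools` are hypothetical names with canned values for illustration only.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class CreditLineTools(Protocol):
    """Typed mirror of the four tool signatures listed above."""

    def propose_line_change(
        self, account_id: str, proposed_limit: float, action: str, rationale: str
    ) -> None: ...

    def query_repayment_history(self, account_id: str, months_back: int) -> str: ...

    def fetch_sector_exposure(self, sector: str) -> float: ...

    def fetch_dnb_signal(self, account_id: str) -> float: ...


class FakeTools:
    """In-memory stand-in that records proposals and returns placeholder values."""

    def __init__(self) -> None:
        self.proposals: list[dict] = []

    def propose_line_change(self, account_id, proposed_limit, action, rationale):
        self.proposals.append(
            {
                "account_id": account_id,
                "proposed_limit": proposed_limit,
                "action": action,
                "rationale": rationale,
            }
        )

    def query_repayment_history(self, account_id, months_back):
        return ""  # placeholder: the real series format is not documented here

    def fetch_sector_exposure(self, sector):
        return 0.0  # placeholder value

    def fetch_dnb_signal(self, account_id):
        return 0.0  # placeholder value
```

A fake like this lets eval cases assert on what the agent proposed without calling live data sources.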

Topology & review

Single-agent loop
One agent reads its skills, calls tools, and proposes the next skill version. Regression gate runs every iteration. Phase-2 multi-agent is out of scope.
Reviewer · Chief Risk Officer
cadence: weekly
Approves or rejects proposed line changes.
Success · maximize line_recalibration_composite
A recommendation is correct if the account does not breach the new limit within 90 days and does not default within 180 days. The metric is a composite of breach rate, default rate, and the over-tightening false-positive rate.
Environment
2 entity types · 2 data sources · 2 generators · 2 personas · seasonality: quarterly, rate-cycle
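
The correctness rule in the Success entry above (no limit breach within 90 days, no default within 180 days) can be written as a small predicate. This sketch assumes both windows are inclusive and that outcomes are recorded as days since the recommendation, with `None` meaning the event never occurred. The function and constant names are hypothetical, and the composite's weighting is not shown because the page does not give it.

```python
from typing import Optional

BREACH_WINDOW_DAYS = 90    # from the success criterion
DEFAULT_WINDOW_DAYS = 180  # from the success criterion


def recommendation_is_correct(
    days_to_breach: Optional[int], days_to_default: Optional[int]
) -> bool:
    """True if the account neither breached the limit nor defaulted within the windows."""
    breached = days_to_breach is not None and days_to_breach <= BREACH_WINDOW_DAYS
    defaulted = days_to_default is not None and days_to_default <= DEFAULT_WINDOW_DAYS
    return not breached and not defaulted


if __name__ == "__main__":
    print(recommendation_is_correct(None, None))  # no breach, no default
    print(recommendation_is_correct(45, None))    # breach inside the 90-day window
    print(recommendation_is_correct(120, 200))    # both events fall outside their windows
```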

Skills + tools are read live from the kernel. Open the trace inspector to watch one run end-to-end.
