Imaginate - Two-Tier Resilience: Escalation Ladder Meets OpenRouter Per-Route Fallbacks | Nick Stradford

Overview

Imaginate's agent has to keep working through two very different failure modes. The first is transport failure: a model provider rate-limits us, times out, or goes down mid-request. The second is quality failure: the model returned successfully, but the output is empty or full of "TODO" placeholders, or the agent wrote files without verifying them. Both failures happen in production. Neither should kill an agent run.

The runtime handles them with two cooperating mechanisms. OpenRouter's per-route models parameter handles transport failures inside a single call: if the primary model errors with something OpenRouter knows how to retry, it walks a curated fallback list before returning. The agent's own escalation ladder handles quality failures between calls: if the executor's output looks wrong (or the whole route was exhausted), runAgent advances to the next ladder rung, which is itself another route with its own fallback list. The result is a two-tier resilience strategy that survives single-provider outages without retrying the same failed model forever.

Explore the source code on GitHub.


Architecture

The executor ladder: three rungs of progressively stronger models

The ladder is a flat ordered list of model specs registered at the platform layer. The default rung runs first; each subsequent rung is a different model — usually a stronger or more reliable one — that the agent only reaches if the previous rung's output failed quality checks or the route itself errored:

export const MODEL_REGISTRY = {
  planner: { provider: "openrouter", model: env.MODEL_PLANNER },
  executorDefault: {
    provider: "openrouter",
    model: env.MODEL_EXECUTOR_DEFAULT,
  },
  executorFallback1: {
    provider: "openrouter",
    model: env.MODEL_EXECUTOR_FALLBACK_1,
  },
  executorFallback2: {
    provider: "openrouter",
    model: env.MODEL_EXECUTOR_FALLBACK_2,
  },
} satisfies Record<string, ModelSpec>;

export const EXECUTOR_LADDER: readonly ModelSpec[] = [
  MODEL_REGISTRY.executorDefault,
  MODEL_REGISTRY.executorFallback1,
  MODEL_REGISTRY.executorFallback2,
] as const;

The model gateway exposes the ladder to the agent core through listExecutorModelIds(). The domain layer doesn't know what those models are, only that there are N rungs and that climbing them is a real escalation, not a retry of the same model.
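The boundary described above can be sketched as a small port. This is a minimal illustration, not the project's actual gateway: the ModelSpec shape is assumed from the registry listing, and createLadderGateway and the placeholder model keys are hypothetical.

```typescript
// Assumed shape; the real ModelSpec may carry more fields.
type ModelSpec = { provider: string; model: string };

// The only thing the agent core sees: an ordered list of opaque IDs.
interface ExecutorLadderPort {
  listExecutorModelIds(): readonly string[];
}

// Hypothetical factory that hides the specs behind the port.
function createLadderGateway(ladder: readonly ModelSpec[]): ExecutorLadderPort {
  return {
    listExecutorModelIds: () => ladder.map((spec) => spec.model),
  };
}

// Placeholder model keys, not the project's real configuration.
const ladderGateway = createLadderGateway([
  { provider: "openrouter", model: "EXECUTOR_DEFAULT" },
  { provider: "openrouter", model: "EXECUTOR_FALLBACK_1" },
  { provider: "openrouter", model: "EXECUTOR_FALLBACK_2" },
]);

const rungIds = ladderGateway.listExecutorModelIds();
console.log(rungIds.length); // 3
```

Because the core only receives IDs in order, swapping a rung's model is a platform-layer change and the loop that climbs the ladder stays untouched.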

Per-route OpenRouter fallback lists

Each rung is itself a route: a primary spec plus an ordered list of fallback specs that OpenRouter walks left-to-right when the primary errors with a retryable failure (rate limit, downtime, moderation, context-length validation). The fallbacks are deliberately drawn from different providers so a single-provider outage cannot take down a whole rung:

// Per-route OpenRouter fallback lists. See
// docs/plans/open/openrouter-model-route-fallbacks.md for selection rationale.
// Order = preference; OpenRouter walks this list left-to-right when the
// primary errors with a retryable failure (rate limit, downtime, moderation,
// context-length validation). Cross-provider diversity is intentional —
// a single-provider outage should not take down a layer.
export const MODEL_ROUTES = {
  planner: {
    primary: MODEL_REGISTRY.planner,
    fallbacks: [spec("OPENAI_GPT_5_MINI"), spec("GROK_4_1_FAST")],
  },
  executorDefault: {
    primary: MODEL_REGISTRY.executorDefault,
    fallbacks: [spec("QWEN_3_CODER"), spec("DEEPSEEK_V3_2")],
  },
  executorFallback1: {
    primary: MODEL_REGISTRY.executorFallback1,
    fallbacks: [spec("CLAUDE_HAIKU_4_5"), spec("GROK_CODE_FAST_1")],
  },
  executorFallback2: {
    primary: MODEL_REGISTRY.executorFallback2,
    fallbacks: [spec("KIMI_K2_6"), spec("CLAUDE_OPUS_4_7")],
  },
} satisfies Record<string, ModelRoute>;

The model factory looks up the route for a given primary spec and passes the fallback slugs to OpenRouter when constructing the language model. OpenRouter's SDK accepts a models array on the model handle; it routes to the primary first, then transparently retries through the fallbacks before surfacing an error to us:

export function createModelProvider(
  config: ResolvedModelConfig,
  options?: CreateModelProviderOptions,
): LanguageModel {
  const factory = createOpenRouter({ apiKey: config.apiKey });
  const primarySlug = MODEL_IDS[config.model];
  if (options?.fallbackSlugs && options.fallbackSlugs.length > 0) {
    return factory(primarySlug, {
      models: [...options.fallbackSlugs],
    });
  }
  return factory(primarySlug);
}

This is the critical piece: by the time the AI SDK throws an error back to the agent, OpenRouter has already burned through the entire in-route fallback list. There is no point retrying the same primary at the application layer — the route is exhausted.
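The route lookup that feeds fallbackSlugs into the factory might look roughly like the following. It is a sketch under stated assumptions: resolveFallbackSlugs, the SLUGS table, and the "vendor-x/..." slugs are placeholders invented for illustration, and the source does not show how the real lookup is implemented.

```typescript
type ModelSpec = { provider: string; model: string };
type ModelRoute = { primary: ModelSpec; fallbacks: readonly ModelSpec[] };

// Placeholder key-to-slug table. These slugs are illustrative, not real model IDs.
const SLUGS: Record<string, string> = {
  PRIMARY_A: "vendor-a/primary",
  FALLBACK_B: "vendor-b/fallback",
  FALLBACK_C: "vendor-c/fallback",
};

const makeSpec = (model: string): ModelSpec => ({ provider: "openrouter", model });

const ROUTES: Record<string, ModelRoute> = {
  executorDefault: {
    primary: makeSpec("PRIMARY_A"),
    fallbacks: [makeSpec("FALLBACK_B"), makeSpec("FALLBACK_C")],
  },
};

// Hypothetical helper: find the route whose primary matches, then map its
// fallback specs to slugs in preference order. Unknown primaries get no fallbacks.
function resolveFallbackSlugs(
  primary: ModelSpec,
  routes: Record<string, ModelRoute>,
): string[] {
  const route = Object.values(routes).find(
    (candidate) => candidate.primary.model === primary.model,
  );
  if (!route) return [];
  return route.fallbacks
    .map((fallback) => SLUGS[fallback.model])
    .filter((slug): slug is string => slug !== undefined);
}

const resolved = resolveFallbackSlugs(makeSpec("PRIMARY_A"), ROUTES);
console.log(resolved); // ["vendor-b/fallback", "vendor-c/fallback"]
```

Returning an empty list for an unknown primary lines up with the length check in createModelProvider, which then builds the model without a models array.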

The escalation decision: when did the executor's output go wrong?

After each executor attempt, the domain layer decides whether to escalate. shouldEscalate is a pure function that takes the run state and the raw model result and returns either { escalate: false } or { escalate: true, reason }. The reasons are the codified ways an attempt can be "successful but wrong":

export const EscalateReason = {
  FinalizeFailed: "finalize:failed",
  FinalizePartial: "finalize:partial",
  EmptyOutput: "empty_output",
  StubLanguage: "stub_language",
  WroteWithoutVerify: "wrote_without_verify",
  NoWrites: "no_writes",
  Exception: "exception",
} as const;

export function shouldEscalate(
  runState: RunState,
  result: unknown,
): EscalateDecision {
  if (runState.finalOutput) {
    if (runState.finalOutput.status === "failed") {
      return { escalate: true, reason: EscalateReason.FinalizeFailed };
    }
    if (runState.finalOutput.status === "partial") {
      return { escalate: true, reason: EscalateReason.FinalizePartial };
    }
    return { escalate: false };
  }

  const text = stepTextOf(result) || "";
  const lower = text.toLowerCase();
  if (!text.trim()) {
    return { escalate: true, reason: EscalateReason.EmptyOutput };
  }
  if (
    lower.includes("todo") ||
    lower.includes("placeholder") ||
    lower.includes("not implemented")
  ) {
    return { escalate: true, reason: EscalateReason.StubLanguage };
  }

  const wrote = Object.keys(runState.filesWritten).length > 0;
  const verified = runState.verification.some((v) => v.success);
  if (wrote && !verified) {
    return { escalate: true, reason: EscalateReason.WroteWithoutVerify };
  }
  if (!wrote) {
    return { escalate: true, reason: EscalateReason.NoWrites };
  }

  return { escalate: false };
}

This is intentionally a flat function with no I/O. It looks at the run state — what the executor wrote, what verifications it ran, what finalOutput it produced — and decides if the next ladder rung should take a swing. The reason becomes a runtime event so observers (the web UI's thoughts panel, the CLI's stdout, telemetry) can show why the agent moved to a stronger model.
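One way an observer could turn the decision into a human-readable line is sketched below. The reason strings match EscalateReason above; describeDecision and the message wording are hypothetical and not taken from the project.

```typescript
type EscalateDecision =
  | { escalate: false }
  | { escalate: true; reason: string };

// Hypothetical display text, keyed by the EscalateReason values.
const REASON_MESSAGES: Record<string, string> = {
  "finalize:failed": "finalize reported failure",
  "finalize:partial": "finalize reported partial completion",
  empty_output: "model returned no text",
  stub_language: "output contained stub language",
  wrote_without_verify: "files were written but never verified",
  no_writes: "no files were written",
  exception: "the attempt threw",
};

// Renders one line per attempt; unknown reasons fall back to the raw code.
function describeDecision(attempt: number, decision: EscalateDecision): string {
  if (!decision.escalate) {
    return `Attempt ${attempt} accepted`;
  }
  const detail = REASON_MESSAGES[decision.reason] ?? decision.reason;
  return `Attempt ${attempt} escalated: ${detail}`;
}

console.log(describeDecision(1, { escalate: true, reason: "stub_language" }));
// Attempt 1 escalated: output contained stub language
console.log(describeDecision(2, { escalate: false }));
// Attempt 2 accepted
```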

Provider error classification: which exceptions are even worth escalating for?

When the model gateway throws, meaning OpenRouter's per-route fallbacks were exhausted and the failure surfaced, the agent classifies the error before deciding whether to climb the ladder. Some errors are not retryable at all and should short-circuit the whole run; others are specific to the route that just failed, so the next rung is worth trying:

export const PROVIDER_ERROR_RULES: ProviderErrorRule[] = [
  {
    category: "credit",
    retryable: false,
    prefix: "Provider account limit reached",
    needles: ["credit", "balance", "quota", "insufficient"],
  },
  {
    category: "rate_limit",
    retryable: true,
    prefix: "Provider rate limit exceeded",
    needles: ["rate limit", "429", "too many requests"],
  },
  {
    category: "auth",
    retryable: false,
    prefix: "Provider authentication failed",
    needles: ["unauthorized", "401", "api key", "authentication"],
  },
  {
    category: "timeout",
    retryable: true,
    prefix: "Provider timed out",
    needles: ["timeout", "etimedout"],
  },
  {
    category: "connection",
    retryable: true,
    prefix: "Provider connection error",
    needles: ["econnreset", "econnrefused", "enotfound", "network"],
  },
];

The classifier walks this list looking for needles in the lowercased error message and returns { category, retryable }. credit and auth failures are non-retryable — there is no point trying the next ladder rung if our OpenRouter account has no balance, because every rung uses the same account. rate_limit, timeout, and connection are retryable, so the ladder advances.
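The walk described above can be sketched as a first-match scan. Only two of the rules are repeated here to keep the example short. The function name classifyProviderError and the "unknown" default for unmatched messages are assumptions, since the source does not say how unmatched errors are treated.

```typescript
type ProviderErrorRule = {
  category: string;
  retryable: boolean;
  prefix: string;
  needles: string[];
};
type Classified = { category: string; retryable: boolean };

// Two rules copied from the list above.
const RULES: ProviderErrorRule[] = [
  {
    category: "credit",
    retryable: false,
    prefix: "Provider account limit reached",
    needles: ["credit", "balance", "quota", "insufficient"],
  },
  {
    category: "rate_limit",
    retryable: true,
    prefix: "Provider rate limit exceeded",
    needles: ["rate limit", "429", "too many requests"],
  },
];

// First matching rule wins, so rule order acts as priority.
function classifyProviderError(error: unknown, rules: ProviderErrorRule[]): Classified {
  const raw = error instanceof Error ? error.message : String(error);
  const message = raw.toLowerCase();
  for (const rule of rules) {
    if (rule.needles.some((needle) => message.includes(needle))) {
      return { category: rule.category, retryable: rule.retryable };
    }
  }
  // Assumption: anything unrecognised is treated as non-retryable.
  return { category: "unknown", retryable: false };
}

console.log(classifyProviderError(new Error("429 Too Many Requests"), RULES));
// { category: "rate_limit", retryable: true }
```

Since matching is by substring and the first hit wins, a message such as "rate limit: quota exceeded" would be classified as credit under this ordering. That is worth keeping in mind when adding needles.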

Tying it together: the executor loop in runAgent

The orchestration that ties the ladder, the route fallbacks, the escalation decision, and the error classifier together is in runAgent. It walks the ladder once. Each rung calls executeRun (which itself dispatches through OpenRouter's in-route fallbacks). If the attempt fails with an exception (surfaced to the loop as outcome.error), the error is classified and the loop either continues to the next rung (retryable) or breaks out (terminal). On a clean return, the escalation decision determines whether to advance:

const ladder = deps.modelGateway.listExecutorModelIds();

if (plan.requiresCoding) {
  // Each ladder rung is itself a route with OpenRouter fallback models
  // configured at the gateway. By the time `executeRun` throws a retryable
  // error here, OpenRouter has already exhausted the in-route fallback
  // list, so advancing this ladder means "the entire route failed,"
  // not "the primary model failed."
  for (let i = 0; i < ladder.length; i++) {
    const modelId = ladder[i];
    // ... describeModel + ExecutorAttemptStarted event ...

    const outcome = await executeRun({ /* ... */ modelId });
    stepsCount = outcome.stepsCount;

    if (outcome.error) {
      const classified = deps.modelGateway.classifyError(outcome.error);
      lastError = outcome.error;
      await deps.eventSink.emit({
        type: AgentRuntimeEventType.ExecutorAttemptFailed,
        attempt: i + 1,
        category: classified.category,
        retryable: classified.retryable,
      });
      if (!classified.retryable) {
        break;
      }
      continue;
    }

    if (!outcome.escalated) {
      await deps.eventSink.emit({
        type: AgentRuntimeEventType.ExecutorAccepted,
        attempt: i + 1,
      });
      break;
    }

    await deps.eventSink.emit({
      type: AgentRuntimeEventType.ExecutorEscalated,
      attempt: i + 1,
      reason: outcome.reason,
    });
  }
}

Three terminating conditions: an ExecutorAccepted event (success), a non-retryable classified error (terminal), or running off the end of the ladder (escalation exhausted). Every transition emits a runtime event, so the web UI's thoughts panel and the CLI both render the same story: "tried executorDefault, output had stub language, escalated to executorFallback1, accepted."
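The three terminating conditions can be exercised in isolation with a stripped-down version of the loop. This is a sketch, not the project's code: walkLadder, the Outcome union, and the scripted outcomes are hypothetical, and the walk is synchronous for brevity where the real loop awaits executeRun.

```typescript
type Outcome =
  | { kind: "error"; retryable: boolean }
  | { kind: "escalated"; reason: string }
  | { kind: "accepted" };

type LadderResult = "accepted" | "terminal_error" | "ladder_exhausted";

// Walks the ladder exactly once and records one log line per attempt.
function walkLadder(
  ladder: readonly string[],
  attempt: (modelId: string) => Outcome,
  log: string[],
): LadderResult {
  for (const modelId of ladder) {
    const outcome = attempt(modelId);
    if (outcome.kind === "error") {
      log.push(`${modelId}: error`);
      if (!outcome.retryable) return "terminal_error"; // e.g. credit or auth
      continue; // the whole route failed, so move to the next rung
    }
    if (outcome.kind === "accepted") {
      log.push(`${modelId}: accepted`);
      return "accepted";
    }
    log.push(`${modelId}: escalated (${outcome.reason})`);
  }
  return "ladder_exhausted";
}

// Scripted run matching the story in the text.
const scripted: Record<string, Outcome> = {
  executorDefault: { kind: "escalated", reason: "stub_language" },
  executorFallback1: { kind: "accepted" },
  executorFallback2: { kind: "accepted" },
};
const attemptLog: string[] = [];
const ladderResult = walkLadder(
  ["executorDefault", "executorFallback1", "executorFallback2"],
  (id) => scripted[id] ?? { kind: "error", retryable: false },
  attemptLog,
);
console.log(ladderResult, attemptLog);
// accepted ["executorDefault: escalated (stub_language)", "executorFallback1: accepted"]
```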

The two tiers, side by side

The full attempt flow for a single run looks like this:

runAgent
 └─ for each rung in EXECUTOR_LADDER       ← outer: quality escalation
     └─ executeRun(modelId = rung)
         └─ modelGateway.generateText
             └─ OpenRouter route walk      ← inner: transport fallback
                 ├─ try primary spec
                 ├─ on retryable error → try fallback[0]
                 ├─ on retryable error → try fallback[1]
                 └─ all exhausted → throw
         ← either returns a result OR throws
     ← shouldEscalate(runState, result) on success
     ← classifyError(err)               on throw

The inner tier is invisible to the agent core: it sees only "the call worked" or "the call threw an exhausted-route error." The outer tier is invisible to OpenRouter, which has no idea the agent will try a stronger model if the executor produced "TODO" placeholders. Each tier handles exactly the failure mode it is equipped to detect.

Difficult Parts

Not double-retrying the primary model

The naïve design is "retry the failing primary at the application layer, then escalate if it keeps failing." With OpenRouter's per-route fallbacks already configured, that would mean the agent retries executorDefault (which may silently have been served by QWEN_3_CODER after the primary errored), then advances to executorFallback1 (which is a different primary). That is not a retry; it is a third model, and the failure semantics get muddy fast. The fix was a comment in run-agent.ts that names this explicitly: by the time executeRun throws a retryable error, OpenRouter has already exhausted the in-route fallback list. The application layer's job is no longer "retry the same model"; it's "advance to a different rung." The for loop walks the ladder exactly once.

Choosing the right boundary for retryability

The provider classifier intentionally lives in shared/errors/, not in the model adapter. The agent's domain logic calls classifyError(err) on the model gateway port and gets back { category, retryable }; the gateway delegates to the classifier. This keeps the domain free of provider-specific string matching, and lets a future non-OpenRouter adapter ship its own classifier without changing the loop. The classification is also why credit and auth are marked non-retryable: climbing the ladder won't fix a missing API key, so the agent should fail fast and surface the user-facing message instead of burning two more attempts.
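The delegation can be sketched as a port that takes its classifier as a dependency. Everything named here is hypothetical (createGateway, both classifiers, and the "overloaded" needle for an imagined second adapter); the point is only that the loop's question stays the same when the classifier changes.

```typescript
type Classified = { category: string; retryable: boolean };
type ErrorClassifier = (lowercasedMessage: string) => Classified;

// What the domain layer depends on.
interface ModelGatewayPort {
  classifyError(error: unknown): Classified;
}

// The gateway normalises the error and delegates to whichever classifier it was given.
function createGateway(classify: ErrorClassifier): ModelGatewayPort {
  return {
    classifyError(error: unknown): Classified {
      const message = error instanceof Error ? error.message : String(error);
      return classify(message.toLowerCase());
    },
  };
}

// Hypothetical classifier for the current adapter.
const openRouterClassifier: ErrorClassifier = (message) =>
  message.includes("401")
    ? { category: "auth", retryable: false }
    : { category: "unknown", retryable: false };

// Hypothetical classifier for some future adapter with different wording.
const otherAdapterClassifier: ErrorClassifier = (message) =>
  message.includes("overloaded")
    ? { category: "rate_limit", retryable: true }
    : { category: "unknown", retryable: false };

// The loop asks the same question regardless of adapter.
function shouldAdvance(gateway: ModelGatewayPort, error: unknown): boolean {
  return gateway.classifyError(error).retryable;
}

console.log(shouldAdvance(createGateway(openRouterClassifier), new Error("401 Unauthorized"))); // false
console.log(shouldAdvance(createGateway(otherAdapterClassifier), new Error("Server overloaded"))); // true
```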

Encoding "successful but wrong" as a pure function

shouldEscalate reads only RunState and the raw model result. It doesn't fetch from a database, doesn't know what model was used, doesn't talk to the sandbox. That purity is what makes it testable in isolation (and decisions.test.ts does test every reason). The hard part was deciding what counts as "wrong." The current rules (empty output, stub language, wrote without verifying, didn't write at all, finalize status failed or partial) came from watching real runs in the web app and noting which outputs the user immediately re-prompted to fix. A new failure mode that keeps showing up over the next quarter of usage gets added to EscalateReason as a code change rather than a config tweak, because the right response is almost always "rerun on a stronger model" rather than "ignore."

Cross-provider diversity in the fallback lists

The route fallback selection is deliberately cross-provider. executorDefault's fallbacks are Qwen and DeepSeek; executorFallback1's are Anthropic Haiku and Grok; executorFallback2's are Kimi and Claude Opus. The reason: a single-provider outage (Anthropic's API down for an hour, OpenAI throttling our org) should not cascade into "every rung of the ladder fails the same way." Picking fallbacks that share a provider with the primary would defeat the point of having fallbacks at all. The selection rationale lives in docs/plans/open/openrouter-model-route-fallbacks.md so that future model swaps are deliberate rather than accidental.
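A diversity rule like this can be checked mechanically, for example in a unit test over the route table. The sketch below assumes slugs follow a "vendor/model" shape and uses placeholder slugs; sharedVendors and vendorOf are hypothetical helpers, not part of the project.

```typescript
type SlugRoute = { primary: string; fallbacks: readonly string[] };

// Assumes a "vendor/model" slug; a slug without "/" is treated as its own vendor.
function vendorOf(slug: string): string {
  const [vendor = slug] = slug.split("/");
  return vendor;
}

// Returns every vendor that appears more than once within a single route.
function sharedVendors(route: SlugRoute): string[] {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  for (const slug of [route.primary, ...route.fallbacks]) {
    const vendor = vendorOf(slug);
    if (seen.has(vendor)) duplicates.add(vendor);
    seen.add(vendor);
  }
  return Array.from(duplicates);
}

// Placeholder slugs, not real model IDs.
const diverseRoute: SlugRoute = {
  primary: "vendor-a/model-x",
  fallbacks: ["vendor-b/model-y", "vendor-c/model-z"],
};
const fragileRoute: SlugRoute = {
  primary: "vendor-a/model-x",
  fallbacks: ["vendor-a/model-small", "vendor-c/model-z"],
};

console.log(sharedVendors(diverseRoute)); // []
console.log(sharedVendors(fragileRoute)); // ["vendor-a"]
```

A check of this kind only guards diversity within a route. Whether two different rungs may share a vendor is a separate policy decision.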
