Fallback Routing for Coding Agent Model Calls

Last reviewed: 2026-05-29

Direct answer

Fallback routing for coding agent model calls means your agent has a ranked list of model endpoints and automatically moves down that list when the current endpoint returns an error, exhausts its quota, or takes too long to respond. Instead of crashing the agent run or requiring manual intervention, the agent retries the same request against the next candidate endpoint in the priority order.

The minimal working shape is:

A priority-ordered list of model endpoints the agent is allowed to use.
A trigger condition — the specific HTTP error codes, timeout signals, or quota signals that indicate the current endpoint is unavailable.
A retry policy — how many attempts per endpoint, what backoff to apply, and when to give up and fail the run explicitly rather than silently degrade.
A log record written after each fallback event so you can review which endpoint actually served the request.

For coding agents that use an OpenAI-compatible gateway such as CometAPI, the endpoint contract (base URL format, Authorization header pattern, and chat completions request shape) is the same across the priority list. You change the model identifier or the base URL; you do not change the request builder. Verify the exact endpoint paths, supported model identifiers, and response field names in the CometAPI docs before wiring your fallback list — see the sources section below for the current reference URLs.

The existing guide Route Coding Agent Model Calls Without Endpoint Drift covers how to keep a single gateway wired correctly over time. This article focuses on what happens after the primary model call fails and how to design the routing layer that catches that failure.

For broader release checks, see AI Coding Agent Setup, Security, and Model Routing.

Who this is for

This guide is for engineers who:

Run coding agents in CI pipelines or scheduled automation where an unhandled model error stops the entire job.
Operate multiple coding agents (Codex, Claude Code, Cursor, OpenCode, or similar tools) that all call a shared model gateway.
Have already set up a gateway but have not yet defined what happens when the primary model is unavailable.
Want a clear, auditable record of which model endpoint served each agent run so they can correlate quality regressions with endpoint changes.

If you have not yet connected a coding agent to a model gateway, start with the setup guide linked above and return here once the primary route is working.

Key takeaways

Define the fallback list explicitly in configuration, not in ad-hoc exception handling scattered across agent scripts.
Only trigger a fallback on clear endpoint-failure signals: HTTP 429 (rate limit or quota), 503 (service unavailable), 504 (gateway timeout), and connection-level timeouts. Do not fall back on 400 or 422 — those errors mean your request is malformed and the next endpoint will reject it for the same reason.
Apply exponential backoff with jitter before each retry attempt. A flat retry loop hammers a struggling endpoint and exhausts your fallback budget faster.
Write a structured fallback log entry after every switch event. The minimum fields are: run ID, timestamp, primary endpoint attempted, trigger error code, fallback endpoint selected, and whether the fallback attempt succeeded.
Keep the fallback list short: two to three endpoints is enough for most agent pipelines. A long list obscures the real problem and makes cost accounting difficult.
Test the fallback path in a controlled smoke test before relying on it in production. A fallback that has never been exercised is not a fallback — it is a guess.
Verify all endpoint paths, model IDs, auth header format, and error response shapes against the linked CometAPI documentation before treating them as correct. These details change and this article cannot substitute for the current docs.

Fallback routing design

Priority list structure

Express the fallback list as an ordered array in your agent configuration file or environment. Each entry needs at minimum:

base_url: the gateway base URL for that endpoint (verify current format in the CometAPI docs).
model: the model identifier string (verify current identifiers in the models reference).
priority: an integer, lower is tried first.
timeout_seconds: the per-request timeout before treating the call as failed.

Keep the list in a dedicated config section rather than inlined into request-building code. When the list is in configuration, you can update endpoint order, add a new candidate, or temporarily disable an entry without touching agent logic.
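As a minimal sketch of the structure above, the list can be parsed from a JSON string into typed entries and sorted by priority. The `Endpoint` class, the `load_priority_list` function, and the example URLs and model names are hypothetical and exist only for illustration; the four field names are the ones listed above.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    base_url: str
    model: str
    priority: int            # lower is tried first
    timeout_seconds: float   # per-request timeout


def load_priority_list(raw: str) -> list[Endpoint]:
    """Parse a JSON array of endpoint entries and return them in priority order."""
    entries = [Endpoint(**item) for item in json.loads(raw)]
    if not entries:
        raise ValueError("fallback list is empty")
    priorities = [entry.priority for entry in entries]
    if len(set(priorities)) != len(priorities):
        raise ValueError("duplicate priority values in fallback list")
    return sorted(entries, key=lambda entry: entry.priority)


# Placeholder values only; take real base URLs and model IDs from your gateway's docs.
example = json.dumps([
    {"base_url": "https://gateway.example/v1", "model": "fallback-model",
     "priority": 2, "timeout_seconds": 60},
    {"base_url": "https://gateway.example/v1", "model": "primary-model",
     "priority": 1, "timeout_seconds": 30},
])
endpoints = load_priority_list(example)
print([entry.model for entry in endpoints])
```

Rejecting duplicate priorities at load time is a design choice in this sketch: it keeps the order unambiguous when someone edits the list during an incident.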

Trigger conditions

Not every error should trigger a fallback. Apply this classification before deciding whether to move to the next endpoint:

Fallback-eligible errors:

HTTP 429 with a quota or rate-limit body (confirm the exact response body field name in the CometAPI docs — the field name varies across gateways).
HTTP 503 Service Unavailable.
HTTP 504 Gateway Timeout.
Network-level timeout (the connection was not established, or the response did not arrive, within timeout_seconds).
HTTP 500 only if the body contains a clear infrastructure error message rather than a model-level error.

Do not fall back — fix the request instead:

HTTP 400 Bad Request: your request body is invalid for this endpoint. Fix the request shape before retrying anywhere.
HTTP 401 or 403: authentication or authorization failure. A fallback to a different model will not resolve a credential problem.
HTTP 422 Unprocessable Entity: the request structure is valid JSON but semantically invalid for this endpoint. Check field names against the chat completions reference.
HTTP 404: the model identifier or endpoint path does not exist. Verify the path and model ID in the docs before retrying.
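The classification above could be expressed as a single function that the routing layer consults after every failed attempt. This is a sketch under stated assumptions: `Action`, `classify_failure`, and especially the `INFRA_MARKERS` strings are hypothetical, and the real wording of infrastructure errors must come from your gateway's documentation.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    FALLBACK = "fallback"        # retry, then move down the priority list
    FIX_REQUEST = "fix_request"  # the request or credentials are wrong
    FAIL = "fail"                # unclassified: surface the error


FALLBACK_STATUSES = frozenset({429, 503, 504})
FIX_REQUEST_STATUSES = frozenset({400, 401, 403, 404, 422})
# Hypothetical markers for "clear infrastructure error" on a 500 response.
INFRA_MARKERS = ("upstream", "overloaded", "unavailable")


def classify_failure(status: Optional[int], body: str = "",
                     timed_out: bool = False) -> Action:
    """Classify a failed attempt. Call this only for non-2xx responses or timeouts."""
    if timed_out or status in FALLBACK_STATUSES:
        return Action.FALLBACK
    if status in FIX_REQUEST_STATUSES:
        return Action.FIX_REQUEST
    if status == 500 and any(marker in body.lower() for marker in INFRA_MARKERS):
        return Action.FALLBACK
    return Action.FAIL


for status in (429, 404, 500):
    print(status, classify_failure(status).value)
```

Unknown statuses default to `FAIL` rather than `FALLBACK`, so the trigger set stays exactly what was configured and never silently grows.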

Retry and backoff policy

For each endpoint in the list, do the following before moving to the next:

Attempt the request once.
On a fallback-eligible error, wait base_delay * 2^attempt_number + random_jitter seconds, counting attempts from 0 so that the first wait is roughly base_delay. A reasonable starting point is a 1-second base delay with up to 500 ms of jitter. If the gateway returns a Retry-After header on 429 responses, honour it; check the CometAPI docs to confirm whether this header is supported.
Retry up to max_retries_per_endpoint times on the same endpoint before moving to the next. One or two retries per endpoint is usually sufficient.
If all endpoints in the priority list are exhausted, fail the run explicitly with a structured error record. Do not silently return an empty or partial result.
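The steps above can be sketched as one loop. The transport is injected as a `send` callable so the example runs without a network. The names `call_with_fallback`, `backoff_delay`, and `AllEndpointsExhausted` are hypothetical, as is the convention of representing a timeout as status 0. Waiting at least as long as Retry-After is one way to honour that header, not the only one.

```python
import random
import time
from typing import Callable, Optional, Sequence

RETRYABLE = frozenset({429, 503, 504})  # fallback-eligible statuses
TIMEOUT = 0                             # illustrative stand-in for "request timed out"


class AllEndpointsExhausted(RuntimeError):
    """Raised when every endpoint in the priority list has failed."""


def backoff_delay(attempt: int, base_delay: float = 1.0, max_jitter: float = 0.5,
                  retry_after: Optional[float] = None) -> float:
    """Return base_delay * 2^attempt + jitter, but never less than Retry-After."""
    delay = base_delay * (2 ** attempt) + random.uniform(0, max_jitter)
    return max(delay, retry_after) if retry_after is not None else delay


def call_with_fallback(endpoints: Sequence[str],
                       send: Callable[[str], tuple[int, Optional[float]]],
                       max_retries_per_endpoint: int = 1,
                       sleep: Callable[[float], None] = time.sleep):
    """Try endpoints in order; `send` returns (status, retry_after_seconds_or_None)."""
    attempts = []
    for endpoint in endpoints:
        for attempt in range(max_retries_per_endpoint + 1):
            status, retry_after = send(endpoint)
            attempts.append((endpoint, status))
            if 200 <= status < 300:
                return endpoint, attempts
            if status not in RETRYABLE and status != TIMEOUT:
                # 400, 401, 404 and similar: surface the error, do not fall back.
                raise RuntimeError(f"non-retryable status {status} from {endpoint}")
            if attempt < max_retries_per_endpoint:
                sleep(backoff_delay(attempt, retry_after=retry_after))
    # Fail explicitly instead of returning an empty or partial result.
    raise AllEndpointsExhausted(f"all endpoints failed: {attempts}")


# Simulated transport: the primary always returns 503, the fallback succeeds.
def fake_send(endpoint: str):
    return (503, None) if endpoint == "primary" else (200, None)


served_by, attempts = call_with_fallback(["primary", "fallback"], fake_send,
                                         sleep=lambda seconds: None)
print(served_by, attempts)
```

Passing `sleep` as a parameter keeps the backoff testable: a smoke test can replace it with a no-op and still exercise the full retry order.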

Instruction file considerations

Coding agents that read instruction files (AGENTS.md, CLAUDE.md, or equivalent) at startup may have model or endpoint configuration embedded there. If your fallback list lives in the instruction file, be aware that some agents load the instruction file once at task start and do not re-read it mid-run. In that case, changes to the fallback list take effect only on the next agent run, not during an in-progress task. The OpenAI Codex AGENTS.md reference describes the scope and precedence of instruction files; see the sources section for the current URL.

For fallback lists that change frequently (during a gateway migration or incident, for example), prefer a dedicated config file that the agent reads per request rather than once at startup. An environment variable is fixed when the process starts, so it suits lists that only need to change between runs.
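A minimal sketch of per-request reading from a dedicated config file, assuming the list is stored as a JSON array. The function names and the file name `fallback.json` are hypothetical. The writer replaces the file atomically so a request that reads the list mid-update never sees a half-written file.

```python
import json
import os
import tempfile


def read_priority_list(path: str) -> list:
    """Read the list from disk on every call so edits apply to the next request."""
    with open(path, encoding="utf-8") as handle:
        return sorted(json.load(handle), key=lambda entry: entry["priority"])


def write_atomically(path: str, entries: list) -> None:
    """Write to a temp file in the same directory, then rename over the target."""
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(entries, handle)
    os.replace(tmp_path, path)


with tempfile.TemporaryDirectory() as workdir:
    config_path = os.path.join(workdir, "fallback.json")
    write_atomically(config_path, [{"model": "model-a", "priority": 1},
                                   {"model": "model-b", "priority": 2}])
    before = [entry["model"] for entry in read_priority_list(config_path)]
    # An operator swaps the order during an incident; the agent is not restarted.
    write_atomically(config_path, [{"model": "model-a", "priority": 2},
                                   {"model": "model-b", "priority": 1}])
    after = [entry["model"] for entry in read_priority_list(config_path)]

print(before, after)
```

Re-reading a small file on each request is cheap next to a model call. If it ever matters, caching on the file's modification time is a possible refinement.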

CI workflow integration

In GitHub Actions workflows, the fallback logic typically lives in the script or action that invokes the agent, not in the workflow YAML itself. The workflow YAML controls when the job runs and what secrets are injected; the agent script controls which endpoints are tried in what order. Keep these responsibilities separate.

A minimal pattern:

jobs:
  run-coding-agent:
    runs-on: ubuntu-latest
    steps:
      - name: Run agent with fallback routing
        env:
          # Verify the exact env var name and format your agent script expects
          MODEL_PRIORITY_LIST: '[{"base_url": "…", "model": "…", "priority": 1}, …]'
        # agent_runner.py reads MODEL_PRIORITY_LIST and implements the retry loop
        run: python agent_runner.py

Verify the exact syntax for injecting multi-value environment variables in the GitHub Actions workflow syntax reference — the exact quoting and escaping rules matter and are documented there.
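On the script side, the runner might parse `MODEL_PRIORITY_LIST` and fail the job with a non-zero exit when the value is missing or malformed. This is a sketch, not the contents of any real `agent_runner.py`. The `load_from_env` name and the JSON-array format are assumptions, and the example passes a plain dict in place of `os.environ` so it runs anywhere.

```python
import json
from typing import Mapping


def load_from_env(environ: Mapping[str, str],
                  var: str = "MODEL_PRIORITY_LIST") -> list:
    """Parse the priority list from an environment mapping.

    Raises SystemExit with a message on bad input, which ends the process
    with a non-zero exit code and so fails the CI step.
    """
    raw = environ.get(var, "").strip()
    if not raw:
        raise SystemExit(f"{var} is not set; refusing to run without a priority list")
    try:
        entries = json.loads(raw)
    except json.JSONDecodeError as err:
        raise SystemExit(f"{var} is not valid JSON: {err}")
    if not isinstance(entries, list) or not entries:
        raise SystemExit(f"{var} must be a non-empty JSON array")
    return sorted(entries, key=lambda entry: entry["priority"])


# Stand-in for os.environ; model names are placeholders.
sample_env = {
    "MODEL_PRIORITY_LIST":
        '[{"model": "model-b", "priority": 2}, {"model": "model-a", "priority": 1}]'
}
print([entry["model"] for entry in load_from_env(sample_env)])
```

In the runner itself the call would be `load_from_env(os.environ)`. Failing fast here prevents a quoting mistake in the workflow file from turning into an agent run with no fallback at all.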

Smoke-test workflow

Setup assumptions:

You have a working primary endpoint configuration verified against the CometAPI chat completions docs.
You have a fallback list with at least two entries.
You have logging in place that writes a structured record after each request attempt.

Happy-path check:

Send a minimal valid chat completions request to your primary endpoint. Verify:

HTTP 200 response received.
Response body contains the expected top-level fields (verify field names in the chat completions reference before hardcoding them in assertions).
The log record shows endpoint_used: primary, fallback_triggered: false.

Error-path check:

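Temporarily point the primary endpoint at a target that produces a fallback-eligible failure: a local stub that returns HTTP 503, or an unroutable address that causes a connection timeout. Do not use a deliberately wrong model identifier for this check; it returns a 404, which the trigger classification above treats as a fix-the-request error rather than a fallback trigger. Send the same minimal request. Verify: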

The agent does not crash or return an empty result silently.
The log record shows the primary endpoint attempted, the error code received, and the fallback endpoint selected.
The fallback endpoint returns HTTP 200.
The log record shows fallback_triggered: true, fallback_endpoint_used: [second entry in priority list].

Minimum assertions:

Log record is written for every request attempt, including failures.
Fallback trigger codes match the configured trigger list (not a superset of it).
The run does not fall back on a 400 or 401.

Pass/fail log fields to record:

{
  "run_id": "",
  "smoke_test_timestamp": "",
  "primary_endpoint_status": "",
  "fallback_triggered": true,
  "fallback_endpoint_priority": 2,
  "fallback_endpoint_status": "",
  "overall_result": "pass",
  "notes": ""
}

What the smoke test must not assert:

Do not assert specific model names, pricing, rate limits, or quota values — these change and your smoke test should not break when they do.
Do not assert exact response content — you are testing the routing layer, not the model output.
Do not assert uptime or latency targets for any endpoint.
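One way to exercise the routing layer without touching a real gateway is to stand up local stub servers that return fixed status codes. The sketch below uses only the Python standard library. The helper names (`make_stub`, `post_status`, `run_smoke_test`, `scenario`) are hypothetical, and the stubs stand in for endpoints rather than imitating any gateway's actual response format.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

TRIGGER_CODES = frozenset({429, 503, 504})
# Bypass any system proxy so requests reach the local stubs directly.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def make_stub(status: int) -> HTTPServer:
    """Start a local server on a free port that answers every POST with `status`."""

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            self.rfile.read(int(self.headers.get("Content-Length", 0)))
            self.send_response(status)
            self.send_header("Content-Length", "2")
            self.end_headers()
            self.wfile.write(b"{}")

        def log_message(self, *args):
            pass  # keep smoke-test output quiet

    server = HTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_status(url: str) -> int:
    request = urllib.request.Request(url, data=b"{}", method="POST")
    try:
        with OPENER.open(request, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def run_smoke_test(urls: list) -> dict:
    statuses = []
    for url in urls:
        statuses.append(post_status(url))
        if statuses[-1] not in TRIGGER_CODES:
            break  # success, or an error that must not trigger a fallback
    return {
        "primary_endpoint_status": statuses[0],
        "fallback_triggered": len(statuses) > 1,
        "fallback_endpoint_status": statuses[1] if len(statuses) > 1 else None,
    }


def scenario(primary_status: int) -> dict:
    servers = [make_stub(primary_status), make_stub(200)]
    try:
        return run_smoke_test(
            [f"http://127.0.0.1:{server.server_address[1]}/" for server in servers])
    finally:
        for server in servers:
            server.shutdown()
            server.server_close()


error_path = scenario(503)   # fallback-eligible: the second stub should serve the request
bad_request = scenario(400)  # not eligible: the run must stop at the primary
print(json.dumps(error_path))
print(json.dumps(bad_request))
```

The second scenario covers the minimum assertion that a 400 never triggers a fallback. Both scenarios assert only status codes and routing decisions, never model names or response content.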

Log record template

After each agent run that exercised fallback routing, record:

{
  "run_id": "PLACEHOLDER_RUN_ID",
  "agent_tool": "PLACEHOLDER_AGENT_TOOL",
  "started_at": "PLACEHOLDER_ISO8601",
  "completed_at": "PLACEHOLDER_ISO8601",
  "endpoints_attempted": [
    { "priority": 1, "result": "PLACEHOLDER_HTTP_STATUS_OR_TIMEOUT", "fallback_triggered": false },
    { "priority": 2, "result": "PLACEHOLDER_HTTP_STATUS", "fallback_triggered": true }
  ],
  "final_endpoint_priority": 2,
  "run_outcome": "PLACEHOLDER_PASS_OR_FAIL"
}

Do not log credentials, full request bodies, full response bodies, or model pricing information in this record.
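A record in this shape could be assembled by a small builder that accepts only the fields in the template, which makes it hard to leak credentials or request bodies into the log by accident. The `build_run_record` function and the example values are hypothetical. The sketch also assumes one reading of the template: `fallback_triggered` is true for any attempt that was reached by falling back from the first endpoint.

```python
import json


def build_run_record(run_id, agent_tool, started_at, completed_at, attempts):
    """Build the run record from `attempts`, a list of (priority, result) pairs
    in the order tried, where result is an HTTP status or the string "timeout"."""
    if not attempts:
        raise ValueError("a run record needs at least one attempt")
    first_priority = attempts[0][0]
    final_priority, final_result = attempts[-1]
    return {
        "run_id": run_id,
        "agent_tool": agent_tool,
        "started_at": started_at,
        "completed_at": completed_at,
        "endpoints_attempted": [
            # Assumed reading: true when this attempt was reached by falling back.
            {"priority": priority, "result": result,
             "fallback_triggered": priority != first_priority}
            for priority, result in attempts
        ],
        "final_endpoint_priority": final_priority,
        "run_outcome": "pass" if final_result == 200 else "fail",
    }


# Illustrative values only.
record = build_run_record(
    run_id="example-run",
    agent_tool="example-agent",
    started_at="2026-05-29T10:00:00+00:00",
    completed_at="2026-05-29T10:00:07+00:00",
    attempts=[(1, "timeout"), (2, 200)],
)
print(json.dumps(record, indent=2))
```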

Failure modes

Evidence gap: the agent cannot inspect the failing log, source page, pull request, or local command output. The safe action is to stop and record the missing evidence instead of guessing.
Scope drift: the agent edits files that are not connected to the observed failure. Keep the repair tied to the failing signal and leave unrelated cleanup for a separate task.
Environment mismatch: the local check uses different versions, credentials, feature flags, or runtime settings than the hosted path. Record the mismatch before treating the result as proof.
Unreviewed fallback: the agent changes models, endpoints, permissions, or retry behavior to make a run pass without preserving the review boundary. Treat access and provider failures as operational blockers, not topic failures.
Weak handoff: the final note says the issue is fixed but omits the command, result, changed files, and remaining uncertainty. That makes the next operator repeat the investigation.

Sources checked

OpenAI Codex AGENTS.md guidance - accessed 2026-05-29; purpose: verify repository instruction-file context for coding agents.
GitHub Actions workflow syntax documentation - accessed 2026-05-29; purpose: verify workflow permission configuration areas.
CometAPI documentation - accessed 2026-05-29; purpose: verify current CometAPI documentation navigation.
CometAPI chat completions reference - accessed 2026-05-29; purpose: verify chat completion contract areas.
CometAPI responses reference - accessed 2026-05-29; purpose: verify responses endpoint contract areas.
CometAPI models overview - accessed 2026-05-29; purpose: verify model catalog discovery guidance.
CometAPI help center - accessed 2026-05-29; purpose: verify support and escalation documentation areas.

Contract details to verify

Area	What to verify	Source URL	Accessed	Safe candidate wording
Chat completions endpoint path	Confirm the exact URL path under the base URL	https://apidoc.cometapi.com/api/text/chat	2026-05-29	“Verify the current endpoint path in the CometAPI chat completions reference”
Responses API endpoint path	Confirm whether Responses API uses a different path than chat completions	https://apidoc.cometapi.com/api/text/responses	2026-05-29	“Verify the Responses API path and whether it shares the same base URL as chat completions”
429 response body structure	Confirm the field name that distinguishes quota exhaustion from rate limiting	https://apidoc.cometapi.com/api/text/chat	2026-05-29	“Check whether the gateway returns a structured quota error body or a plain rate-limit message”
Retry-After header support	Confirm whether 429 responses include a Retry-After header	https://apidoc.cometapi.com/api/text/chat	2026-05-29	“Verify whether a Retry-After header is present on quota responses before building backoff logic that depends on it”
Auth header format	Confirm the exact Authorization header scheme (Bearer vs other)	https://apidoc.cometapi.com/api/text/chat	2026-05-29	“Verify the current auth header format in the chat completions reference”
Fallback-eligible HTTP codes	Confirm which 5xx codes indicate infrastructure failure vs model error	https://apidoc.cometapi.com/support/help-center	2026-05-29	“Consult the help center for gateway error classification before treating all 500s as fallback-eligible”

FAQ

Q: Should my fallback list include the same model on multiple gateways, or different models on the same gateway?

Both patterns are valid but have different trade-offs. Same model on multiple gateways gives you endpoint redundancy but requires managing credentials for each gateway. Different models on the same gateway simplifies credential management but means a fallback may change model behavior in ways that affect output quality. A common starting point is the same gateway with different models (a capable primary and a lighter-weight fallback), adding a second gateway only when your logs show that single-gateway reliability is insufficient.

Q: How do I know if my agent is actually using the fallback endpoint rather than just failing silently?

Only structured logging answers this reliably. If your agent does not write a per-request log record that includes which endpoint was used, you cannot audit fallback behavior after the fact. Add a fallback_triggered boolean and an endpoint_used field to every request log before you rely on fallback routing in production.

Q: What is the difference between fallback routing and load balancing?

Load balancing distributes requests across multiple endpoints under normal operating conditions. Fallback routing is a failure-response mechanism: the primary endpoint is used by default and alternatives are only tried when the primary fails. Many coding agent setups need fallback routing but do not need load balancing, especially if request volume is low.

Q: Can I set a fallback for a specific error code only, such as 429, and let other errors propagate normally?

Yes. This is often the safest starting point. A 429-only fallback means quota exhaustion is handled automatically while other errors (including malformed requests) surface immediately. You can expand the trigger list after observing real failure patterns in your logs.

Q: Where should the fallback list live if multiple agents share the same configuration?

A shared configuration file or environment variable that all agents read at startup is the most maintainable approach. Avoid duplicating the list in each agent’s instruction file — when a priority change is needed during an incident, you want to update one place. See Route Coding Agent Model Calls Without Endpoint Drift for guidance on centralizing gateway configuration.

Q: How do I get started with CometAPI as my gateway?

Visit CometAPI and review the chat completions and model overview docs linked in the sources section above to verify current endpoint paths, model identifiers, and auth requirements before wiring your fallback list.

Reader next step

Turn the next coding-agent request into a one-page task brief, then compare it with AI Coding Agent Setup, Security, and Model Routing. For the surrounding setup and permission baseline, review Triage CI Failures With a Coding Agent Without Losing the Evidence before assigning broader repository work.

After the repository instruction, secret, and review gates are in place, evaluate CometAPI as the model gateway target for only the writer, reviewer, critic, or fallback roles the team actually needs.