Last reviewed: 2026-05-30

Direct answer

Model routing for coding agent workflows means deciding which model or model tier handles each category of task your agent performs — and encoding that decision in a stable, version-controlled place so the agent does not drift silently to a different model when a gateway default changes.

The core decision sequence is:

  1. Classify the task type (code generation, code review, test repair, CI triage, documentation, quick lookup).
  2. Assign a model tier or a named model to each class, consulting your gateway’s current model list rather than hardcoding a vendor alias.
  3. Record the assignment in your repository instruction file (AGENTS.md, CLAUDE.md, or an equivalent) so every agent run picks it up without extra configuration.
  4. Add a CI step that verifies the gateway endpoint responds before the agent task starts, so failures surface early and do not silently fall back to a default.

The sections below unpack each step with concrete setup patterns and a smoke-test workflow you can adapt to your stack.

Who this is for

This guide is for engineers and platform teams who:

  • Run one or more coding agents (cloud-hosted or local CLI) on real repositories.
  • Need to control which model each agent task uses without redeploying the agent or touching vendor dashboards.
  • Want routing decisions recorded in source control so they survive team changes, repo forks, and gateway migrations.
  • Have or plan to have a CI pipeline that guards agent task execution.

If you are still choosing which agent tool to adopt, see AI Coding Agent Setup, Security, and Model Routing for a broader orientation before returning here.

Key takeaways

  • Encode model routing decisions in your repository instruction file, not in agent session state or environment defaults.
  • Map task types to model tiers explicitly; do not rely on gateway defaults to stay stable.
  • A gateway that exposes an OpenAI-compatible chat completions endpoint lets you swap the underlying model family without changing agent code.
  • CI verification of the gateway endpoint before the agent step runs is cheap insurance against silent routing failures.
  • Exact model identifiers, pricing, and rate limits change; always verify current values in the linked source documentation before committing routing assignments to production.

Understanding model routing in agent context

A coding agent makes model calls for several distinct task types in a single workflow run. A cloud Codex task might call the model for planning, then again for each code-generation step, then once more for a commit message. A local Claude Code or OpenCode session calls the model continuously as it reads context and takes actions.

Without explicit routing decisions, the agent uses whatever model your API key’s default resolves to at call time. That default can change when a gateway updates its routing table, when a model is deprecated, or when a new default is promoted. The result is silent behavioral drift that shows up as unexplained test failures or subtly wrong code, not as an error.

Explicit routing gives you three things:

  • Reproducibility: the same model handles the same task class across runs.
  • Cost visibility: you know which tier each task class uses, so gateway cost anomalies are easier to diagnose. Verify current pricing against CometAPI pricing documentation before making cost assumptions.
  • Migration safety: when a model is deprecated or a better option emerges, you change one line in the instruction file rather than hunting through agent configuration spread across multiple files.

Where to record routing decisions

Repository instruction files

Each major coding agent tool reads a designated instruction file from the repository root or a config directory:

  • OpenAI Codex reads AGENTS.md files from the repository root down to the task’s working directory. The OpenAI Codex AGENTS.md reference describes the lookup order and how multiple files are combined.
  • Claude Code reads project memory from CLAUDE.md (at the repository root or in .claude/) and also supports user-level instructions. The Claude Code memory documentation describes the file hierarchy and how instructions are merged.
  • GitHub Copilot reads repository-wide custom instructions from .github/copilot-instructions.md. The GitHub Copilot custom instructions guide describes the current setup; verify the file path and format there before committing instructions.
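A quick way to confirm that a repository actually carries an instruction file for each tool your team uses is a small presence check, sketched below in Python. It assumes these default locations: AGENTS.md for Codex, CLAUDE.md or .claude/CLAUDE.md for Claude Code, and .github/copilot-instructions.md for Copilot. Treat those paths as assumptions to re-check against each tool’s current documentation; the function names are illustrative.

```python
from pathlib import Path

# Default instruction-file locations per tool, relative to the repository root.
# Assumption: these reflect each tool's documented defaults at the time of
# writing; re-check them against the current docs before relying on them.
CANDIDATE_FILES = {
    "codex": ["AGENTS.md"],
    "claude_code": ["CLAUDE.md", ".claude/CLAUDE.md"],
    "copilot": [".github/copilot-instructions.md"],
}


def find_instruction_files(repo_root: str) -> dict[str, list[str]]:
    """Return, per tool, the candidate instruction files that exist under repo_root."""
    root = Path(repo_root)
    found: dict[str, list[str]] = {}
    for tool, candidates in CANDIDATE_FILES.items():
        present = [rel for rel in candidates if (root / rel).is_file()]
        if present:
            found[tool] = present
    return found


def tools_missing_instructions(repo_root: str) -> list[str]:
    """List tools with no instruction file, i.e. no place for routing guidance."""
    found = find_instruction_files(repo_root)
    return sorted(tool for tool in CANDIDATE_FILES if tool not in found)
```

A CI job could fail when tools_missing_instructions reports a tool your team depends on, so routing guidance never goes missing without anyone noticing.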

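Record your routing assignment as a human-readable directive in whichever file your agent tool reads. Most agents treat these files as natural-language guidance rather than enforced configuration, so where the agent or gateway also exposes an explicit model setting, set it to match. A minimal example entry might look like: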

```markdown
## Model routing

- Use the [gateway-name] chat completions endpoint for all tasks.
- For code generation and test repair tasks, request model tier: [tier-name: verify current identifier at gateway docs].
- For quick lookups and commit messages, request model tier: [lighter tier: verify current identifier at gateway docs].
- Do not use the vendor default; always pass an explicit model parameter.
```

Replace the bracketed placeholders with identifiers verified from your gateway’s current model list. See the CometAPI model overview for current model identifiers if you use CometAPI as your gateway; do not copy identifiers from guides or blog posts without confirming they are still current.
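Because a bracketed placeholder that survives into a committed instruction file is easy to miss in review, a small lint step can flag it. The Python sketch below is one way to do that. Its regular expression is a heuristic: it skips inline Markdown links of the form [text](url) but will still report reference-style links and task-list checkboxes. The function names are hypothetical, not part of any agent tool.

```python
import re

# "[...]" on a single line that is NOT immediately followed by "(".
# Heuristic: inline Markdown links are skipped, but reference-style links
# ([text][ref]) and task-list boxes ("[ ]") are still reported.
PLACEHOLDER = re.compile(r"\[([^\[\]\n]+)\](?!\()")


def unreplaced_placeholders(text: str) -> list[tuple[int, str]]:
    """Return (line_number, placeholder) pairs still present in the text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


def check_file(path: str) -> int:
    """Print each unreplaced placeholder and return a CI-friendly exit code."""
    with open(path, encoding="utf-8") as fh:
        hits = unreplaced_placeholders(fh.read())
    for lineno, placeholder in hits:
        print(f"{path}:{lineno}: unreplaced placeholder {placeholder}")
    return 1 if hits else 0
```

Wiring check_file("AGENTS.md") into a pre-merge step turns a forgotten placeholder into a failed check instead of a vague instruction the agent has to interpret.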

Gateway configuration

If your team runs a shared model gateway (an OpenAI-compatible proxy such as CometAPI), the gateway itself may support per-route or per-key model assignment. This lets you enforce routing at the infrastructure layer independently of what each agent’s instruction file requests.

A gateway-layer routing assignment means:

  • Individual agents do not need to know the upstream model family.
  • You can swap the upstream model without touching any instruction file.
  • The gateway’s own logs become the authoritative record of which model actually handled each call.
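To make the gateway-layer idea concrete, the Python sketch below models a hypothetical route table: agents name only a route, and the gateway decides the upstream model and records that decision. Real gateways, CometAPI included, have their own configuration formats; Route, RoutingGateway, and the log fields here are illustrative names, not any product’s API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Route:
    """Hypothetical route entry: agents send `name`, the gateway picks the model."""
    name: str
    upstream_model: str


class RoutingGateway:
    """Toy stand-in for a gateway that enforces routing and logs what it did."""

    def __init__(self, routes: list[Route]):
        self._routes = {route.name: route for route in routes}
        self.call_log: list[dict[str, str]] = []

    def resolve(self, route_name: str) -> str:
        """Return the upstream model for a route; unknown routes fail loudly."""
        route = self._routes.get(route_name)
        if route is None:
            raise LookupError(f"no route configured for {route_name!r}")
        # The gateway, not the agent, records which model handled the call.
        self.call_log.append({
            "time_utc": datetime.now(timezone.utc).isoformat(),
            "route": route.name,
            "upstream_model": route.upstream_model,
        })
        return route.upstream_model
```

Swapping the upstream model then means editing one Route entry, and call_log (standing in for the gateway’s own logs) shows which model actually served each call.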

See Route Coding Agent Model Calls Without Endpoint Drift for a detailed walkthrough of gateway-layer routing setup.

Mapping task types to model tiers

Not all coding agent tasks require the same model capability. A general-purpose tier mapping might look like this:

| Task type | Suggested tier | Notes |
|---|---|---|
| Full-file code generation | High-capability tier | Requires strong reasoning and context retention |
| Test repair from CI output | High-capability tier | Needs to interpret stack traces and adjust code |
| CI failure triage summary | Mid-capability tier | Summarization; less generation depth needed |
| Commit message generation | Lightweight tier | Short, structured output |
| Quick inline lookup | Lightweight tier | Low latency preferred; no deep reasoning needed |
| Documentation draft | Mid-capability tier | Quality matters; cost moderate |
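One way to keep the tier mapping explicit in tooling as well as in prose is a lookup that refuses to guess. The Python sketch below mirrors the task types in the table above; the snake_case task keys are illustrative, and the REPLACE-... strings are placeholders for identifiers taken from your gateway’s current model list.

```python
# Task type -> capability tier, following the tier table.
TIER_BY_TASK = {
    "code_generation": "high",
    "test_repair": "high",
    "ci_triage_summary": "mid",
    "documentation_draft": "mid",
    "commit_message": "light",
    "quick_lookup": "light",
}

# Tier -> gateway model identifier. Placeholders only: copy real values
# from your gateway's current model list, not from this example.
MODEL_BY_TIER = {
    "high": "REPLACE-high-capability-model-id",
    "mid": "REPLACE-mid-capability-model-id",
    "light": "REPLACE-lightweight-model-id",
}


def resolve_model(task_type: str) -> str:
    """Map a task type to an explicit model id; never fall back to a default."""
    if task_type not in TIER_BY_TASK:
        raise ValueError(f"unmapped task type {task_type!r}; add it to TIER_BY_TASK")
    return MODEL_BY_TIER[TIER_BY_TASK[task_type]]
```

Splitting task-to-tier from tier-to-model means a model deprecation touches only MODEL_BY_TIER, while a new task type must be classified deliberately before it can run.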

Verify that the model identifiers your gateway exposes correspond to the capability tiers you expect. Gateway documentation may use different naming conventions than vendor marketing names. Check CometAPI model overview for current gateway-side identifiers; verify exact API request field names and response fields at CometAPI chat completions reference before wiring up agent configuration.

CI integration for routing verification

GitHub Actions (and equivalent CI systems) can verify that your gateway endpoint is reachable before the agent task step runs. This prevents a gateway outage or misconfigured endpoint from producing misleading agent failures.

A minimal pre-agent CI step pattern:

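```yaml
- name: Verify gateway endpoint
  env:
    # Injected from a CI secret; the key never appears in the workflow file.
    GATEWAY_API_KEY: ${{ secrets.GATEWAY_API_KEY }}
  run: |
    # Conventional OpenAI-compatible path; confirm it in your gateway's docs.
    # -w prints only the status code; "000" means no HTTP response arrived.
    response=$(curl -s -o /dev/null -w "%{http_code}" --max-time 30 \
      -H "Authorization: Bearer $GATEWAY_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"model": "[verify-current-model-id]", "messages": [{"role": "user", "content": "ping"}], "max_tokens": 1}' \
      "https://[your-gateway-host]/v1/chat/completions") || true
    if [ "$response" != "200" ]; then
      echo "Gateway health check failed: HTTP $response"
      exit 1
    fi
```

Replace [verify-current-model-id] and [your-gateway-host] with values from your gateway’s current documentation, and confirm the request path there too. Do not commit literal API keys; store the key as a CI secret (GATEWAY_API_KEY is an example name) and expose it to the step through env, as shown in the step. See GitHub Actions documentation for secret injection patterns and workflow syntax.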

For a complete CI triage pattern that preserves failure evidence, see Triage CI Failures With a Coding Agent Without Losing the Evidence.

Smoke-test workflow

Setup assumptions

  • You have a gateway API key in your local environment as GATEWAY_API_KEY.
  • Your gateway exposes a chat completions endpoint; the exact path and host are from your gateway’s current documentation.
  • You have identified at least one model identifier from the gateway model list.
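Under these assumptions, a minimal Python sketch of the request itself, using only the standard library, looks like this. The host and path arguments stand in for values from your gateway’s documentation, and the body follows the OpenAI-style chat completions shape (model, messages, max_tokens), which you should confirm against your gateway’s reference before use.

```python
import json
import os
import urllib.error
import urllib.request


def build_chat_request(host: str, path: str, model: str,
                       prompt: str = "ping") -> urllib.request.Request:
    """Build (but do not send) a minimal chat completions request."""
    if not model:
        raise ValueError("pass an explicit model; do not rely on a gateway default")
    api_key = os.environ["GATEWAY_API_KEY"]  # KeyError here if the key is not set
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1,
    }
    return urllib.request.Request(
        url=f"https://{host}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 30.0):
    """Return (http_status, parsed_json_or_None), including for 4xx/5xx replies.

    Connection-level failures (DNS, TLS, timeouts) raise instead of returning.
    """
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        raw = err.read().decode("utf-8", errors="replace")
        try:
            return err.code, json.loads(raw)
        except json.JSONDecodeError:
            return err.code, None
```

With the key exported, a call such as send(build_chat_request(host, path, "your-model-id")) returns the status and parsed body for both the happy-path and error-path checks, because send converts HTTP error replies into return values instead of exceptions.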

Happy-path request plan

  1. Send a minimal chat completions request with an explicit model parameter set to the identifier you recorded in your instruction file.
  2. Confirm the response returns HTTP 200 and a non-empty choices array.
  3. Confirm the model field in the response matches (or is a canonical expansion of) the identifier you sent.
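The happy-path checks can be expressed as a function that returns a list of failures instead of stopping at the first one, which makes CI output easier to read. This Python sketch assumes the status and parsed JSON body are already in hand. Treating a "canonical expansion" as the requested identifier plus a hyphenated suffix (such as a date stamp) is an assumption; adjust that rule to how your gateway reports model names.

```python
def happy_path_failures(status: int, body, requested_model: str) -> list[str]:
    """Return failed checks for the valid-model request; empty means pass."""
    failures = []
    if status != 200:
        failures.append(f"expected HTTP 200, got {status}")
    if not isinstance(body, dict):
        return failures + ["response body is not a JSON object"]
    choices = body.get("choices") or []
    if not choices:
        failures.append("choices array is missing or empty")
    else:
        content = (choices[0].get("message") or {}).get("content")
        if not content:
            failures.append("choices[0].message.content is empty")
    returned = body.get("model", "")
    # Assumption: an expansion looks like "<requested>-<suffix>",
    # e.g. "model-x" -> "model-x-2025-01-01".
    if returned != requested_model and not returned.startswith(requested_model + "-"):
        failures.append(f"model mismatch: requested {requested_model!r}, got {returned!r}")
    return failures
```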

Error-path check

  1. Send the same request with a deliberately invalid model identifier (e.g., "model": "invalid_test").
  2. Confirm the gateway returns a 4xx error, not a silent fallback to a default model.
  3. If the gateway silently routes to a default instead of erroring, add an assertion on the response model field in your CI step.
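A small classifier turns the error-path observation into something CI can act on. The Python sketch below assumes that any 4xx status, or an error field in the body, counts as a visible error, and that a successful response naming a different model is a silent fallback; the label strings are illustrative.

```python
def classify_error_path(status: int, body, invalid_model: str) -> str:
    """Classify the gateway's reply to a deliberately invalid model id.

    "errored"         -> the desired outcome: the problem is visible.
    "silent_fallback" -> the request succeeded with some other model.
    "unexpected"      -> anything else (e.g. 5xx) that needs a manual look.
    """
    if 400 <= status < 500:
        return "errored"
    if isinstance(body, dict) and "error" in body:
        # Some gateways report errors in the body rather than the status code.
        return "errored"
    if status == 200 and isinstance(body, dict):
        returned = body.get("model", "")
        if returned and returned != invalid_model:
            return "silent_fallback"
    return "unexpected"
```

A CI step would then fail on "silent_fallback" and flag "unexpected" for a human, matching the advice to add a model-field assertion when the gateway does not error.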

Minimum assertions

  • HTTP status is 200 for the valid request.
  • choices[0].message.content is non-empty.
  • model in the response matches the expected identifier (as reported by the gateway, not assumed from the request).
  • A bad model ID returns a non-200 status or a response with an error field.

What the smoke test must not assert

  • Do not assert on specific pricing, token counts, rate limits, or latency values in CI; these change without notice and will cause false failures.
  • Do not assert on the exact text of generated content; assert only on structural response fields.
  • Do not assert on model availability beyond whether the endpoint returns a successful response for your chosen identifier.

Pass/fail logging fields

Record the following after each smoke test run:

```text
date_utc: YYYY-MM-DDTHH:MM:SSZ
gateway_host: [gateway-host-placeholder]
model_requested: [model-id-placeholder]
model_returned: [model-id-from-response-placeholder]
http_status: [status-code]
choices_count: [integer]
error_path_status: [status-code-for-invalid-id]
pass: true|false
notes: ""
```
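Emitting those fields as one JSON line per run keeps the record machine-readable. The Python sketch below builds exactly the listed fields and nothing else, so keys, bodies, and token usage cannot slip into the log by accident. The function name is illustrative, and the passed value is expected to come from your own checks.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def smoke_test_record(*, gateway_host: str, model_requested: str,
                      model_returned: str, http_status: int, choices_count: int,
                      error_path_status: int, passed: bool, notes: str = "",
                      now: Optional[datetime] = None) -> str:
    """Return one JSON line containing only the agreed pass/fail fields."""
    # Default to the current time; convert to UTC (naive values count as local).
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    record = {
        "date_utc": moment.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "gateway_host": gateway_host,
        "model_requested": model_requested,
        "model_returned": model_returned,
        "http_status": http_status,
        "choices_count": choices_count,
        "error_path_status": error_path_status,
        "pass": passed,
        "notes": notes,
    }
    return json.dumps(record)
```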

Do not log API keys, full request bodies, full response content, or token usage values in shared CI logs.

Putting it together: a minimal routing setup checklist

  1. Identify all task types your agent performs in a typical run.
  2. Consult your gateway’s current model list and assign a model identifier to each task class.
  3. Write the assignment into your repository instruction file using the format your agent tool recognizes.
  4. Add a CI pre-step that verifies the gateway endpoint before the agent task runs.
  5. Run the smoke-test workflow above against a staging or sandbox key before enabling production routing.
  6. Record the model identifiers and assignment rationale in a short comment in the instruction file so future reviewers know why each routing decision was made.

For guidance on writing instruction files that agents parse reliably, see How to Write Repository Instructions for Coding Agents.

If your team uses a CometAPI gateway for model access, Start with CometAPI to explore the available model routes and endpoint documentation.

Failure modes

  • Evidence gap: the agent cannot inspect the failing log, source page, pull request, or local command output. The safe action is to stop and record the missing evidence instead of guessing.
  • Scope drift: the agent edits files that are not connected to the observed failure. Keep the repair tied to the failing signal and leave unrelated cleanup for a separate task.
  • Environment mismatch: the local check uses different versions, credentials, feature flags, or runtime settings than the hosted path. Record the mismatch before treating the result as proof.
  • Unreviewed fallback: the agent changes models, endpoints, permissions, or retry behavior to make a run pass without preserving the review boundary. Treat access and provider failures as operational blockers to escalate, not as failures of the task itself.
  • Weak handoff: the final note says the issue is fixed but omits the command, result, changed files, and remaining uncertainty. That makes the next operator repeat the investigation.
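The weak-handoff case is the easiest to guard mechanically: require the four items the note must carry before a run is marked done. The Python sketch below assumes handoff notes are captured as simple key-value records; the field names are hypothetical, and an empty changed_files list is accepted because "no files changed" is a valid answer.

```python
# Hypothetical field names for a structured handoff note.
REQUIRED_HANDOFF_FIELDS = ("command", "result", "changed_files", "remaining_uncertainty")


def missing_handoff_fields(note: dict) -> list[str]:
    """Return required fields that are absent, None, or blank strings."""
    missing = []
    for field in REQUIRED_HANDOFF_FIELDS:
        value = note.get(field)
        # An empty list is allowed: it explicitly records "no files changed".
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(field)
    return missing
```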

Sources checked

Contract details to verify

| Area | What to verify | Source URL | Accessed | Safe candidate wording |
|---|---|---|---|---|
| Chat completions endpoint path | Exact path, host, and required headers | https://apidoc.cometapi.com/api/text/chat | 2026-05-30 | “Verify the current endpoint path and required headers in the CometAPI chat completions reference before configuring your agent.” |
| Auth scheme | Header name, token format, and key scoping | https://apidoc.cometapi.com/api/text/chat | 2026-05-30 | “Verify the authorization header name and token format in the current CometAPI docs; do not assume it matches OpenAI’s exact scheme.” |
| AGENTS.md directive support | Which directives Codex actually reads from AGENTS.md | https://github.com/openai/codex/blob/main/docs/agents_md.md | 2026-05-30 | “Verify which AGENTS.md directives the current Codex version supports; not all key-value fields may be acted upon.” |
| Claude Code memory file path | Current canonical path(s) Claude Code reads for project instructions | https://code.claude.com/docs/en/memory | 2026-05-30 | “Verify the current .claude/ memory file path and merge order in the Claude Code memory docs before writing routing instructions there.” |
| GitHub Actions secret injection | Current syntax for referencing secrets in workflow steps | https://docs.github.com/en/actions | 2026-05-30 | “Verify the current GitHub Actions secrets syntax in the official Actions docs before wiring secrets into workflow steps.” |

FAQ

Q: Do I need to set an explicit model in every agent call, or can I rely on the gateway default?

Relying on a gateway default is risky for production workflows. Defaults can change when a gateway updates its routing table or deprecates a model. Passing an explicit model identifier in the API call, and recording it in your instruction file, keeps the same model on the same task class across runs. A later change to the gateway default then leaves your pinned tasks untouched, and any deliberate model change shows up as a reviewed diff rather than as a silent regression.

Q: What happens if I use a model identifier that the gateway no longer supports?

Behavior depends on the gateway. Some return a 4xx error immediately, which is preferable because it surfaces the problem before the agent produces output. Others silently route to a fallback model, which can produce subtly different output without any visible error. Run the error-path smoke test described above to confirm which behavior your gateway exhibits before relying on it in CI.

Q: Should routing decisions live in the instruction file, in environment variables, or in the agent’s config file?

Instruction files (AGENTS.md, CLAUDE.md, or equivalent) are the preferred location because they are version-controlled, human-readable, and read directly by the agent without requiring a separate configuration injection step. Environment variables are appropriate for the API key and gateway host but not for model routing logic, because they are harder to audit across environments. Agent config files (when the tool exposes them) are a reasonable fallback if the instruction file format does not support model directives.

Q: Can I use different model tiers for different sub-tasks within a single agent run?

Yes, if your gateway and agent tool both support it. Codex honors directory-scoped AGENTS.md files, so different parts of a repository can carry different guidance, and Claude Code’s memory layers let project and user instructions coexist. In practice, fine-grained per-subtask routing adds configuration complexity. Start with a two-tier assignment (heavy tasks vs. lightweight tasks) and refine only when you have evidence that cost or quality justifies a more granular mapping.

Q: How do I handle routing during a gateway migration?

Record the new gateway’s model identifiers in your instruction file on a separate branch, run the smoke-test workflow against the new gateway in staging, and merge only when the CI pre-step passes cleanly. See Migrate Coding Agents to an OpenAI-Compatible Gateway Without Endpoint Drift for a detailed migration checklist.

Q: Does model routing affect agent memory or context window behavior?

Different models have different context window sizes, and some gateway tiers may truncate context differently. If you route a high-context task (full-file generation, large codebase analysis) to a lighter model tier, you may see silent context truncation rather than an error. Always verify the context window of your chosen model tier against your largest expected task context before committing that routing assignment.
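A rough pre-flight check can catch the most obvious mismatches before a task is routed. The Python sketch below uses a characters-per-token heuristic (about four characters per token, a common rule of thumb for English text that varies by tokenizer and content) plus a safety margin. The context-window value must come from your gateway’s model documentation, and the function names are illustrative.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; real counts depend on the model's tokenizer."""
    return max(1, math.ceil(len(text) / chars_per_token))


def fits_context(prompt_text: str, context_window: int,
                 reserved_output_tokens: int, safety_margin: float = 0.1) -> bool:
    """True if the estimated prompt plus reserved output fits in the window.

    safety_margin holds back a fraction of the window to absorb estimation error.
    """
    budget = int(context_window * (1 - safety_margin)) - reserved_output_tokens
    return estimate_tokens(prompt_text) <= budget
```

If your largest expected task fails this check for a lighter tier, route that task class to a tier with a larger window rather than relying on the gateway to truncate gracefully.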

Reader next step

Turn the next coding-agent request into a one-page task brief, then compare it with AI Coding Agent Setup, Security, and Model Routing. For the surrounding setup and permission baseline, review Triage CI Failures With a Coding Agent Without Losing the Evidence before assigning broader repository work.