When to Stop, Retry, or Escalate: A Practical Guide to Coding Agent Task Control

Last reviewed: 2026-06-09

Direct answer

A coding agent that has no defined stop condition will keep running until it hits a resource limit, corrupts a branch, or surprises the next human reviewer. Clear task control means three things:

Stop conditions — the exact state that means the task is done and the agent should halt, even if additional steps look possible.
Retry boundaries — how many times the agent may attempt a failing step, and what it must log before each retry.
Escalation paths — the conditions under which the agent must surface a decision to a human rather than deciding on its own.

These three elements belong in your repository instruction file (an AGENTS.md, CLAUDE.md, or equivalent) before you run the agent, not as inline prompts after something goes wrong.

Tools such as OpenAI Codex (cloud task mode) and Claude Code load instruction files at task start, so rules written there apply consistently across every session. GitHub pull request gates and CI checks provide external stop points that any well-structured agent workflow should respect before merging changes.

For API-backed agent workflows, the model gateway your agent calls — including its stop-sequence behavior, context limits, and any tool-use constraints — should be confirmed against current documentation before you build retry or stop logic around it. See Route Coding Agent Model Calls Without Endpoint Drift for gateway stability considerations that are relevant to any retry loop.

For broader release checks, see AI Coding Agent Setup, Security, and Model Routing.

Who this is for

This guide is for engineers and technical leads who:

Run coding agents autonomously on feature work, test repair, or CI triage.
Want agents to stop at a clear boundary rather than guess when they are done.
Need a lightweight escalation path so agents surface ambiguous decisions before acting on them.
Use any tool-calling or instruction-file-aware agent — Codex cloud tasks, Claude Code, Cursor Agent, or a custom loop built on an OpenAI-compatible gateway.

If you are new to instruction files, start with How to Write Repository Instructions for Coding Agents before adding stop-condition rules.

Key takeaways

Write stop conditions as positive assertions about output state, not as “stop after N steps.”
A retry boundary requires a log entry before each retry; silent retries lose the original error context.
Escalation is not failure — it is the correct path when the agent lacks the authority or evidence to proceed safely.
Instruction files (AGENTS.md, CLAUDE.md, .cursor/rules, or equivalent) are the right place for stop and escalation rules because they load before the agent begins work.
GitHub pull request review gates and CI checks are external enforcement points; write your stop conditions so they align with, not around, those gates.
Verify exact agent stop-sequence and max-turn behavior in current tool documentation before relying on any default behavior.

Stop conditions: what they are and how to write them

A stop condition is a testable assertion about the world that, when true, means the agent’s task is complete. It is different from a step count or a time budget.

Weak stop condition (avoid):

“Complete the task when you think it is done.”

Stronger stop condition:

“Stop when: (a) all tests in the affected package pass locally, (b) a commit exists on the feature branch with a message summarizing the change, and (c) no staged file lies outside the allowed path set.”

Good stop conditions share three properties:

Externally observable — a human or a CI run can verify them without reading the agent’s reasoning.
Unambiguous — “tests pass” is clearer than “the code looks correct.”
Scoped — they reference the specific files, branches, or checks the task owns, not the whole repository.
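The stronger stop condition above can be expressed as a check over observable state. The following is a minimal sketch, not any tool's built-in behavior: `TaskState`, its fields, and the allowed roots are hypothetical names chosen for illustration, and how you populate them (test runner, `git` queries) is left to your own tooling.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class TaskState:
    """Hypothetical snapshot of externally observable task state."""
    tests_passed: bool              # outcome of the affected package's test run
    commit_message: str             # latest feature-branch commit message ("" if none)
    staged_files: tuple[str, ...]   # repo-relative paths currently staged


def outside_allowed(path: str, allowed_roots: tuple[str, ...]) -> bool:
    """Return True when path is not under any allowed root directory."""
    p = PurePosixPath(path)
    return not any(p.is_relative_to(root) for root in allowed_roots)


def unmet_stop_conditions(state: TaskState, allowed_roots: tuple[str, ...]) -> list[str]:
    """List every stop condition that is not yet satisfied; empty means stop."""
    unmet = []
    if not state.tests_passed:
        unmet.append("tests in the affected package do not pass")
    if not state.commit_message.strip():
        unmet.append("no commit with a summary message on the feature branch")
    stray = [f for f in state.staged_files if outside_allowed(f, allowed_roots)]
    if stray:
        unmet.append("staged files outside allowed paths: " + ", ".join(stray))
    return unmet
```

An empty list means every condition holds and the agent should halt. Returning the unmet conditions, rather than a single boolean, gives a reviewer something to verify without reading the agent's reasoning.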

For Codex cloud tasks, stop conditions written in an AGENTS.md file at the repository root apply to all tasks run against that repository, as described in the Codex AGENTS.md documentation. Claude Code reads project rules from CLAUDE.md and memory files according to its memory-file guidance. Both mechanisms mean that stop conditions written once in the instruction file apply consistently without being repeated in every prompt.

Verify the exact instruction-file path and loading behavior for your specific tool version before deploying stop conditions in production workflows.

Retry boundaries: logging before acting

Retry loops without logging are silent — by the time you see a failure, the agent may have retried a dozen times and the original error context is gone.

A minimal retry boundary has four elements:

Maximum attempt count — state it as an explicit number (for example, “no more than two retries per CI step”).
Per-retry log entry — before each retry, the agent writes the failure reason, the retry count, and the step being retried to a persistent log or commit message.
Non-retriable conditions — certain failures should never be retried without human review: authentication errors, destructive file-system operations outside the allowed path, and any error that changes state outside the task scope.
Exit behavior — when the retry limit is reached, the agent halts and surfaces the final log entry rather than continuing with a different strategy.

Example instruction-file rule:

Retry policy

Maximum 2 retries per failing CI step.
Before each retry, append a line to AGENT_LOG.md: [RETRY N] <step> failed with: <reason>.
Do not retry: auth failures, changes outside /src and /tests, any step that deletes files.
On retry limit reached: halt, commit AGENT_LOG.md, open a draft PR.

The exact syntax and supported fields depend on your agent tool and its current instruction-file format. Confirm with current tool documentation before deploying.
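For a custom agent loop, the four elements of a retry boundary can be sketched in code. This is an illustration under stated assumptions: `NonRetriableError`, `RetryLimitReached`, and `run_with_retry_boundary` are hypothetical names, and the in-memory `log` list stands in for a persistent file such as AGENT_LOG.md.

```python
from typing import Callable


class NonRetriableError(Exception):
    """Hypothetical marker for failures that need human review (auth, out-of-scope changes, deletions)."""


class RetryLimitReached(Exception):
    """Raised when the retry budget is spent; the caller halts and hands off."""


def run_with_retry_boundary(
    step_name: str,
    step: Callable[[], None],
    log: list[str],
    max_retries: int = 2,
) -> int:
    """Run step and return the number of retries used.

    A log line is appended before every retry, and the loop never
    switches to a different strategy once the limit is reached.
    """
    retries = 0
    while True:
        try:
            step()
            return retries
        except NonRetriableError as exc:
            # Never retried: record the reason and surface it to a human.
            log.append(f"[ESCALATE] {step_name} non-retriable failure: {exc}")
            raise
        except Exception as exc:
            if retries >= max_retries:
                log.append(f"[HALT] {step_name} retry limit reached: {exc}")
                raise RetryLimitReached(step_name) from exc
            retries += 1
            log.append(f"[RETRY {retries}] {step_name} failed with: {exc}")
```

With the default of two retries, a step that always fails runs three times in total and leaves three log lines: two retry entries and one halt entry.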

Escalation paths: when the agent must ask

Escalation is not a fallback for failure. It is the correct action when the agent cannot determine the safe path forward within its defined authority.

Common escalation triggers:

Ambiguous acceptance criteria — the task brief does not say whether behavior A or behavior B is correct.
Conflicting signals — CI passes locally but fails in the remote environment, and the agent cannot determine why.
Scope creep — fixing the stated problem requires changes outside the allowed path set.
Destructive operations — the only path forward involves deleting or rewriting files that were not listed as in-scope.
Secrets or credentials — the task would require reading or writing a secret the agent does not have explicit permission to access.
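The triggers above can be made explicit so that escalation is a checked decision rather than a judgment call. A minimal sketch follows; the `Observation` fields and `Trigger` names are hypothetical, and deciding how each flag gets set (for example, how ambiguity is detected) is outside what the code shows.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Trigger(Enum):
    AMBIGUOUS_CRITERIA = "ambiguous acceptance criteria"
    CONFLICTING_SIGNALS = "conflicting signals"
    SCOPE_CREEP = "scope creep"
    DESTRUCTIVE_OPERATION = "destructive operation"
    SECRET_ACCESS = "secrets or credentials"


@dataclass(frozen=True)
class Observation:
    """Hypothetical facts the agent has gathered about the task so far."""
    criteria_ambiguous: bool = False
    local_ci_passed: Optional[bool] = None    # None means not yet run
    remote_ci_passed: Optional[bool] = None   # None means not yet run
    needs_out_of_scope_change: bool = False
    needs_out_of_scope_delete: bool = False
    needs_unapproved_secret: bool = False


def escalation_triggers(obs: Observation) -> list[Trigger]:
    """Return every trigger that applies; any non-empty result means ask a human."""
    triggers = []
    if obs.criteria_ambiguous:
        triggers.append(Trigger.AMBIGUOUS_CRITERIA)
    if obs.local_ci_passed is True and obs.remote_ci_passed is False:
        triggers.append(Trigger.CONFLICTING_SIGNALS)
    if obs.needs_out_of_scope_change:
        triggers.append(Trigger.SCOPE_CREEP)
    if obs.needs_out_of_scope_delete:
        triggers.append(Trigger.DESTRUCTIVE_OPERATION)
    if obs.needs_unapproved_secret:
        triggers.append(Trigger.SECRET_ACCESS)
    return triggers


def must_escalate(obs: Observation) -> bool:
    """True when at least one escalation trigger applies."""
    return bool(escalation_triggers(obs))
```

Collecting all applicable triggers, instead of stopping at the first, lets the handoff name every reason the agent paused.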

GitHub pull request mechanics provide a natural escalation surface: an agent that opens a draft PR and requests review is escalating in a way that is visible, tracked, and non-destructive. The GitHub pull request documentation describes review request and draft PR mechanics that support this pattern. Keep How to Hand Off Coding Agent Pull Requests for Review on hand for the PR handoff checklist.

For agents that run inside a CI environment, GitHub Actions workflow controls — such as required reviewers on environment gates — can enforce escalation at the infrastructure level rather than relying entirely on the agent’s own judgment.

Smoke-test workflow

Before deploying stop conditions and escalation rules in production, verify the baseline agent behavior in a controlled branch.

Setup assumptions:

You have a feature branch with at least one failing test.
Your instruction file is committed to the branch root.
The agent is configured to use a known model gateway endpoint (exact model ID and endpoint path verified against current documentation).

Happy-path plan:

Trigger the agent with a task that has a clear stop condition (for example: “fix the failing test in /tests/unit/; stop when all tests in that directory pass”).
Observe that the agent halts after the stop condition is met without continuing to unrelated files.
Check that the agent’s commit message references the stop condition outcome.

Error-path check:

Introduce a second failing test that requires a change outside the allowed path.
Verify the agent logs an escalation entry rather than making the out-of-scope change.
Confirm the agent opens a draft PR or equivalent handoff artifact rather than merging.

Minimum assertions:

Agent stops before touching files outside the stated scope.
At least one log entry exists for each retry attempted.
No merge or push to a protected branch occurred without a passing CI gate.
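The three minimum assertions can be checked mechanically once the smoke run is over. The sketch below assumes you have already collected the facts by hand or from your own tooling; `SmokeRun` and its fields are hypothetical names, and path scoping is done by simple string prefix for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmokeRun:
    """Hypothetical record of what happened during one smoke-test run."""
    touched_files: tuple[str, ...]     # repo-relative paths the agent modified
    allowed_prefixes: tuple[str, ...]  # e.g. ("tests/unit/",)
    retries_attempted: int
    retry_log_entries: int
    pushed_to_protected: bool          # merge or push to a protected branch happened
    ci_gate_passed: bool


def smoke_failures(run: SmokeRun) -> list[str]:
    """Return a message for each minimum assertion that failed."""
    failures = []
    # str.startswith accepts a tuple of prefixes.
    stray = [f for f in run.touched_files if not f.startswith(run.allowed_prefixes)]
    if stray:
        failures.append("touched files outside scope: " + ", ".join(stray))
    if run.retry_log_entries < run.retries_attempted:
        failures.append("fewer log entries than retries attempted")
    if run.pushed_to_protected and not run.ci_gate_passed:
        failures.append("protected branch changed without a passing CI gate")
    return failures
```

An empty result means the run satisfies the minimum assertions; it says nothing about model output quality, which the smoke test deliberately does not assert.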

Pass/fail logging fields (record these after the smoke test):

What the smoke test must not assert: exact model response text, specific token counts, gateway latency, or pricing. Those claims require verification against current API documentation and are outside the scope of a behavioral stop-condition check.

Failure modes

Evidence gap: the agent cannot inspect the failing log, source page, pull request, or local command output. The safe action is to stop and record the missing evidence instead of guessing.
Scope drift: the agent edits files that are not connected to the observed failure. Keep the repair tied to the failing signal and leave unrelated cleanup for a separate task.
Environment mismatch: the local check uses different versions, credentials, feature flags, or runtime settings than the hosted path. Record the mismatch before treating the result as proof.
Unreviewed fallback: the agent changes models, endpoints, permissions, or retry behavior to make a run pass without preserving the review boundary. Treat access and provider failures as operational blockers to escalate, not as problems to work around.
Weak handoff: the final note says the issue is fixed but omits the command, result, changed files, and remaining uncertainty. That makes the next operator repeat the investigation.
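One way to guard against the weak-handoff failure mode is to refuse to produce a final note unless all four items are present. This is a sketch with a hypothetical function name and note layout, not a format any tool requires.

```python
def handoff_note(
    command: str,
    result: str,
    changed_files: list[str],
    remaining_uncertainty: str,
) -> str:
    """Build a final note, rejecting any handoff with a missing item."""
    fields = {
        "command": command.strip(),
        "result": result.strip(),
        "changed_files": changed_files,
        "remaining_uncertainty": remaining_uncertainty.strip(),
    }
    missing = [name for name, value in fields.items() if not value]
    if missing:
        raise ValueError("handoff is missing: " + ", ".join(missing))
    return "\n".join(
        [
            f"Command: {fields['command']}",
            f"Result: {fields['result']}",
            "Changed files: " + ", ".join(changed_files),
            f"Remaining uncertainty: {fields['remaining_uncertainty']}",
        ]
    )
```

If nothing is uncertain, the caller has to say so explicitly (for example, "none known"), which keeps the omission from being silent.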

Sources checked

OpenAI Codex AGENTS.md guidance - accessed 2026-06-09; purpose: verify repository instruction-file context for coding agents.
OpenAI Codex cloud documentation - accessed 2026-06-09; purpose: verify hosted coding-agent workflow context.
Claude Code memory documentation - accessed 2026-06-09; purpose: verify project memory and instruction-file context for agent workflows.
GitHub pull requests documentation - accessed 2026-06-09; purpose: verify pull request review and collaboration boundaries.
CometAPI documentation root - accessed 2026-06-09; purpose: gateway and model-routing context for API-backed agent retry loops.
CometAPI chat completions reference - accessed 2026-06-09; purpose: verify chat completions endpoint contract areas relevant to agent API calls.

Contract details to verify

Area	What to verify	Source URL	Accessed	Safe candidate wording
Instruction-file loading path	Exact file name and directory the agent tool reads for stop/escalation rules	https://github.com/openai/codex/blob/main/docs/agents_md.md	2026-06-09	“Verify the exact path your agent tool reads before deploying.”
Cloud task stop-sequence behavior	Whether cloud tasks honor max-turn or explicit stop tokens from the instruction file	https://developers.openai.com/codex/cloud	2026-06-09	“Confirm stop-sequence and max-turn support in current cloud task documentation.”
Claude Code memory-file precedence	Which memory file takes precedence when multiple files exist; whether project-level overrides global	https://code.claude.com/docs/en/memory	2026-06-09	“Check current memory-file precedence rules before relying on project-level overrides.”
Draft PR creation via agent	Whether an agent can programmatically open a draft PR without elevated token permissions	https://docs.github.com/en/pull-requests	2026-06-09	“Verify PR creation token scope requirements in current GitHub documentation.”
CI environment gate enforcement	Required reviewer and environment protection rules that block agent merges	https://docs.github.com/en/actions	2026-06-09	“Confirm environment protection rule configuration in current GitHub Actions documentation.”
Gateway endpoint behavior	Stop-token and max-context behavior for the model endpoint the agent calls	https://apidoc.cometapi.com/api/text/chat	2026-06-09	“Verify stop-sequence and context-limit behavior in current CometAPI chat completions documentation.”

Reader next step

Compare the workflow against Start with CometAPI.

Use AI Coding Agent Setup, Security, and Model Routing as the next comparison point. Keep Triage CI Failures With a Coding Agent Without Losing the Evidence nearby for setup and permission checks.

FAQ

Q: What is the difference between a stop condition and a task completion check? A: A stop condition is defined before the task starts and is part of the agent’s operating rules. A completion check can happen after the fact. Stop conditions written in an instruction file apply every time the agent runs; post-hoc checks are easier to skip.

Q: Should I put stop conditions in the system prompt or the instruction file? A: Instruction files (AGENTS.md, CLAUDE.md, or equivalent) are the more durable location because they load from the repository rather than the session prompt. Session prompts can be truncated, overridden, or forgotten across restarts. Verify what your specific tool version loads and when.

Q: How many retries is the right default? A: Two retries is a common starting point for CI-step failures because it handles transient flakiness without masking structural problems. Adjust based on your environment’s flakiness rate, and always require a log entry before each retry so the evidence is preserved.

Q: What should an escalation handoff include? A: At minimum: the original task description, the stop condition that could not be met, the log entries from any retries, the specific ambiguity or out-of-scope condition that triggered escalation, and the current state of the branch. A draft PR with these details in the description is a clean escalation artifact.

Q: Can I rely on the agent itself to judge when it has reached its authority limit? A: Not entirely. Well-defined stop conditions and external enforcement (CI gates, protected branch rules) reduce your reliance on the agent’s own judgment. The agent’s judgment is an additional layer, not the primary control.

Q: Where does a model gateway fit into stop-condition design? A: The gateway is the infrastructure boundary for API calls. Its context limits and stop-sequence behavior can interact with your retry logic — a context-exhausted request looks like a failure and may trigger a retry. Verify gateway behavior separately from agent instruction-file behavior. Start with CometAPI if you need an OpenAI-compatible gateway with documented endpoint behavior.
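To keep a truncated response from feeding a retry loop, a custom agent loop can inspect the response before deciding what to do next. The sketch below assumes the gateway returns the OpenAI chat completions response shape, where `choices[0].finish_reason` is `"length"` when generation was cut off by a token limit; confirm that your gateway mirrors this field before relying on it. The function name and the three action labels are hypothetical.

```python
def next_action(response: dict) -> str:
    """Map a chat-completions-style response to 'proceed', 'escalate', or 'retry'.

    Assumption: the response follows the OpenAI chat completions schema.
    """
    choices = response.get("choices") or []
    if not choices:
        # Malformed or empty response: counts against the retry budget.
        return "retry"
    reason = choices[0].get("finish_reason")
    if reason in ("stop", "tool_calls"):
        return "proceed"
    if reason == "length":
        # Retrying the same oversized request is unlikely to help.
        return "escalate"
    return "retry"
```

Any "retry" result should still pass through the retry boundary, so the attempt is logged and counted.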