CI Repair Loops for Coding Agents: A Practical Workflow Guide

Last reviewed: 2026-06-02

Direct answer

A coding agent CI repair loop is a structured cycle in which a failing CI run triggers an evidence-collection step, that evidence is handed to a coding agent as a scoped task brief, the agent proposes a fix on a branch, and a second CI run validates the fix before the PR is allowed to merge. The loop has a hard retry limit and a human escalation gate so it cannot spin indefinitely.

The four stages are:

Detect — the CI system emits a failure signal with enough context to reproduce the failure (log lines, test names, exit codes).
Collect — a script or workflow step assembles the failure evidence into a structured brief the agent can read without access to live CI state.
Repair — the agent reads the brief, proposes a targeted change on a repair branch, and opens or updates a PR.
Validate — a second CI run executes on the repair branch; if it passes, the PR is cleared for review; if it fails again, the loop increments a counter and either retries up to the limit or escalates to a human.

Three things break CI repair loops most often: evidence that is too sparse for the agent to reproduce the failure locally, a retry limit that is absent or too high, and a validation gate that checks a different subset of tests than the original failure.

For broader release checks, see AI Coding Agent Setup, Security, and Model Routing.

Who this is for

This guide is for engineers and platform teams who are already running coding agents on pull request workflows and want to automate the first-response layer of CI failure triage. You should have a working GitHub Actions setup, be comfortable writing YAML workflow files, and have a coding agent that accepts task briefs as input. If you are earlier in the setup journey, the companion guide Triage CI Failures With a Coding Agent Without Losing the Evidence covers the triage and evidence-collection phase in more detail.

Key takeaways

Structure the repair loop as four discrete stages: detect, collect, repair, validate. Keep each stage independently observable.
Collect failure evidence into a file the agent can read without live CI access. Include the exact failing test names, the last N lines of relevant log output, the failing step name, and the triggering commit SHA.
Set a hard retry limit (two to three attempts is a reasonable starting point). Log the attempt count in a structured field every cycle.
The validation CI run must exercise the same test targets that failed originally. A broader re-run adds unrelated failures that obscure whether the repair worked; a narrower re-run can pass without exercising the original failure.
Human escalation is not optional. When the loop exhausts its retries, it must open a human-reviewable issue or comment and halt.
Never allow the repair loop to push directly to a protected branch. All fixes must go through a PR with a passing CI gate.

Workflow overview

The sections below describe each stage with enough detail to build a working loop. Exact GitHub Actions syntax, trigger event names, workflow field names, and reusable workflow paths should be verified against the current GitHub Actions documentation before implementation, as these can change over time.

Stage 1: Detect

Configure your CI workflow to emit a structured failure artifact when a run fails. At a minimum the artifact should contain:

the workflow run ID
the failing job name and step name
the failing test identifiers (not just the suite name)
the last 200 to 500 lines of the failing step’s log output
the triggering branch name and commit SHA
the exit code

GitHub Actions exposes the workflow run ID as a default environment variable (GITHUB_RUN_ID at the time of writing). The if: failure() condition on a step lets you run evidence collection only when an earlier step has failed. Verify the current set of default environment variables and their exact names in the GitHub Actions workflow syntax reference.

An artifact upload step following a failing test step can preserve this evidence for the collect stage without requiring the repair job to query the GitHub API at runtime.
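
The field list above can be sketched as a small script that a failure-only step runs before the artifact upload. This is a minimal sketch under stated assumptions: the function names, the output layout, and the way the failing test list is obtained are hypothetical, and the environment variable names (GITHUB_RUN_ID, GITHUB_JOB, GITHUB_REF_NAME, GITHUB_SHA) should be checked against the current default-variable list before use.

```python
import json
import os
from collections import deque
from pathlib import Path


def tail_lines(log_path, max_lines=200):
    """Return the last max_lines lines of a log file without holding the whole file."""
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=max_lines)]


def build_failure_artifact(log_path, failing_tests, failing_step, exit_code, env=None):
    """Assemble the failure evidence as a plain dict (hypothetical layout)."""
    env = os.environ if env is None else env
    return {
        "run_id": env.get("GITHUB_RUN_ID", ""),
        "job": env.get("GITHUB_JOB", ""),  # job identifier as exposed by the runner
        "failing_step": failing_step,
        "failing_tests": list(failing_tests),  # test IDs, not just the suite name
        "log_tail": tail_lines(log_path),
        "branch": env.get("GITHUB_REF_NAME", ""),
        "commit_sha": env.get("GITHUB_SHA", ""),
        "exit_code": exit_code,
    }


def write_failure_artifact(artifact, out_path):
    """Write the artifact as JSON so a later upload step can preserve it."""
    Path(out_path).write_text(json.dumps(artifact, indent=2), encoding="utf-8")
```

Passing env explicitly, as the optional parameter allows, makes the script testable outside a runner. Empty strings in the output are a signal that a variable name needs to be re-checked, not a value to pass downstream.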

Stage 2: Collect

A separate job or workflow downloads the failure artifact, parses it, and writes a structured task brief file. The brief is the only input the agent receives. It should be self-contained: the agent must not need live access to the CI run, the failing test environment, or secrets outside its own execution context.

A minimal task brief for a repair agent might include:

{
  "task": "ci-repair",
  "attempt": 1,
  "max_attempts": 3,
  "run_id": "<run-id-placeholder>",
  "commit_sha": "<sha-placeholder>",
  "branch": "<branch-placeholder>",
  "failing_tests": ["<test-id-placeholder>"],
  "failing_step": "<step-name-placeholder>",
  "log_tail": "<last-200-lines-placeholder>",
  "repair_branch": "ci-repair/<run-id-placeholder>"
}

All values shown above are structural placeholders. The actual values are populated at runtime by your collection script from the CI failure artifact.
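
A collection script along these lines could turn a parsed failure artifact into that brief. It is a sketch, not a reference implementation: build_brief and the artifact field names are assumptions chosen to mirror the placeholders above, and your artifact layout may differ.

```python
REQUIRED_ARTIFACT_FIELDS = (
    "run_id", "commit_sha", "branch", "failing_tests", "failing_step", "log_tail",
)


def build_brief(artifact, attempt=1, max_attempts=3):
    """Build a self-contained task brief from a parsed failure artifact."""
    # Refuse to produce a brief from sparse evidence instead of guessing.
    missing = [name for name in REQUIRED_ARTIFACT_FIELDS if not artifact.get(name)]
    if missing:
        raise ValueError("failure artifact is missing fields: " + ", ".join(missing))
    if not 1 <= attempt <= max_attempts:
        raise ValueError("attempt must be between 1 and max_attempts")

    log_tail = artifact["log_tail"]
    if isinstance(log_tail, list):  # accept a list of lines or a single string
        log_tail = "\n".join(log_tail)

    return {
        "task": "ci-repair",
        "attempt": attempt,
        "max_attempts": max_attempts,
        "run_id": str(artifact["run_id"]),
        "commit_sha": artifact["commit_sha"],
        "branch": artifact["branch"],
        "failing_tests": list(artifact["failing_tests"]),
        "failing_step": artifact["failing_step"],
        "log_tail": log_tail,
        "repair_branch": f"ci-repair/{artifact['run_id']}",
    }
```

Failing fast on missing fields keeps a sparse brief from reaching the agent, which is the first of the three common breakages named earlier.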

Stage 3: Repair

The coding agent reads the brief, identifies the likely cause from the log tail and failing test names, and proposes a change on the repair branch named in the brief. The agent must:

work only within the scope implied by the failing tests (no opportunistic refactors)
commit with a message that references the run ID and attempt number
open a draft PR against the original branch, or update an existing repair PR if one is open
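
The first two constraints in the list above lend themselves to mechanical checks that run before the PR is opened. The sketch below assumes a commit-message convention and an allow-list of directories that are both hypothetical; nothing here is prescribed by any particular agent tool.

```python
import re
from pathlib import PurePosixPath

# Hypothetical convention: "ci-repair(run <id>, attempt <n>): <summary>"
COMMIT_PATTERN = re.compile(
    r"^ci-repair\(run (?P<run_id>\S+), attempt (?P<attempt>\d+)\): .+"
)


def repair_commit_message(run_id, attempt, summary):
    """Format a commit message that references the run ID and attempt number."""
    return f"ci-repair(run {run_id}, attempt {attempt}): {summary.strip()}"


def out_of_scope_files(changed_files, allowed_dirs):
    """Return the changed files that sit outside every allowed directory."""
    roots = [PurePosixPath(d) for d in allowed_dirs]
    flagged = []
    for name in changed_files:
        path = PurePosixPath(name)
        # Compare path components, so "src/billing_extra" is not treated
        # as being inside "src/billing".
        if not any(root in path.parents for root in roots):
            flagged.append(name)
    return flagged
```

A non-empty result from out_of_scope_files is a reasonable trigger to reject the change and record scope drift instead of letting the PR proceed.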

Instruction files such as AGENTS.md or a repository-level coding agent instructions file can encode these scope constraints so the agent does not need them repeated in every brief. See How to Write Repository Instructions for Coding Agents for guidance on what to put in those files.

Verify how your specific agent tool reads instruction files and task briefs. The OpenAI Codex AGENTS.md documentation describes how that tool resolves instruction files; other agents may use different mechanisms. Exact file names, lookup paths, and instruction precedence rules should be confirmed in your agent’s current documentation before relying on them.

Stage 4: Validate

When the repair PR is opened or updated, a CI workflow runs against the repair branch. This validation run must target the same test identifiers that appeared in the failing_tests field of the brief. If the validation run passes, the PR moves to the normal human review queue. If it fails:

if the attempt number has reached max_attempts, the loop halts and writes a human escalation comment to the PR
otherwise, the attempt counter increments
the failure artifact from the validation run then becomes the input for the next collect stage

The escalation comment should include the run IDs of every attempt, the repair branch name, and a plain-language summary of what was tried.
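
The gate logic is small enough to express as a pure function, which makes it easy to test in isolation. This is a sketch under one stated assumption: attempt is the 1-based number of the attempt that just ran, matching the "attempt": 1 value in the brief. The function names and the comment wording are illustrative only.

```python
def next_action(validation_passed, attempt, max_attempts):
    """Decide what the loop does after a validation run.

    attempt is the 1-based number of the attempt that was just validated.
    """
    if validation_passed:
        return "review"    # hand the PR to the normal human review queue
    if attempt >= max_attempts:
        return "escalate"  # retries exhausted: comment on the PR and halt
    return "retry"         # feed the new failure artifact back into collect


def escalation_comment(repair_branch, run_ids, summaries):
    """Render a plain-text escalation comment listing every attempt."""
    lines = [
        f"CI repair loop halted after {len(run_ids)} attempt(s).",
        f"Repair branch: {repair_branch}",
        "Attempts:",
    ]
    for number, (run_id, summary) in enumerate(zip(run_ids, summaries), start=1):
        lines.append(f"{number}. run {run_id}: {summary}")
    lines.append("Human review is required before any further repair attempts.")
    return "\n".join(lines)
```

Keeping the decision separate from the code that posts comments or triggers workflows means the retry limit can be unit-tested without touching the GitHub API.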

Smoke-test workflow

Before running a live repair loop, verify the individual stages with a controlled test.

Setup assumptions:

You have a GitHub Actions workflow that can trigger on workflow_dispatch.
You have a coding agent that can be invoked with a JSON task brief as input.
You have a test suite with at least one intentionally broken test you can use as a canary.

Happy-path plan:

Break the canary test intentionally in a feature branch.
Push the branch and let the detection CI run fail.
Confirm the failure artifact is uploaded and contains the expected fields (run ID, test IDs, log tail).
Trigger the collect job manually and confirm it produces a valid brief JSON file.
Invoke the agent with the brief. Confirm it opens a repair PR on the correct branch with a commit message that includes the run ID.
Confirm the validation CI run triggers on the repair PR and passes after the canary test is fixed.

Error-path check: Set max_attempts to 1 and intentionally give the agent a brief with a log tail that does not contain enough information to identify the root cause. Confirm:

the agent opens a PR (even if the fix is wrong)
the validation run fails
the loop records attempt 1 as failed
because max_attempts is 1, the loop writes an escalation comment and halts without opening another repair cycle

Minimum assertions:

The failure artifact exists after a failed run.
The brief JSON file is valid (parseable, required fields present).
The repair PR is opened against the correct base branch.
The validation run targets the correct test identifiers.
The escalation comment is written when the limit is reached.
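
Two of these assertions, brief validity and validation targets, can be checked with plain functions. The sketch below assumes the brief layout shown in Stage 2; the helper names are hypothetical, and how you obtain the list of tests that the validation run actually executed depends on your test runner.

```python
import json

REQUIRED_BRIEF_FIELDS = {
    "task", "attempt", "max_attempts", "run_id", "commit_sha", "branch",
    "failing_tests", "failing_step", "log_tail", "repair_branch",
}


def brief_problems(raw_text):
    """Return a list of problems with a brief; an empty list means it is valid."""
    try:
        brief = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        return [f"not parseable: {exc.msg}"]
    if not isinstance(brief, dict):
        return ["top-level value is not an object"]
    return [f"missing field: {name}" for name in sorted(REQUIRED_BRIEF_FIELDS - set(brief))]


def compare_test_targets(brief_tests, validated_tests):
    """Report test IDs the validation run skipped or added relative to the brief."""
    expected, actual = set(brief_tests), set(validated_tests)
    return {
        "missing": sorted(expected - actual),  # in the brief but not re-run
        "extra": sorted(actual - expected),    # re-run but not in the brief
    }
```

A smoke test passes these two checks when brief_problems returns an empty list and compare_test_targets returns empty "missing" and "extra" lists.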

What the smoke test must not assert:

The agent will always produce a correct fix. The loop is designed for the common case; novel failures require human review.
Specific timing or latency of CI jobs.
Token counts, costs, or model availability.

Log record template (sanitized placeholder values only):

{
  "event": "ci_repair_loop",
  "site": "<repo-slug>",
  "run_id": "<run-id-placeholder>",
  "attempt": 1,
  "max_attempts": 3,
  "failing_tests_count": 2,
  "repair_branch": "ci-repair/<run-id-placeholder>",
  "pr_number": "<pr-number-placeholder>",
  "validation_outcome": "pass|fail",
  "escalated": false,
  "recorded_at": "<iso8601-placeholder>"
}

Record one entry per attempt. Do not log actual test output, credentials, token counts, or raw log tails in this record.
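
One way to keep raw output out of the log is to build the record from the brief and copy only the fields in the template. The following is a sketch; build_log_record and its parameters are hypothetical, and the field names simply mirror the template above.

```python
from datetime import datetime, timezone


def build_log_record(brief, repo_slug, pr_number, validation_outcome, escalated):
    """Build one log entry per attempt, copying only non-sensitive fields."""
    if validation_outcome not in ("pass", "fail"):
        raise ValueError("validation_outcome must be 'pass' or 'fail'")
    return {
        "event": "ci_repair_loop",
        "site": repo_slug,
        "run_id": brief["run_id"],
        "attempt": brief["attempt"],
        "max_attempts": brief["max_attempts"],
        # Only the count is logged; test output and the log tail stay in the brief.
        "failing_tests_count": len(brief["failing_tests"]),
        "repair_branch": brief["repair_branch"],
        "pr_number": pr_number,
        "validation_outcome": validation_outcome,
        "escalated": bool(escalated),
        "recorded_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }
```

Because the function names each copied field explicitly, adding a new field to the brief never leaks it into the log by accident.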

Once your repair loop is stable, consider pairing it with:

Triage CI Failures With a Coding Agent Without Losing the Evidence — deeper coverage of the evidence-collection stage.
How to Hand Off Coding Agent Pull Requests for Review — structuring the human review gate that follows a successful repair cycle.
Write Coding Agent Task Briefs That Produce Reviewable Changes — designing task briefs the agent and human reviewer can both parse.

If your repair loop involves agents calling external model APIs, see Route Coding Agent Model Calls Without Endpoint Drift for consistent endpoint configuration across CI environments.

Failure modes

Evidence gap: the agent cannot inspect the failing log, the pull request, or local command output. The safe action is to stop and record the missing evidence instead of guessing.
Scope drift: the agent edits files that are not connected to the observed failure. Keep the repair tied to the failing signal and leave unrelated cleanup for a separate task.
Environment mismatch: the local check uses different versions, credentials, feature flags, or runtime settings than the hosted path. Record the mismatch before treating the result as proof.
Unreviewed fallback: the agent changes models, endpoints, permissions, or retry behavior to make a run pass without preserving the review boundary. Treat access and provider failures as operational blockers, not as test failures to repair.
Weak handoff: the final note says the issue is fixed but omits the command, result, changed files, and remaining uncertainty. That makes the next operator repeat the investigation.

Sources checked

GitHub Actions documentation - accessed 2026-06-02; purpose: verify workflow runs, jobs, steps, checks, and logs.
GitHub pull requests documentation - accessed 2026-06-02; purpose: verify pull request review and collaboration boundaries.
OpenAI Codex AGENTS.md guidance - accessed 2026-06-02; purpose: verify repository instruction-file context for coding agents.
GitHub Actions workflow syntax documentation - accessed 2026-06-02; purpose: verify workflow permission configuration areas.

Contract details to verify

Area	What to verify	Source URL	Accessed	Safe candidate wording
Failure artifact upload	Confirm the actions/upload-artifact step name, inputs, and retention defaults in the current Actions release	https://docs.github.com/en/actions	2026-06-02	“An artifact upload step can preserve failure evidence; verify the current upload action name and inputs before use.”
Default environment variables	Confirm the exact variable names for run ID, commit SHA, branch, and job name available inside a workflow step	https://docs.github.com/en/actions/reference/workflows-and-actions/workflow-syntax	2026-06-02	“GitHub Actions exposes run and context values as environment variables; verify current names in the workflow syntax reference.”
if: failure() condition	Confirm the condition expression for step-level failure-only execution	https://docs.github.com/en/actions/reference/workflows-and-actions/workflow-syntax	2026-06-02	“A failure condition on a step limits evidence collection to failing runs; verify the exact expression syntax in the current docs.”
Draft PR creation	Confirm how draft PRs are opened and updated via the GitHub API or CLI, and what triggers the ready-for-review state	https://docs.github.com/en/pull-requests	2026-06-02	“Draft PRs can hold repair changes until the validation run passes; verify current PR creation options before automating.”
AGENTS.md resolution order	Confirm which directories Codex searches for AGENTS.md and the precedence rules between repo-root and subdirectory files	https://github.com/openai/codex/blob/main/docs/agents_md.md	2026-06-02	“Instruction files scope agent behavior; verify your agent’s current lookup path and precedence rules in its documentation.”
Branch protection rules	Confirm the branch protection fields that block direct push to protected branches and require passing status checks before merge	https://docs.github.com/en/pull-requests	2026-06-02	“Branch protection can enforce the validation gate; verify the current settings fields for your repository plan.”

FAQ

How many retry attempts should a CI repair loop allow? Two to three attempts is a practical starting point. A single attempt gives little room for transient environment noise; more than three usually means the failure is complex enough to warrant human review anyway. Set the limit explicitly and log the attempt count so you can tune it based on your observed repair success rate.

What should the escalation comment include? At minimum: the repair branch name, the PR number, the run IDs of each attempt, and a brief description of what the agent tried. The goal is to give the human reviewer enough context to pick up without re-reading the entire CI log.

Can the repair loop work without a draft PR? Yes, but draft PRs provide a clear signal that the branch is under active repair and should not be merged prematurely. If your GitHub plan or workflow does not support draft PRs, an equivalent label or status check can serve the same gate function.

How do I prevent the loop from repairing the wrong tests? The task brief should list the exact test identifiers from the original failure, and the validation run should filter to the same identifiers. A broader re-run adds unrelated failures that obscure whether the repair worked; a narrower one may not exercise the original failure or catch its side effects. Review the test filter flags for your test runner to find the right scope.

What if the agent cannot reproduce the failure from the log tail alone? That is the most common cause of loop exhaustion. Extend the log tail in the brief, add structured test output (JUnit XML or equivalent), or include the environment metadata (OS, language runtime version) the agent needs. If enriched evidence still does not produce a passing repair, escalate to a human.
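
If your runner can emit JUnit-style XML, failing test identifiers can be extracted from the report instead of scraped from log text. This is a minimal sketch: it assumes the common testsuite/testcase layout with failure or error child elements, and the "classname::name" identifier format is an illustrative choice that may not match what your test runner accepts as a filter.

```python
import xml.etree.ElementTree as ET


def failing_tests_from_junit(xml_text):
    """Return identifiers for test cases that contain a failure or error element."""
    root = ET.fromstring(xml_text)
    failing = []
    # iter() walks the whole tree, so this works whether the root element
    # is <testsuites> or a single <testsuite>.
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            classname = case.get("classname", "")
            name = case.get("name", "")
            failing.append(f"{classname}::{name}" if classname else name)
    return failing
```

Skipped and passing cases are ignored, so the result can feed the failing_tests field of the brief directly.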

Does this approach work with agents other than OpenAI Codex? The four-stage structure is agent-agnostic. The task brief format and the instruction file mechanism will differ by tool, but the detect-collect-repair-validate cycle and the retry-and-escalate gate apply to any coding agent that accepts structured input and opens PRs. Verify your specific agent’s brief format and instruction file behavior in its own documentation.

Reader next step

Turn the next coding-agent request into a one-page task brief, then compare it with AI Coding Agent Setup, Security, and Model Routing. For the surrounding setup and permission baseline, review Triage CI Failures With a Coding Agent Without Losing the Evidence before assigning broader repository work.
