Triage CI Failures With a Coding Agent Without Losing the Evidence
Last reviewed: 2026-05-28.
Direct answer
A coding agent should not start by guessing at a fix. Give it the failing GitHub Actions run, the pull request or branch, the exact job and step that failed, and the repository instructions it must follow. Ask for a narrow triage report first: failure summary, files likely involved, reproduction command if available, smallest safe patch, and the CI evidence it used. Only then ask for a code change and a rerun plan.
For repair work, pair this workflow with Test Repair and Pull Request Workflow for Coding Agents so the agent’s patch, test command, and human handoff stay connected.
For broader release checks, see AI Coding Agent Setup, Security, and Model Routing.
Who this is for
Use this when a coding agent is helping with GitHub Actions failures on a pull request, especially when the failure involves tests, linting, formatting, type checks, or generated files. It is also useful for teams that rely on repository instruction files or memory files to keep agents inside project-specific rules.
Do not use it as a substitute for checking the live CI provider, the current pull request state, or the current repository instructions. Those details can change after an agent reads them.
Key takeaways
- Start with evidence capture: workflow name, run link, job, step, exit signal, changed files, and the pull request context.
- Separate diagnosis from repair. The first response should explain the failure and propose a minimal plan before editing files.
- Make the agent restate the repository instructions or memory scope it used, then compare that against the current source files.
- Rerun the smallest relevant check first. Do not treat one local pass as proof that every CI workflow is fixed.
- Keep a short pass/fail record so a reviewer can see what changed, what was tested, and what still needs human judgment.
Smoke-test workflow
Setup assumptions:
- The operator can open the GitHub Actions run and the related pull request or branch.
- The repository has current project instructions available to the coding agent.
- The agent is allowed to inspect the relevant files and propose a patch, but it must not read secrets or change unrelated code.
Happy-path request plan:
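- Provide placeholders only, in the style of the log-record template below (for example <ci-run-url>, <workflow-name>, <job-name>, and <failed-step-name>), rather than live values.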
- Ask the agent to summarize the failure, map it to changed files, name the likely failing command, and propose the smallest repair.
- After the triage report, approve a patch only if it cites the failed job and explains why the changed files are related.
- Run the smallest relevant check locally or in CI, then update the pull request handoff.
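The happy-path request can be sketched as a small builder that turns the captured evidence into a diagnosis-only prompt. This is a minimal illustration, not a required format: the `CIEvidence` class, its field names, and the `instructions_path` field are assumptions made for the example.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CIEvidence:
    """Evidence handed to the agent. All field names are hypothetical."""
    ci_run: str
    branch_or_pr: str
    workflow: str
    job: str
    failed_step: str
    instructions_path: str  # assumed: where the repository instructions live


def build_triage_request(evidence: CIEvidence) -> str:
    """Render a request that asks for a triage report and no file edits."""
    lines = ["Triage this CI failure. Do not edit files yet.", ""]
    # List every evidence field so the agent can cite it back in its report.
    for f in fields(evidence):
        lines.append(f"{f.name}: {getattr(evidence, f.name)}")
    lines += [
        "",
        "Report, in order:",
        "1. Failure summary",
        "2. Changed files likely involved",
        "3. Likely failing command",
        "4. Smallest safe repair",
        "5. CI evidence used",
    ]
    return "\n".join(lines)
```

Keeping the "do not edit" instruction in the first line makes it harder for the repair step to start before the triage report has been reviewed.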
Error-path check:
- Give the agent a deliberately incomplete CI reference, such as a run link without the failed job name. A safe agent response should ask for the missing job or step instead of inventing an error.
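One way to exercise the error path is a pre-flight check that refuses to proceed when the CI reference is incomplete. The sketch below assumes the reference arrives as a plain dictionary; the required field names are borrowed from the log-record template, and the action strings are hypothetical.

```python
# Fields the reference must carry before triage can start (assumed set).
REQUIRED_FIELDS = ("ci_run", "workflow", "job", "failed_step")


def missing_evidence(reference: dict) -> list:
    """Return required fields that are absent or blank, in a stable order."""
    return [
        name
        for name in REQUIRED_FIELDS
        if not str(reference.get(name, "")).strip()
    ]


def next_action(reference: dict) -> str:
    """Ask for the missing evidence instead of guessing at an error."""
    missing = missing_evidence(reference)
    if missing:
        return "ask_operator: " + ", ".join(missing)
    return "proceed_to_triage"
```

For example, a reference that holds only a run link and a workflow name yields `ask_operator: job, failed_step`, which mirrors the safe response described above.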
Minimum assertions:
- The report names the workflow, job, and failed step.
- The proposed patch is limited to files related to the failure.
- The handoff records the command or CI rerun used after the patch.
- The agent does not expose credentials, hidden environment values, or full private logs.
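The four minimum assertions above could be checked mechanically along these lines. This is a sketch under stated assumptions: the `TriageHandoff` shape is hypothetical, the list of related files is supplied by the reviewer, and the sensitive-marker list is a tiny illustrative sample, not a real secret scanner.

```python
from dataclasses import dataclass, field


@dataclass
class TriageHandoff:
    """Hypothetical container for what the agent hands back."""
    workflow: str = ""
    job: str = ""
    failed_step: str = ""
    files_changed: list = field(default_factory=list)
    related_files: list = field(default_factory=list)  # reviewer-approved scope
    check_run_after_patch: str = ""
    text: str = ""


# Illustrative markers only; real secret detection needs a dedicated tool.
SENSITIVE_MARKERS = ("BEGIN PRIVATE KEY", "Authorization: Bearer")


def failed_assertions(h: TriageHandoff) -> list:
    """Return one message per minimum assertion that does not hold."""
    failures = []
    if not (h.workflow and h.job and h.failed_step):
        failures.append("report must name workflow, job, and failed step")
    unrelated = sorted(set(h.files_changed) - set(h.related_files))
    if unrelated:
        failures.append("patch touches unrelated files: " + ", ".join(unrelated))
    if not h.check_run_after_patch:
        failures.append("handoff must record the check run after the patch")
    if any(marker in h.text for marker in SENSITIVE_MARKERS):
        failures.append("handoff text contains a sensitive marker")
    return failures
```

An empty list means the minimum bar is met; it says nothing about unrelated workflows or about whether a flaky failure is gone.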
What the smoke test must not assert:
- It must not claim that unrelated workflows pass.
- It must not claim a flaky failure is permanently fixed from one run.
- It must not claim exact tool behavior that is not visible in the linked documentation or current CI evidence.
Sanitized log-record template:
triage_id: <ci-triage-id>
repository: <owner/repo>
branch_or_pr: <branch-or-pr-placeholder>
ci_run: <ci-run-url>
workflow: <workflow-name>
job: <job-name>
failed_step: <failed-step-name>
suspected_cause: <short-neutral-summary>
files_changed: <relative-path-list>
check_run_after_patch: <command-or-ci-rerun>
result: <pass|fail|needs-human-review>
remaining_risk: <short-note>
reviewer_handoff: <link-or-placeholder>
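A record in this shape can be rendered and validated with a few lines of code. The sketch below assumes the values arrive as a dictionary keyed by the template's field names; the function name and error messages are hypothetical.

```python
# Field order copied from the template above.
RECORD_FIELDS = (
    "triage_id", "repository", "branch_or_pr", "ci_run", "workflow",
    "job", "failed_step", "suspected_cause", "files_changed",
    "check_run_after_patch", "result", "remaining_risk", "reviewer_handoff",
)
ALLOWED_RESULTS = {"pass", "fail", "needs-human-review"}


def render_record(values: dict) -> str:
    """Render the record in template order, rejecting incomplete input."""
    missing = [name for name in RECORD_FIELDS if name not in values]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    if values["result"] not in ALLOWED_RESULTS:
        raise ValueError("result must be one of: " + ", ".join(sorted(ALLOWED_RESULTS)))
    return "\n".join(f"{name}: {values[name]}" for name in RECORD_FIELDS)
```

Rejecting a record with missing fields keeps a weak handoff from passing as a complete one.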
Failure modes
- Evidence gap: the agent cannot inspect the failing log, source page, pull request, or local command output. The safe action is to stop and record the missing evidence instead of guessing.
- Scope drift: the agent edits files that are not connected to the observed failure. Keep the repair tied to the failing signal and leave unrelated cleanup for a separate task.
- Environment mismatch: the local check uses different versions, credentials, feature flags, or runtime settings than the hosted path. Record the mismatch before treating the result as proof.
- Unreviewed fallback: the agent changes models, endpoints, permissions, or retry behavior to make a run pass without preserving the review boundary. Treat access and provider failures as operational blockers to report, not as failures in the code under triage.
- Weak handoff: the final note says the issue is fixed but omits the command, result, changed files, and remaining uncertainty. That makes the next operator repeat the investigation.
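The environment-mismatch case lends itself to a simple comparison before a local result is trusted. This is a minimal sketch that assumes both environments have already been summarized as flat dictionaries of setting names to values; the keys used in the example are hypothetical, and secret values should never be placed in these dictionaries.

```python
from typing import Dict, Optional, Tuple


def environment_mismatches(
    local: Dict[str, str], hosted: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each differing setting to its (local, hosted) pair.

    A setting present on only one side appears with None on the other.
    """
    keys = sorted(set(local) | set(hosted))
    return {
        key: (local.get(key), hosted.get(key))
        for key in keys
        if local.get(key) != hosted.get(key)
    }
```

Any non-empty result belongs in the remaining-risk note of the handoff, so a reviewer knows the local pass was not produced under the hosted conditions.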
Sources checked
- GitHub Actions documentation - accessed 2026-05-28; purpose: verify workflow runs, jobs, steps, checks, and logs.
- GitHub pull requests documentation - accessed 2026-05-28; purpose: verify pull request review and collaboration boundaries.
- OpenAI Codex AGENTS.md guidance - accessed 2026-05-28; purpose: verify repository instruction-file context for coding agents.
- Claude Code memory documentation - accessed 2026-05-28; purpose: verify project memory and instruction-file context for agent workflows.
Contract details to verify
| Area | What to verify | Source URL | Accessed | Safe candidate wording |
|---|---|---|---|---|
| CI workflow evidence | Confirm the current workflow, run, job, step, and log fields before asking an agent to diagnose a failure. | https://docs.github.com/en/actions | 2026-05-28 | Name the failing workflow, job, step, and run link before requesting a patch. |
| Pull request handoff | Confirm the current pull request state, review comments, checks, and merge expectations. | https://docs.github.com/en/pull-requests | 2026-05-28 | The repair handoff should state what changed, what was checked, and what still needs review. |
| Repository instructions | Confirm which instruction file applies to the files being changed and whether nested instructions affect the task. | https://github.com/openai/codex/blob/main/docs/agents_md.md | 2026-05-28 | The agent should restate the applicable repository rules before editing. |
| Agent memory scope | Confirm which memory or project-context files are loaded for the session. | https://code.claude.com/docs/en/memory | 2026-05-28 | Treat remembered context as guidance to verify against current files and docs. |
FAQ
Should the coding agent fix the CI failure immediately?
Not first. Ask for a triage report before edits. The report should identify the failing workflow, job, step, related files, and a minimal repair plan.
What if the CI logs are too long?
Give the agent the failed job and step, plus the smallest relevant excerpt. If the excerpt omits necessary context, the agent should ask for the missing section instead of guessing.
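Trimming a long log to a small excerpt might look like the sketch below. It assumes the operator has already saved the failed step's log as a list of lines and picked a marker string that identifies the failure; the marker and function name are hypothetical, and the code does not assume any particular CI log format.

```python
def excerpt_around(lines: list, marker: str, context: int = 3) -> list:
    """Return the first line containing marker plus `context` lines each side.

    Returns an empty list when the marker is absent, so the caller can ask
    for a different section instead of guessing.
    """
    for index, line in enumerate(lines):
        if marker in line:
            start = max(0, index - context)
            return lines[start : index + context + 1]
    return []
```

An empty result is a signal to request more context from the operator, not to widen the search silently.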
What if the failure is flaky?
Record the evidence and rerun history separately from the patch. A single pass can support a handoff, but it should not be described as proof that the flaky condition is gone.
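Keeping rerun history separate from the patch can be as simple as summarizing a list of outcomes. The sketch below assumes each rerun is recorded as the string "pass" or "fail"; the wording of the summaries is illustrative.

```python
def summarize_reruns(results: list) -> str:
    """Summarize rerun outcomes without overstating what they prove."""
    if not results:
        return "no runs recorded"
    unknown = [r for r in results if r not in ("pass", "fail")]
    if unknown:
        raise ValueError("unknown outcomes: " + ", ".join(unknown))
    passes = results.count("pass")
    total = len(results)
    if passes == 0:
        return f"0 of {total} runs passed"
    if passes < total:
        # Mixed outcomes on the same code are the usual sign of flakiness.
        return f"flaky suspected: {passes} of {total} runs passed"
    return f"{passes} of {total} runs passed; not proof the failure is gone"
```

Even the all-pass summary carries the caveat, which keeps a single green run from being reported as a permanent fix.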
How should this connect to model access?
If your agents call external models during triage, keep model access separate from the CI evidence and repository rules. Teams comparing gateways for agent workloads can start with CometAPI.
Reader next step
Turn the next coding-agent request into a one-page task brief, then compare it with AI Coding Agent Setup, Security, and Model Routing. For the surrounding setup and permission baseline, review Route Coding Agent Model Calls Without Endpoint Drift before assigning broader repository work.