Last reviewed: 2026-05-14.

Who this is for

This workflow is for teams that want a coding agent to repair tests, fix a small bug, or prepare a pull request without handing the agent merge authority. It works best when there is a concrete verifier: a failing test, a build command, a smoke check, or a reproducible error.

“Fix the tests” is not a verifier. “Make python3 scripts/check_site_units.py pass without changing runtime behavior” is a verifier. The agent can reason against the second target because it has a command, a scope, and a success condition.
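One way to make "a command, a scope, and a success condition" concrete is a small record type. The sketch below is illustrative only: the `Verifier` class, its field names, and the stand-in commands are assumptions, not part of any project named here.

```python
from __future__ import annotations

import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class Verifier:
    """Hypothetical record: a command, a scope, and a success condition."""

    command: list[str]           # exact command to run
    scope: str                   # what the agent is allowed to change
    expected_exit_code: int = 0  # success condition

    def passes(self) -> bool:
        # Run the command and compare its exit code to the expected one.
        result = subprocess.run(self.command, capture_output=True, text=True)
        return result.returncode == self.expected_exit_code


# Stand-in commands so the sketch runs anywhere; a real task would use
# the project's own check, such as the script named in the text.
passing = Verifier([sys.executable, "-c", "raise SystemExit(0)"], scope="tests only")
failing = Verifier([sys.executable, "-c", "raise SystemExit(1)"], scope="tests only")
```

A request like “Fix the tests” cannot be written in this shape, which is a quick test of whether a task has a verifier at all.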

Key takeaways

  • Start with the failing command, not a vague repair request.
  • Require diagnosis before edits.
  • Keep the patch scoped to the behavior under test.
  • Run the same check after the fix.
  • Run adjacent checks when the touched code is shared.
  • Make the PR summary name files changed, checks run, risks, and follow-up work.
  • Do not let the agent merge, deploy, buy, delete, or approve production changes by default.

The repair loop

The basic loop is short:

  1. Reproduce or inspect the failing check.
  2. Explain the likely cause in one paragraph.
  3. Patch only the owned file set.
  4. Run the same check again.
  5. Run adjacent checks if the patch touched shared behavior.
  6. Prepare a PR-ready summary.

That loop is deliberately conservative. It keeps the agent anchored to observable behavior and makes the final review easier for a human.
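The six steps can be sketched as a single function. This is a minimal outline under stated assumptions: `run_repair_loop`, its callable parameters, and the report keys are hypothetical names, and a real agent harness would supply the callables.

```python
from typing import Callable, Sequence


def run_repair_loop(
    run_check: Callable[[], bool],   # True when the check passes
    diagnose: Callable[[], str],     # one-paragraph likely cause
    patch: Callable[[], bool],       # applies the fix; True if shared code was touched
    adjacent_checks: Sequence[Callable[[], bool]] = (),
) -> dict:
    report = {"reproduced": False, "diagnosis": "", "fixed": False, "adjacent_ok": None}

    # 1. Reproduce: if the check already passes, there is nothing to repair.
    if run_check():
        return report
    report["reproduced"] = True

    # 2. Diagnose before any edit.
    report["diagnosis"] = diagnose()

    # 3. Patch only the owned file set (enforcement is left to the caller).
    touched_shared = patch()

    # 4. Rerun the same check.
    report["fixed"] = run_check()

    # 5. Run adjacent checks only when shared behavior was touched.
    if touched_shared:
        report["adjacent_ok"] = all(check() for check in adjacent_checks)

    # 6. The report is the raw material for the PR-ready summary.
    return report
```

Returning early when the failure cannot be reproduced keeps the agent from patching against a problem it never observed.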

Task shape that works

| Weak task | Better task |
| --- | --- |
| Fix CI. | Inspect the failing GitHub Actions job, identify the first failing command, and patch only the smallest file set needed. |
| Make tests pass. | Make npm test -- auth.spec.ts pass; do not rewrite unrelated tests. |
| Clean up the repo. | Remove the unused import introduced by this patch and run git diff --check. |
| Improve content quality. | Add a static content gate that fails on broken sources, thin public posts, missing internal links, and missing UTM metadata. |

The better tasks all have a target and a boundary. A senior reviewer should be able to tell whether the agent succeeded without reading a long narrative.

Example prompt

Run the failing check and summarize the first failure.
Then make the smallest patch that fixes that failure.
Do not change unrelated formatting.
After the patch, rerun the same check and report the output summary.

For a repository with strict agent rules, add:

Read AGENTS.md first.
Protect unrelated user changes.
Use apply_patch for manual edits.
Do not stage or commit until checks pass.

The exact command depends on the project, but the structure stays stable: reproduce, diagnose, patch, verify, summarize.

Diagnosis before edits

Ask the agent for a short diagnosis before it writes. The diagnosis should name:

  • the failing command or file;
  • the expected behavior;
  • the actual behavior;
  • the likely cause;
  • the intended patch scope.

If the diagnosis is hand-wavy, the edit will probably be hand-wavy too. Stop and narrow the task before the agent changes files.
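The five items above can be checked mechanically before any file is edited. The sketch below assumes the agent returns its diagnosis as structured fields; the `Diagnosis` class and its field names are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class Diagnosis:
    """Hypothetical container for the five items a diagnosis should name."""

    failing_command: str = ""
    expected_behavior: str = ""
    actual_behavior: str = ""
    likely_cause: str = ""
    patch_scope: str = ""

    def missing_fields(self) -> list:
        # A field counts as missing when it is empty or whitespace only.
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


draft = Diagnosis(failing_command="npm test -- auth.spec.ts")
print(draft.missing_fields())
```

A non-empty result is a signal to stop and narrow the task. The check only catches absent fields; judging whether a filled-in field is hand-wavy still takes a reviewer.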

Patch scope rules

Use these boundaries unless the maintainer says otherwise:

| Rule | Rationale |
| --- | --- |
| Touch only files connected to the failure | Prevents opportunistic refactors. |
| Preserve existing style | Reduces review noise. |
| Remove only unused code introduced by the patch | Avoids deleting unrelated stale code. |
| Keep generated files out unless the task requires them | Prevents build artifacts from hiding behavior changes. |
| Report skipped checks | Makes residual risk visible. |

The agent can suggest unrelated cleanup in the final note. It should not do it during a repair task.
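The first rule can be enforced with a small scope check over the list of changed paths (for example, the output of git diff --name-only). This is a sketch: `out_of_scope` and the example patterns are illustrative assumptions.

```python
from fnmatch import fnmatchcase


def out_of_scope(changed_files, owned_patterns):
    """Return changed paths that match none of the owned glob patterns."""
    return [
        path
        for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in owned_patterns)
    ]


# Hypothetical paths and patterns for illustration.
changed = ["src/auth/login.ts", "tests/auth.spec.ts", "src/billing/invoice.ts"]
owned = ["src/auth/*", "tests/auth.spec.ts"]
print(out_of_scope(changed, owned))
```

Note that `*` in these patterns also matches path separators, so `src/auth/*` covers nested directories. Any path the function returns is a candidate for the final note rather than the patch.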

PR-ready summary

A useful PR summary is compact and evidence-backed:

Changed:
- Added deterministic validation for static canary content before build.
- Replaced two thin public briefs with aliases to stronger pillar pages.

Checks:
- python3 scripts/check_static_content_quality.py --site-id coding-agent-guide
- python3 scripts/build_static_site.py --site-id coding-agent-guide
- python3 scripts/check_site_units.py

Risk:
- Plausible and Search Console remain blocked on account access; status docs record that blocker.

This format is better than “fixed issue” because it tells the reviewer what changed and how it was verified.
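A summary in this shape is easy to generate from structured data, which keeps the section order stable across PRs. The helper below is a sketch; `render_pr_summary` and the "none reported" placeholder are assumptions rather than an established convention.

```python
def render_pr_summary(changed, checks, risks):
    """Render Changed / Checks / Risk sections as plain text."""
    sections = [("Changed", changed), ("Checks", checks), ("Risk", risks)]
    lines = []
    for title, items in sections:
        lines.append(f"{title}:")
        # Keep an explicit placeholder so an empty section is a visible statement.
        lines.extend(f"- {item}" for item in (items or ["none reported"]))
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical content for illustration.
summary = render_pr_summary(
    changed=["Fixed token expiry check"],
    checks=["npm test -- auth.spec.ts"],
    risks=[],
)
print(summary)
```

Printing an empty section as "none reported" forces the author to state that no risk was found instead of silently omitting the heading.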

Human gates

Keep these actions outside the agent’s default authority:

  • merging its own PR;
  • pushing to protected branches;
  • deploying production;
  • changing DNS, registrar, or billing settings;
  • deleting live content;
  • submitting forms on external sites;
  • approving analytics, Search Console, or advertising integrations that need account access.

The agent can prepare the patch and document the next step. Humans or CI policy should own externally visible approval.
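The gate list can be expressed as a policy check that the agent harness consults before acting. The action labels, the allowlist, and the choice to send unknown actions to a human are all assumptions made for this sketch.

```python
# Labels are hypothetical; they mirror the gated actions listed above.
HUMAN_GATED = frozenset({
    "merge_own_pr",
    "push_protected_branch",
    "deploy_production",
    "change_dns_registrar_billing",
    "delete_live_content",
    "submit_external_form",
    "approve_account_integration",
})

# Assumed examples of actions the agent may take on its own.
AGENT_ALLOWED = frozenset({"run_check", "edit_owned_file", "draft_pr_summary"})


def decide(action: str) -> str:
    """Allow only explicitly permitted, non-gated actions."""
    if action in AGENT_ALLOWED and action not in HUMAN_GATED:
        return "allow"
    # Gated and unrecognized actions both wait for a human or CI policy.
    return "needs_human_approval"


print(decide("draft_pr_summary"))
print(decide("deploy_production"))
```

Treating unrecognized actions as gated is a deliberate default: a new capability has to be added to the allowlist before the agent can use it unattended.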

Failure modes to catch

| Failure mode | Symptom | Gate |
| --- | --- | --- |
| Test expectation is stale | Agent changes product code to satisfy a bad test | Require diagnosis and product-behavior confirmation. |
| Fix is too broad | Diff touches unrelated modules | Enforce file ownership and review git diff --stat. |
| Check was not rerun | Summary says “should pass” | Require exact commands and results. |
| CI-only failure remains | Local test passes but CI matrix fails | Inspect Actions logs and environment differences. |
| Content gate is bypassed | Static canary builds without source/quality checks | Wire quality checks into build commands. |
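The “should pass” symptom can be partly caught with a text scan of the agent's summary. This is a rough heuristic under assumptions: the phrase list in `HEDGE_PATTERN` is illustrative and will miss other wordings, so it supplements the requirement for exact commands and results instead of replacing it.

```python
import re

# Hypothetical phrase list: hedged claims that suggest a check was not rerun.
HEDGE_PATTERN = re.compile(
    r"\b(should|ought to|probably|likely)\s+(now\s+)?(pass|work|be fixed)\b",
    re.IGNORECASE,
)


def hedged_claims(summary: str) -> list:
    """Return hedged phrases found in a summary, in order of appearance."""
    return [match.group(0) for match in HEDGE_PATTERN.finditer(summary)]


print(hedged_claims("Tests should pass now."))
print(hedged_claims("Ran npm test -- auth.spec.ts: 12 passed."))
```

A non-empty result is a prompt to ask for the command that was run and its output, not proof that the check was skipped.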

GitHub’s pull request and Actions documentation cover the review and CI surfaces, but the repository still has to define what counts as an acceptable agent handoff.

Reader next step

Use this workflow after the setup and model-routing guide and the repository context guide. Then record the final check results in the PR or execution log.

The core CTA target remains blocked until approved. Until then, the safe reader action is internal: review the editorial note and keep production writes behind human approval.

Sources checked