Last reviewed: 2026-06-05
Direct answer
When a coding agent rewrites files, the safest rollback evidence is a combination of a dedicated branch, a pull request with a complete diff, CI run artifacts, and a worktree snapshot taken before the agent starts work. Together these give you three independent recovery paths: reset the branch tip, revert via a PR, or restore a worktree checkout.
The core workflow is:
- Before the agent runs, create an isolated branch from the current trunk commit.
- Optionally add a git worktree checkout so the pre-change state lives in a separate directory without disturbing the main working tree.
- Let the agent commit all changes to that branch — never let it commit directly to the default branch.
- Open a pull request immediately after the agent finishes so the full diff, author metadata, and CI results are captured as a durable audit record.
- Keep the CI run log and any test-output artifacts attached to the PR; these are the evidence that the change was safe or unsafe at a specific point in time.
- Gate merge on human review and a passing CI run — if either fails, the branch is already the rollback target.
If you need to revert after merge, a single git revert on the default branch produces a new commit that undoes the agent’s changes while preserving the audit trail.
For broader release checks, see AI Coding Agent Setup, Security, and Model Routing .
Who this is for
This guide is for engineers and platform teams who run coding agents — such as Codex, Claude Code, or similar tools — on production or pre-production repositories and need confidence that any change can be safely undone without data loss or history rewriting.
It assumes:
- You use Git for source control.
- Your repository is hosted on GitHub or a GitHub-compatible platform.
- You have basic familiarity with pull requests and CI pipelines.
If you are new to running agents in repositories, How to Write Repository Instructions for Coding Agents covers the instruction-file setup that constrains what an agent can modify.
Key takeaways
- Branch isolation is mandatory. An agent that commits to the default branch can create an irrecoverable state if it writes over files not tracked by the current diff. Always route agent commits to a named branch.
- Pull requests are the durable audit record. A merged or open PR preserves the diff, the author, the commit timestamps, the CI status, and the reviewer decisions. Closing the PR does not destroy this record.
- Worktrees give you a pre-change snapshot without a stash. git worktree add lets you keep the pre-agent state checked out in a separate directory while the agent works in the main tree. Verify each step independently via the linked docs before applying it to your own workflow.
- CI artifacts are time-stamped evidence. A CI run log attached to a PR shows exactly what tests passed or failed at the moment the agent’s code was evaluated. Retain these logs at least until the change has been in production for one full release cycle.
- git revert is safer than git reset after merge. Reset rewrites history; revert adds a new commit that undoes changes while keeping the full audit trail intact.
- Keep AGENTS.md or equivalent instruction files under version control. The instruction file that governed agent behavior is itself part of the rollback evidence — if the file changes between runs, the old version should be recoverable from Git history.
Smoke-test workflow
The following describes a plan for verifying your rollback-evidence setup. Adapt it to your own toolchain. Exact command flags and CI configuration syntax must be verified against the linked sources for your specific Git version and CI provider.
Setup assumptions:
- A repository with at least one committed file and a passing CI pipeline.
- Git version that supports git worktree (available since Git 2.5; see the official git-worktree reference for your exact version’s behavior).
- A GitHub repository with pull requests and Actions enabled.
Happy-path plan:
- From the default branch, create a new branch: git checkout -b agent/smoke-test-$(date +%Y%m%d).
- Optionally add a worktree: git worktree add ../pre-agent-snapshot HEAD — this preserves the pre-change state in a sibling directory.
- Make a small, controlled change to a non-critical file (e.g., add a comment to a README).
- Commit the change with a message that identifies it as an agent-generated test commit.
- Push the branch and open a pull request against the default branch. Verify the PR shows a diff, a CI run is triggered, and the commit author is logged.
- Confirm the CI run completes and its log is accessible from the PR.
- Without merging, run git revert HEAD on the agent branch and confirm the revert commit appears in the PR timeline.
Error-path check:
- Force a CI failure (e.g., add a syntax error to a test file) in a separate commit on the same branch.
- Confirm the PR shows a failed CI status and that the failure log is attached to the run.
- Confirm that merging is blocked by branch protection rules (if configured).
Minimum assertions:
- The PR diff matches exactly the files the simulated agent modified.
- The CI run log is accessible and time-stamped.
- The worktree directory contains the pre-change state of every modified file.
- The revert commit appears in the Git log and the PR timeline.
Pass/fail log fields to record:
branch_name: <agent/smoke-test-YYYYMMDD> pr_url: https://github.com/your-org/your-repo/pull/NNN ci_run_id: ci_status: <pass | fail> worktree_path: <../pre-agent-snapshot> revert_commit_sha: <first 8 chars> reviewer_decision: <approved | changes-requested | not-reviewed> log_retained_until:
What this smoke test must not assert:
- Do not assert specific model names, token counts, or API response times.
- Do not assert that a specific CI provider or branch protection configuration is universally available; verify your platform’s current docs.
- Do not assert that git worktree behavior is identical across all Git versions; check the git-worktree reference for your installed version.
Why rollback evidence matters for agent-generated changes
Coding agents can make high-velocity changes across many files simultaneously. Unlike a human developer who typically edits a few files at a time, an agent may touch dozens of files in a single task. This creates two risks that manual rollback strategies do not fully address:
Risk 1 — Scope creep in diffs. An agent following broad instructions may modify files outside the intended scope. A PR diff surface exactly which files were touched, giving reviewers a complete picture before merge. See GitHub’s pull requests documentation for how diff views work in practice.
Risk 2 — Hard-to-audit instruction drift. If the agent’s instruction file (AGENTS.md, CLAUDE.md, or an equivalent) is not version-controlled alongside the code, you lose the ability to reproduce the exact conditions under which a change was made. The OpenAI Codex AGENTS.md guidance covers what these files typically contain; keep them under Git so they travel with the code history.
Branch and worktree strategies
Three patterns cover most agent workflows:
Pattern A — Simple feature branch (lowest overhead). Create a branch per agent task. The branch name should encode the task or date for traceability. Merge only via PR. This is the minimum viable rollback setup.
Pattern B — Worktree snapshot (pre-change preservation). Before the agent starts, add a Git worktree at the current HEAD. The worktree directory is an independent working tree that shares the same object store but has its own index and HEAD. If the agent’s run produces an unrecoverable working-tree state, the worktree gives you a clean checkout to diff against. See the git-worktree reference for exact add, remove, and prune semantics — verify them for your Git version before scripting them.
Pattern C — CI-gated merge (strongest evidence trail). Combine Pattern A with required status checks on the PR. GitHub Actions can run tests, linters, and custom validation scripts that produce time-stamped logs attached to the PR. Make at least one CI job required for merge so that the PR cannot be approved without a recorded test result.
For teams running multiple agents in parallel, consider combining Pattern B and Pattern C: one worktree per agent, all PRs targeting the same integration branch with required CI. The parallel worktrees guide covers isolation rules in more detail.
What to retain and for how long
The following table describes the artifacts that constitute rollback evidence and a minimum retention guidance. Retention periods should be confirmed with your organization’s own compliance and incident-response policies.
| Artifact | What it captures | Minimum retention guidance |
|---|---|---|
| Agent branch | All agent commits, pre-merge state | Until one full release cycle after merge |
| Pull request record | Diff, author, timestamps, reviewer decisions | Permanently (most platforms retain by default) |
| CI run log | Test results at merge time | Until two release cycles after merge |
| Worktree snapshot | Pre-change working tree state | Until PR is merged and verified healthy |
| Instruction file (AGENTS.md) | Agent behavior contract | Permanently, under version control |
Using a model API gateway for agent review tasks
If your team uses a model API gateway to route agent review or validation calls — for example, using CometAPI to send pre-merge code summaries to a language model — those gateway calls can also produce durable log records. A gateway that logs request and response metadata gives you a time-stamped evidence trail that sits alongside your CI logs.
Details about the gateway endpoints, authentication scheme, and log retention behavior must be verified in the CometAPI documentation before building any review automation on top of them. Do not assume endpoint paths or response fields from this article; use the linked docs as the authoritative source.
If you are evaluating a model gateway for this purpose, Start with CometAPI to explore current endpoint options.
Failure modes
- Evidence gap: the agent cannot inspect the failing log, source page, pull request, or local command output. The safe action is to stop and record the missing evidence instead of guessing.
- Scope drift: the agent edits files that are not connected to the observed failure. Keep the repair tied to the failing signal and leave unrelated cleanup for a separate task.
- Environment mismatch: the local check uses different versions, credentials, feature flags, or runtime settings than the hosted path. Record the mismatch before treating the result as proof.
- Unreviewed fallback: the agent changes models, endpoints, permissions, or retry behavior to make a run pass without preserving the review boundary. Treat access and provider failures as operational blockers, not topic failures.
- Weak handoff: the final note says the issue is fixed but omits the command, result, changed files, and remaining uncertainty. That makes the next operator repeat the investigation.
Sources checked
- GitHub pull requests documentation - accessed 2026-06-05; purpose: verify pull request review and collaboration boundaries.
- GitHub Actions documentation - accessed 2026-06-05; purpose: verify workflow runs, jobs, steps, checks, and logs.
- OpenAI Codex AGENTS.md guidance - accessed 2026-06-05; purpose: verify repository instruction-file context for coding agents.
- Git worktree documentation - accessed 2026-06-05; purpose: verify parallel worktree isolation commands and constraints.
Contract details to verify
| Area | What to verify | Source URL | Accessed | Safe candidate wording |
|---|---|---|---|---|
| PR diff completeness | Confirm all agent-modified files appear in the PR diff view, including untracked files and deletions | https://docs.github.com/en/pull-requests | 2026-06-05 | “A PR diff surfaces all tracked file changes made in the branch” |
| CI log retention period | Confirm how long GitHub Actions run logs are retained for your plan tier | https://docs.github.com/en/actions | 2026-06-05 | “CI run logs are retained for a platform-defined period; check your plan for exact limits” |
| Required status checks behavior | Confirm that required status checks actually block merge when CI fails, including admin override settings | https://docs.github.com/en/actions | 2026-06-05 | “Required status checks can block merge; verify branch protection rules for your repo” |
| AGENTS.md instruction scope | Confirm exactly which file names and paths Codex and other agents check for repository instructions | https://github.com/openai/codex/blob/main/docs/agents_md.md | 2026-06-05 | “AGENTS.md at the repository root is the standard instruction file; verify per-tool docs for your specific agent” |
| CometAPI gateway log fields | Confirm which request and response metadata fields are logged and retained by the gateway | https://apidoc.cometapi.com/ | 2026-06-05 | “Gateway log fields and retention behavior must be verified in CometAPI docs before building review automation” |
FAQ
Q: Can I use git stash instead of a worktree for pre-change preservation? A: A stash saves uncommitted changes to the working tree only. If the agent starts from a clean working tree (which is the recommended approach), there is nothing to stash before it begins. A worktree at the current HEAD is more reliable because it creates an independent checkout that persists even if you run other Git commands in the main tree during the agent’s run.
Q: What if the agent commits directly to the default branch by mistake? A: The safest recovery is git revert rather than git reset. Revert adds a new commit that undoes the unwanted changes while keeping the full history intact. If branch protection rules were not in place, check the GitHub pull requests docs for how to enable required reviews and status checks to prevent direct commits in future runs.
Q: How granular should agent commits be? A: Prefer one commit per logical change rather than one large commit per agent task. Granular commits make each change independently revertible and make the PR diff easier to review. Some agents can be configured via their instruction file to commit incrementally; verify this in your agent’s documentation.
Q: Does opening a PR after an agent run slow down the workflow? A: The PR serves as the evidence record, not necessarily a blocking gate. You can open a draft PR immediately after the agent finishes, capture the diff and CI results, and only convert it to a ready-for-review PR when a human is available. This keeps the evidence intact without blocking automated follow-up steps.
Q: Should the AGENTS.md file itself be protected from agent modification? A: Yes. If an agent can modify the instruction file that governs its own behavior, the audit trail for that session becomes unreliable. Consider adding the instruction file path to your repository’s code-owner rules or branch protection patterns so changes to it always require explicit human approval.
Q: What is the minimum rollback setup if I cannot use CI? A: At minimum, require that the agent works on an isolated branch and that a pull request is opened before any merge. Even without CI, the PR diff and commit history give you a manual rollback path: close the PR without merging, and the default branch is unchanged.
Reader next step
Turn the next coding-agent request into a one-page task brief, then compare it with How to Write Repository Instructions for Coding Agents . For the surrounding setup and permission baseline, review AI Coding Agent Setup, Security, and Model Routing before assigning broader repository work.