How Agent Guides Fail Without Source Backing: A Field-Level Breakdown

Last reviewed: 2026-06-10

Direct answer

A coding agent guide fails without source backing when the instructions it contains cannot be traced to a current, reachable reference. The agent then behaves as if the guidance is authoritative even when it is stale, incomplete, or simply wrong. The result is not always a loud crash — more often it is silent drift: the agent follows the rule, the rule was never correct for this environment, and the problem surfaces three PRs later as a failing CI step or a permission error that nobody expected.

The failure modes fall into four families:

Stale rule propagation — AGENTS.md, CLAUDE.md, or equivalent files contain commands, tool names, or workflow steps that no longer match the actual codebase, CI pipeline, or API surface.
Memory file divergence — Instruction files embedded in the project (such as Claude Code memory files or Codex cloud task context) drift out of sync with code because they are edited manually rather than derived from tested, versioned sources.
Unverifiable CI assertions — The guide describes what a CI pass looks like but does not link to the actual workflow definition, so the agent interprets a different failure as a pass or skips a required gate entirely.
PR handoff gaps — The guide tells the agent when to open a pull request but omits acceptance criteria or reviewer assignment rules, so the PR lands without the checks a human reviewer expects.

Each of these failure modes has a detectable signature, a root cause in the instruction authoring process, and a concrete remediation step. The sections below walk through each one.

For broader release checks, see When to Stop, Retry, or Escalate: A Practical Guide to Coding Agent Task Control.

Who this is for

This guide is for engineers who maintain AGENTS.md, CLAUDE.md, or equivalent repository instruction files for coding agents, and for teams that have already deployed an agent (Codex, Claude Code, Cursor, or another tool) and are seeing unexpected behavior in CI, PRs, or model calls. It assumes you can read a GitHub Actions workflow file and that you have at least one agent guide already checked in to your repository.

If you are still setting up your first agent guide, start with How to Write Repository Instructions for Coding Agents before returning here.

Key takeaways

An agent guide that cannot be verified against a live, reachable source is a liability, not an asset. Agents trust the guide; they do not independently validate it.
Stale command references in AGENTS.md are the most common single-file failure mode. A command that was correct six months ago may silently succeed with the wrong behavior or silently fail and be retried indefinitely.
Memory file divergence is harder to detect than stale commands because memory files often contain high-level intent rather than exact commands, so the divergence only becomes visible when the agent makes an unexpected architectural choice.
CI assertion gaps cause the most expensive failures because they allow bad code to reach the PR stage before the problem is found.
PR handoff gaps are the most common cause of agents opening pull requests that are reviewable but not merge-ready: work that looks done but requires manual triage before it can land.
Every failure mode in this article has a preventable root cause: the guide was written from memory or convention rather than from a source that can be checked and re-checked.

Failure mode 1: Stale rule propagation in AGENTS.md

OpenAI’s Codex AGENTS.md documentation describes instruction files as the primary mechanism by which operators control agent behavior in a repository. The file is read at task start and treated as authoritative. There is no built-in mechanism that warns the agent when a rule in the file conflicts with the current state of the repository — the agent simply follows the rule.

The most common stale-rule pattern involves test commands. A guide written when the project used npm test will cause the agent to run npm test even after the project migrated to vitest run or pnpm test. If the old script no longer exists, the command exits with a non-zero code. The agent may read that exit code as a test failure rather than as a missing script, and enter a repair loop trying to fix tests that are not actually broken.
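One way to separate "the script is missing" from "the tests failed" is to check the package manifest before running the command. The sketch below assumes a Node project whose scripts live in the scripts field of package.json; the function name and the sample manifest are hypothetical, for illustration only.

```python
import json


def missing_npm_script(package_json_text: str, script_name: str) -> bool:
    """Return True when the manifest defines no script with this name."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts", {})
    return script_name not in scripts


# Hypothetical manifest after a migration that renamed the test script.
manifest_text = '{"scripts": {"test:unit": "vitest run", "lint": "eslint ."}}'

if missing_npm_script(manifest_text, "test"):
    print("stale guide: 'test' is not defined in package.json")
```

A check like this only tells you that the guide names a script that does not exist. It says nothing about whether a script that does exist still runs the checks the guide describes.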

Detection signature: Agent sessions end with repeated test-command retries, or CI jobs fail on a step that the agent’s session log shows as passing locally.

Root cause: The guide was written once and not updated when the toolchain changed. It has no source link that would make staleness visible.

Remediation: For each command in your AGENTS.md, add a comment or inline note that names the file or CI workflow step that the command mirrors. For example: # matches .github/workflows/ci.yml job: test, step: Run unit tests. Then add a periodic check — either a CI step or a manual calendar item — that diffs the commands in AGENTS.md against the actual workflow definition. Exact field names and step IDs in your workflow definition are what to verify; do not rely on memory.
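The periodic diff can be sketched in a few lines. This is a rough illustration under two assumptions that may not hold in your repository: commands in the guide are wrapped in single backticks, and the workflow uses single-line run: steps. Multi-line run: | blocks are skipped, and any backticked text that is not a command will be over-reported, so treat the output as candidates to review rather than as findings.

```python
import re

# Matches single-line "run: <command>" steps, with or without a leading dash.
RUN_LINE = re.compile(r"^\s*(?:-\s+)?run:\s*(.+?)\s*$", re.MULTILINE)
# Matches text wrapped in single backticks on one line.
INLINE_CODE = re.compile(r"`([^`\n]+)`")


def workflow_commands(workflow_text: str) -> set[str]:
    """Collect single-line run: commands from a workflow file's text."""
    commands = set()
    for value in RUN_LINE.findall(workflow_text):
        if value in ("|", ">", "|-", ">-"):
            continue  # block scalar header; multi-line scripts are not parsed
        commands.add(value.strip("'\""))
    return commands


def stale_candidates(guide_text: str, workflow_text: str) -> list[str]:
    """Backticked guide entries that match no run: step in the workflow."""
    in_ci = workflow_commands(workflow_text)
    return sorted({c for c in INLINE_CODE.findall(guide_text) if c not in in_ci})


workflow = """\
jobs:
  test:
    steps:
      - name: Run unit tests
        run: pnpm test
      - run: pnpm lint
"""
guide = "Run `npm test` before committing, then `pnpm lint`."
print(stale_candidates(guide, workflow))  # ['npm test']
```

Exact string matching is deliberately strict here: a command that differs only by a flag is reported, which is usually the behavior you want when checking a guide against CI.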

Failure mode 2: Memory file divergence

Claude Code uses project memory files (typically CLAUDE.md at the repository root or in subdirectories) to provide persistent context across sessions. According to the Claude Code memory documentation, these files are read at session start and inform the agent’s understanding of the codebase structure, conventions, and constraints.

Divergence happens when the memory file describes a module boundary, a naming convention, or a dependency relationship that was accurate at authoring time but has since changed. Because memory files tend to be written in natural language rather than as exact commands, the agent does not fail immediately — it makes plausible-looking decisions based on the outdated model, and those decisions accumulate into a structural problem that is expensive to unwind.

Detection signature: The agent introduces code in the right style but in the wrong location, or references a module path or configuration key that no longer exists. The error only surfaces at build or import time.

Root cause: Memory files are maintained by humans as documentation artifacts rather than as tested, sourced specifications. They are updated reactively when something breaks rather than proactively when the codebase changes.

Remediation: Treat memory file sections as claims that must be source-backed. For any section that describes a directory structure, a module boundary, or a dependency, add a comment linking to the relevant source of truth — a package.json, an index.ts barrel file, or a configuration file. When that source changes, the memory file update becomes visible as a required follow-on step rather than an optional documentation improvement.
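A first-pass divergence check can be automated by resolving the paths a memory file mentions against the working tree. The sketch below assumes paths appear in backticks and look like either a path containing a slash or a file name with an extension; the function name and the pattern are illustrative choices, not part of any agent tool.

```python
import re
from pathlib import Path

# Backticked tokens that contain a slash, or that look like name.ext.
PATH_LIKE = re.compile(r"`([\w./-]+/[\w./-]*|[\w-]+\.[\w.]+)`")


def unresolved_refs(memory_text: str, repo_root: Path) -> list[str]:
    """Return referenced paths that do not exist under repo_root."""
    refs = set(PATH_LIKE.findall(memory_text))
    missing = []
    for ref in refs:
        # Strip leading slashes so the reference stays inside repo_root.
        if not (repo_root / ref.lstrip("/")).exists():
            missing.append(ref)
    return sorted(missing)


if __name__ == "__main__":
    # Assumes CLAUDE.md sits at the root of the current working directory.
    guide = Path("CLAUDE.md")
    if guide.exists():
        for ref in unresolved_refs(guide.read_text(encoding="utf-8"), Path(".")):
            print(f"unresolved reference: {ref}")
```

This catches references that no longer resolve. It cannot catch a path that still exists but no longer plays the role the memory file describes, which is why the source links remain necessary.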

Failure mode 3: Unverifiable CI assertions

GitHub Actions provides a structured workflow definition format that describes exactly which jobs run, in what order, and under what conditions. When an agent guide describes CI behavior in natural language without linking to the actual workflow file, the agent builds a mental model of CI that may not match the real pipeline.

The failure mode manifests most often when a new required CI check is added to the workflow but the AGENTS.md is not updated. The agent continues to treat the old set of checks as the complete gate, and opens PRs that will be blocked by the new check. From the agent’s perspective the PR is ready; from the CI perspective it is not.

Detection signature: PRs opened by the agent are consistently blocked by a check that is not mentioned in the agent guide. The agent does not retry or adjust because its guide says the PR is complete.

Root cause: The guide describes CI behavior as of a point in time without a mechanism for detecting when the workflow definition changes.

Remediation: Replace natural-language CI descriptions in your agent guide with direct references to the workflow file. Instead of writing “the CI pipeline runs linting, type checking, and unit tests,” write a reference such as: “CI is defined in .github/workflows/ci.yml. Before opening a PR, confirm all jobs in that file are expected to pass. Do not open a PR if any required status check is red.” This turns the agent’s CI understanding from a static claim into a dynamic check against a file it can read.
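To spot a newly added job that the guide never mentions, you can list the job IDs in the workflow file and search the guide for each one. The sketch below uses indentation rather than a YAML parser to stay dependency-free, so it assumes a conventionally formatted workflow with a top-level jobs: key. Note also that the check name shown on GitHub may differ from the job ID when a job sets its own name, so confirm the names against your repository settings.

```python
import re


def job_ids(workflow_text: str) -> list[str]:
    """Extract job IDs: the keys one indentation level below 'jobs:'."""
    ids, in_jobs, indent = [], False, None
    for line in workflow_text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        depth = len(line) - len(line.lstrip(" "))
        if depth == 0:
            in_jobs = line.rstrip() == "jobs:"
            indent = None
            continue
        if in_jobs:
            if indent is None:
                indent = depth  # first nested line sets the job-level indent
            if depth == indent and line.rstrip().endswith(":"):
                ids.append(line.strip()[:-1])
    return ids


def unmentioned_jobs(guide_text: str, workflow_text: str) -> list[str]:
    """Job IDs that never appear as a whole word in the guide."""
    return [
        job for job in job_ids(workflow_text)
        if not re.search(rf"\b{re.escape(job)}\b", guide_text)
    ]


workflow = """\
name: CI
jobs:
  lint:
    runs-on: ubuntu-latest
  test:
    runs-on: ubuntu-latest
  security-scan:
    runs-on: ubuntu-latest
"""
print(unmentioned_jobs("CI runs the lint and test jobs.", workflow))
# ['security-scan']
```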

For more on preserving evidence through CI triage, see Triage CI Failures With a Coding Agent Without Losing the Evidence.

Failure mode 4: PR handoff gaps

GitHub’s pull request documentation describes the review and merge process in terms of assignees, reviewers, required checks, and merge conditions. An agent guide that tells the agent to open a PR but does not specify reviewer assignment rules, required labels, or merge conditions produces PRs that are structurally correct but operationally incomplete.

The most common gap is missing reviewer assignment. The agent opens the PR, sets itself as author, and waits for review. But the repository may require a human reviewer to be assigned before the PR appears in any review queue. The PR sits unreviewed until a human notices it manually.

A second common gap is missing description templates. If the repository has a pull request template that requires a testing section or a linked issue, the agent’s PR will be flagged as incomplete by maintainers even if the code is correct.

Detection signature: Agent-opened PRs accumulate in a draft or open state without reviewer engagement. When a human inspects them, the code changes are often fine but the PR metadata is missing required fields.

Root cause: The guide describes the code-level completion condition (tests pass, linting passes) but not the repository-level completion condition (reviewer assigned, template filled, label applied).

Remediation: Add a PR completion checklist to your AGENTS.md that mirrors your repository’s CODEOWNERS, branch protection rules, and PR template. Verify the checklist against the current GitHub repository settings periodically. Exact reviewer assignment syntax and required label names should be confirmed against the live repository configuration, not from memory.
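The repository-level completion condition can be expressed as a small check over PR metadata. Everything in the sketch below is hypothetical: the PullRequestDraft shape, the label names, and the template heading are placeholders to be replaced with values confirmed against your live repository configuration. It does not call the GitHub API.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequestDraft:
    """Hypothetical, minimal view of the PR metadata an agent controls."""
    body: str
    reviewers: list[str] = field(default_factory=list)
    labels: list[str] = field(default_factory=list)


def handoff_gaps(
    pr: PullRequestDraft,
    required_sections: list[str],
    required_labels: list[str],
    min_reviewers: int = 1,
) -> list[str]:
    """List the repository-level completion criteria the PR does not meet."""
    gaps = []
    if len(pr.reviewers) < min_reviewers:
        gaps.append("reviewer not assigned")
    for label in required_labels:
        if label not in pr.labels:
            gaps.append(f"missing label: {label}")
    body = pr.body.lower()
    for section in required_sections:
        if section.lower() not in body:
            gaps.append(f"missing template section: {section}")
    return gaps


pr = PullRequestDraft(body="## Summary\nFix parser", labels=["bug"])
for gap in handoff_gaps(pr, ["## Testing"], ["bug", "agent"]):
    print(gap)
```

Returning a list of named gaps, rather than a single boolean, gives the agent something concrete to put in its handoff note when it cannot close a gap itself.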

Smoke-test workflow

Setup assumptions: You have at least one agent guide file (AGENTS.md, CLAUDE.md, or equivalent) checked in to a repository with a GitHub Actions workflow. You have read access to the workflow file and can open a test branch.

Happy-path request plan:

1. Open the agent guide file and list every command it specifies.
2. Open the corresponding CI workflow file(s) and list every command or script step.
3. Diff the two lists. Commands present in the guide but absent from CI are candidates for stale-rule failures.
4. Open the memory or context file (if present) and list every directory path, module name, or configuration key it references.
5. Verify each reference against the current file tree. References that do not resolve are candidates for memory divergence failures.
6. Open the repository’s branch protection settings and PR template. List required checks, required reviewers, and required labels.
7. Diff the PR completion criteria in the agent guide against step 6. Gaps are candidates for PR handoff failures.

Error-path check: Intentionally open a draft PR from a test branch without completing one of the required steps identified in step 7. Confirm the PR is flagged as incomplete by GitHub’s branch protection. If it is not flagged, the protection rule is not configured correctly — record this as a gap, not as a passing check.

Minimum assertions:

Every command in the agent guide has a matching step in a CI workflow file or a documented reason for the absence.
Every directory path or module reference in memory files resolves in the current file tree.
The PR completion criteria in the agent guide cover all required checks, reviewer requirements, and template fields.

Pass/fail logging fields (record these after each smoke test; use placeholder values only):

smoke_test_date: YYYY-MM-DD
guide_file:
workflow_file: <path to .github/workflows/*.yml>
stale_command_count:
unresolvable_memory_refs:
pr_criteria_gaps:
overall_result: pass | fail
notes: <free text, no credentials, no real paths>
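If you want the log record produced the same way every time, a small data class is enough. The field names below are taken from the list above. The rule that any non-zero count yields fail is an assumption inferred from the minimum assertions, and the values in the example are placeholders.

```python
from dataclasses import dataclass


@dataclass
class SmokeTestRecord:
    smoke_test_date: str
    guide_file: str
    workflow_file: str
    stale_command_count: int
    unresolvable_memory_refs: int
    pr_criteria_gaps: int
    notes: str = ""

    @property
    def overall_result(self) -> str:
        """Assumed rule: pass only when every count is zero."""
        counts = (
            self.stale_command_count,
            self.unresolvable_memory_refs,
            self.pr_criteria_gaps,
        )
        return "pass" if all(c == 0 for c in counts) else "fail"

    def to_lines(self) -> str:
        """Render the record as 'field: value' lines in the documented order."""
        rows = [
            ("smoke_test_date", self.smoke_test_date),
            ("guide_file", self.guide_file),
            ("workflow_file", self.workflow_file),
            ("stale_command_count", self.stale_command_count),
            ("unresolvable_memory_refs", self.unresolvable_memory_refs),
            ("pr_criteria_gaps", self.pr_criteria_gaps),
            ("overall_result", self.overall_result),
            ("notes", self.notes),
        ]
        return "\n".join(f"{key}: {value}" for key, value in rows)


record = SmokeTestRecord("YYYY-MM-DD", "<guide>", "<workflow>", 1, 0, 0, "placeholder")
print(record.to_lines())
```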

What this smoke test must not assert: Do not assert that the agent will behave correctly in all future sessions based on this check alone. Do not assert model availability, latency, or token cost. Do not assert that CI will pass on the next run — only that the guide’s claims are consistent with current files.

Failure modes

Evidence gap: the agent cannot inspect the failing log, source page, pull request, or local command output. The safe action is to stop and record the missing evidence instead of guessing.
Scope drift: the agent edits files that are not connected to the observed failure. Keep the repair tied to the failing signal and leave unrelated cleanup for a separate task.
Environment mismatch: the local check uses different versions, credentials, feature flags, or runtime settings than the hosted path. Record the mismatch before treating the result as proof.
Unreviewed fallback: the agent changes models, endpoints, permissions, or retry behavior to make a run pass without preserving the review boundary. Treat access and provider failures as operational blockers to escalate, not as problems with the task itself.
Weak handoff: the final note says the issue is fixed but omits the command, result, changed files, and remaining uncertainty. That makes the next operator repeat the investigation.

Sources checked

OpenAI Codex AGENTS.md guidance - accessed 2026-06-10; purpose: verify repository instruction-file context for coding agents.
GitHub Actions documentation - accessed 2026-06-10; purpose: verify workflow runs, jobs, steps, checks, and logs.
GitHub pull requests documentation - accessed 2026-06-10; purpose: verify pull request review and collaboration boundaries.
Claude Code memory documentation - accessed 2026-06-10; purpose: verify project memory and instruction-file context for agent workflows.
GitHub Copilot repository custom instructions - accessed 2026-06-10; purpose: verify repository-level instruction file concepts and current setup path for Copilot agents.
CometAPI chat completions endpoint reference - accessed 2026-06-10; purpose: verify gateway contract areas for agents that route model calls through an API gateway.
CometAPI model overview - accessed 2026-06-10; purpose: verify model routing documentation scope for gateway setup context.

Contract details to verify

Area	What to verify	Source URL	Accessed	Safe candidate wording
AGENTS.md command syntax	Exact syntax for tool allow/deny lists and shell command fields supported by current Codex version	https://github.com/openai/codex/blob/main/docs/agents_md.md	2026-06-10	“Verify current field names and allowed values in the Codex AGENTS.md documentation before authoring allow/deny rules.”
Claude Code memory file locations	Whether project-level and global memory files have changed paths or precedence in current Claude Code releases	https://code.claude.com/docs/en/memory	2026-06-10	“Memory file path and precedence should be confirmed against the current Claude Code memory documentation before deploying.”
GitHub Actions required status checks	How required status checks interact with draft PRs and auto-merge in current GitHub plan tiers	https://docs.github.com/en/actions	2026-06-10	“Required status check behavior depends on repository plan and branch protection configuration; verify against current GitHub documentation.”
PR reviewer assignment API	Whether CODEOWNERS auto-assignment applies to agent-opened PRs under current GitHub permission models	https://docs.github.com/en/pull-requests	2026-06-10	“CODEOWNERS reviewer auto-assignment behavior should be confirmed in current GitHub pull request documentation for your plan and repository visibility.”

Reader next step

Compare the workflow against Start with CometAPI.

Use When to Stop, Retry, or Escalate: A Practical Guide to Coding Agent Task Control as the next comparison point. Keep AI Coding Agent Setup, Security, and Model Routing nearby for setup and permission checks.

FAQ

Q: My AGENTS.md is short and simple. Do these failure modes still apply?

Yes. Shorter guides are less likely to have stale commands simply because there are fewer commands to go stale, but they are more likely to have PR handoff gaps because brevity often means the PR completion criteria were left out. The failure mode that is most dangerous for a short, simple guide is the one that is hardest to see: the guide says nothing about CI, so the agent invents a CI model from context, and that invented model may be wrong.

Q: Should I store the agent guide in the repository or outside it?

Repository storage is the conventional approach for AGENTS.md and CLAUDE.md because it keeps the guide versioned alongside the code it governs. The failure modes described here are not eliminated by repository storage — they are made more detectable, because you can diff the guide against the workflow file in the same repository. External storage introduces an additional sync risk on top of the staleness risk.

Q: How often should I review the agent guide against its sources?

At minimum, review the guide whenever the CI workflow changes, whenever a new required check is added to branch protection, and whenever a major toolchain dependency changes (e.g., test runner, linter, build tool). A practical heuristic is to add a comment to the CI workflow change PR that says “update AGENTS.md if this changes a command the guide mentions” — this makes the review a required step rather than an optional follow-on.
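The reminder can also be enforced mechanically by checking which files a change touches. The sketch below flags a change set that modifies a workflow or manifest without touching the guide. The trigger patterns and guide file names are assumptions to adapt to your repository, and the list of changed paths is expected to come from your own tooling.

```python
from fnmatch import fnmatch

# Assumed defaults; replace with the files that matter in your repository.
GUIDE_FILES = ("AGENTS.md", "CLAUDE.md")
TRIGGER_PATTERNS = (
    ".github/workflows/*.yml",
    ".github/workflows/*.yaml",
    "package.json",
)


def guide_review_needed(
    changed_files: list[str],
    guide_files: tuple[str, ...] = GUIDE_FILES,
    triggers: tuple[str, ...] = TRIGGER_PATTERNS,
) -> bool:
    """True when a trigger file changed but no guide file did."""
    touches_trigger = any(
        fnmatch(path, pattern) for path in changed_files for pattern in triggers
    )
    touches_guide = any(path in guide_files for path in changed_files)
    return touches_trigger and not touches_guide


if guide_review_needed([".github/workflows/ci.yml", "src/app.ts"]):
    print("CI changed without a guide update: review AGENTS.md")
```

A guide edit in the same change set silences the check even if the edit is unrelated, so this is a prompt for human review and not proof that the guide is current.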

Q: Can a coding agent maintain its own guide?

An agent can propose updates to its guide, but a human must review and merge those updates before they take effect. Allowing an agent to autonomously update the file it uses as its own instruction set creates a feedback loop that is difficult to audit and can amplify errors rather than correct them. For patterns on maintaining reviewable agent outputs, see How to Produce Reviewable Diffs From Coding Agent Sessions.

Q: What is the safest first step if I suspect my agent guide has stale content?

Open the guide and the most recently modified CI workflow file side by side. For every command in the guide, find the corresponding step in the workflow. If a command in the guide does not appear anywhere in the workflow, treat it as a stale-rule candidate and verify whether it belongs before the next agent session runs.

If your team routes agent model calls through an API gateway, Start with CometAPI to verify your gateway configuration is consistent with current endpoint documentation before your agents begin relying on it.