// Blog / orientation cost

How to reduce repeated repo crawls by coding agents

The point is not to make agents read less source. The point is to stop spending context on the same broad orientation crawl every session.

// direct answer

Short answer

Reduce repeated repo crawls by replacing the broad first pass with a maintained routing bundle, then spend the saved attention on current source, tests, evidence, and any stale or suspect areas.

Reduce cold-start orientation work while keeping source review strict where it matters.

// real failure mode

The failure mode

Every new session repeats the same repo walk: README, package files, source tree, tests, docs, and old summaries.

That crawl is sometimes useful, but it is also noisy. It does not tell the agent which notes are stale or which module summaries are only heuristic.

The waste is not only tokens. It is attention spent rediscovering project shape instead of checking the code path that matters.

// repo-local model

The repo-local fix

Replace the broad first pass with a routing bundle as the first operational read.

Use the bundle to find target modules, trust status, critical files, and maintenance reports.

Then spend the source-reading budget where it matters: current code, tests, evidence, and any stale or suspect area.
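
The route-first read can be sketched roughly as follows. This is a minimal illustration, not the real schema: the bundle shape (`modules`, `paths`, `trust`, `keywords`) and the `routeTask` helper are assumptions made up for this example, and the actual routing_bundle.json may be organized differently.

```javascript
// Hypothetical bundle shape for illustration only.
// The real routing_bundle.json schema may differ.
const bundle = {
  modules: [
    { name: "billing", paths: ["src/billing/"], trust: "fresh", keywords: ["invoice", "payment"] },
    { name: "auth", paths: ["src/auth/"], trust: "stale", keywords: ["login", "session"] },
  ],
};

// Pick candidate modules for a task by naive keyword matching (an assumption,
// not the tool's real routing logic), and flag non-fresh knowledge for review.
function routeTask(routingBundle, taskText) {
  const text = taskText.toLowerCase();
  return routingBundle.modules
    .filter((m) => m.keywords.some((k) => text.includes(k)))
    .map((m) => ({
      module: m.name,
      readFirst: m.paths,
      // Stale or suspect notes: verify against current source before relying on them.
      verifyAgainstSource: m.trust !== "fresh",
    }));
}

const plan = routeTask(bundle, "Fix session expiry on login");
console.log(plan);
```

The plan only narrows where to look. Source and tests in the returned paths still need to be read before any behavior-changing edit.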

// concrete example

A cautious measurement method

Measure the baseline cold-start path first: the files an agent tends to read before it knows where the task belongs.

Then compare it with a route-first path: routing_bundle.json, the relevant module card, current source/tests, and evidence for the task.

Keep the claim narrow. A local smoke estimate can show orientation reduction, but it should not be sold as universal token savings for every repository.
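
A comparison along these lines could be sketched as below. The 4-characters-per-token ratio is a rough heuristic assumed here, not a tokenizer-verified count, and the file names and sizes are placeholders rather than measured data. The helper names are hypothetical and are not part of the .knowledge tooling.

```javascript
// Rough estimator: assumes about 4 characters per token.
// This is a heuristic, not a tokenizer-specific measurement.
function estimateTokens(chars) {
  return Math.ceil(chars / 4);
}

function sumTokens(files) {
  return files.reduce((total, f) => total + estimateTokens(f.chars), 0);
}

// Compare a cold-start read list against a route-first read list.
// A negative reductionPct means the route-first path cost more.
function compareOrientation(baselineFiles, routeFirstFiles) {
  const baseline = sumTokens(baselineFiles);
  const routeFirst = sumTokens(routeFirstFiles);
  const reductionPct = baseline === 0 ? 0 : ((baseline - routeFirst) * 100) / baseline;
  return { baseline, routeFirst, reductionPct };
}

// Placeholder sizes for illustration only.
const baselineFiles = [
  { path: "README.md", chars: 4000 },
  { path: "package.json", chars: 2000 },
  { path: "docs/overview.md", chars: 6000 },
  { path: "src/index.js", chars: 8000 },
];
const routeFirstFiles = [
  { path: ".knowledge/maintenance/routing_bundle.json", chars: 6000 },
  { path: "src/index.js", chars: 8000 },
];

const result = compareOrientation(baselineFiles, routeFirstFiles);
console.log(result);
```

Report the result together with the estimator used and the task it was measured on, so the number stays a local comparison rather than a general claim.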

// repo proof

Repo proof to inspect

Route-first entrypoint

.knowledge/maintenance/routing_bundle.json

The bundle replaces an aimless first crawl with a smaller first-read path.

Use it for
  • Choosing module and source paths
  • Finding stale and suspect knowledge

Do not use it for
  • Avoiding source review
  • Claiming exact token savings

Local metrics

.knowledge/metrics/

Metrics make orientation claims inspectable instead of anecdotal.

Use it for
  • Comparing local runs
  • Tracking health and context estimates

Do not use it for
  • Publishing universal benchmarks
  • Comparing unrelated repos without caveats

Benchmark notes

.knowledge/docs/metrics-benchmarks.md

Benchmark docs explain the limits of the estimator and the scope of the result.

Use it for
  • Documenting methodology
  • Keeping claims cautious

Do not use it for
  • Guaranteeing production savings

PR summary

.knowledge/maintenance/pr_summary.md

The PR summary turns trust and repair state into reviewable output.

Use it for
  • Review handoff
  • Checking what changed since the last run

Do not use it for
  • Replacing reviewer judgment

// command transcript

Commands and expected checks

node .knowledge/tools/collect-metrics.js

expected: Local health, token estimates, file counts, and graph metrics are collected.
inspect next: .knowledge/metrics/ and .knowledge/docs/metrics-benchmarks.md
caution: Treat estimates as local smoke data unless validated with a tokenizer-specific benchmark.

node .knowledge/tools/flow.js release --no-color

expected: The route-first artifacts are rebuilt and checked together.
inspect next: .knowledge/maintenance/quality_report.json

Before / after repo-local proof

First-orientation path

01 Before: README -> manifests -> source tree -> tests -> docs -> old summaries
02 After: routing_bundle.json -> target module -> source/tests -> evidence
03 Synthetic SaaS-shape fixture: 14 orientation files -> 1 routing bundle
04 Published smoke result: about 22% estimated context reduction with one local estimator

// field guide

Measurement guardrails

Use metrics to improve workflow, not to overclaim precision.

| Metric | Use it for | Do not claim |
| --- | --- | --- |
| Orientation file count | Compare broad crawl vs route-first path. | Exact effort saved for every agent. |
| Estimated context | Spot rough direction and regression risk. | Tokenizer-verified universal savings. |
| Doctor score | Check knowledge health before handoff. | Proof the product code is correct. |

// guardrails

What the agent should not trust blindly

  • Use the published numbers as order-of-magnitude smoke data, not production benchmarks.
  • Do not claim universal token savings. Tiny repositories can show overhead because the routing bundle has fixed structure cost.
  • Do not claim correctness from routing. Routing reduces aimless orientation; source and tests still decide behavior.
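
The fixed-cost caveat for tiny repositories can be made concrete with a small sketch. All numbers below are hypothetical placeholders chosen to show the shape of the trade-off; they are not measurements, and `orientationDelta` is an illustrative helper, not part of the tooling.

```javascript
// Hypothetical token counts for illustration only; not measured data.
// crawlTokens:    estimated cost of the broad first crawl
// bundleTokens:   fixed cost of reading the routing bundle
// targetedTokens: cost of the source/tests read after routing
function orientationDelta(crawlTokens, bundleTokens, targetedTokens) {
  const routeFirst = bundleTokens + targetedTokens;
  const delta = crawlTokens - routeFirst;
  // overhead is true when the route-first path costs more than the crawl.
  return { routeFirst, delta, overhead: delta < 0 };
}

// In a tiny repo the bundle's fixed structure can exceed the crawl it replaces.
const tinyRepo = orientationDelta(1200, 1500, 400);
// In a larger repo the same fixed cost is small relative to the crawl.
const largerRepo = orientationDelta(20000, 1500, 6000);

console.log({ tinyRepo, largerRepo });
```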

// common mistakes

Common mistakes

  • Measuring only tokens and ignoring whether the agent read the right source files.
  • Comparing one repository's result to another without matching size, language, and task type.
  • Optimizing away the source read that prevents wrong-file edits.
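
One way to avoid the first mistake is to track read coverage alongside token estimates. The sketch below assumes you can list the files a task actually required and the files the agent read. Both lists and the `readCoverage` helper are hypothetical and not produced by the .knowledge tooling.

```javascript
// Check whether the agent read the files the task actually depended on.
// requiredFiles and filesRead are assumed inputs gathered by the reviewer.
function readCoverage(requiredFiles, filesRead) {
  const read = new Set(filesRead);
  const missed = requiredFiles.filter((f) => !read.has(f));
  const coverage =
    requiredFiles.length === 0
      ? 1
      : (requiredFiles.length - missed.length) / requiredFiles.length;
  return { coverage, missed };
}

// Placeholder paths for illustration only.
const required = ["src/auth/session.js", "test/auth/session.test.js"];
const agentRead = [".knowledge/maintenance/routing_bundle.json", "src/auth/session.js"];

const report = readCoverage(required, agentRead);
console.log(report);
```

A run that uses fewer tokens but misses a required test file is a regression, not a saving.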

// quick FAQ

FAQ

Does .knowledge guarantee lower token use?

No. It is designed to reduce aimless first-orientation work. Actual token impact depends on repository size, task type, model behavior, and how much source must be re-read.

Should an agent read fewer tests after routing is added?

No. Routing should help the agent find the right tests faster. It should not reduce test review for behavior-changing work.

// page-specific next step

Use this page when repeated orientation becomes a visible cost

Measure the broad crawl, switch to route-first onboarding, and keep the claim narrow: less aimless orientation, more targeted source review.