TIL: SPINE—the repo contract—and an author‑first agentic bootstrap

Published

October 16, 2025

# path: content/til/2025-10-16-spine-protocol-author-first-mcp.qmd

title: "TIL: SPINE—the repo contract—and an author‑first agentic bootstrap"
date: "2025-10-16"
summary: "We reframed repo automation as a SPINE contract and swapped brittle heuristics for an author‑first, MCP‑style, provenance‑safe bootstrap."
tags: [til, agentspine, mcp]
project: AgentSpine
links:
  - https://github.com/WujiangXu/A-mem-sys
  - https://arxiv.org/pdf/2502.12110
bibliography: 2025-10-16-spine-protocol-author-first-mcp.bib
---

Why this mattered

Our early Stage‑2 relied on heuristic scores (titles/docs/imports) to map code Π → clusters Φ. In a real repo those signals were noisy: headings were decorative, cluster titles didn’t match components, and import tokens overlapped poorly. Scores hovered near zero, and the “fix” looked like cargo‑cult renames. We needed a standard that agents and humans could both target, plus tools that write the truth deterministically.

The idea that unlocked it: SPINE

SPINE = Structured Protocol for In‑repo Navigation & Enforcement. It separates what the repo promises from how we render or host it:

SPINE‑M (Manifest). A typed, attributed multigraph: Π code, Φ concept clusters, P papers, W waves, ρ references; edges: maps_to, cites, refers, informs; plus deltas and deltas_by_cluster.
SPINE‑A (Acceptance). One JSON gate with KPIs and advisory→blocking flips. AND‑semantics across departments yields a single deploy decision.
SPINE‑IO (Agent I/O). Minimal, schema‑validated lanes (plan_card, pr_report, research_note) with foreign keys to Π/Φ/P/W/ρ.
SPINE‑D (Determinism). Reproducible build levels (D0–D3) and probes; artifacts must be stable under fixed env.
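To make the manifest idea concrete, here is a minimal in-memory sketch of a typed multigraph. The `Manifest` class, the `ALLOWED` table, and the `map_coverage` definition (share of Π nodes with a maps_to edge) are my own illustrative assumptions; the allowed edge directions are inferred from the lifecycles later in this post, not taken from the spec.

```python
from dataclasses import dataclass, field

# Edge kinds and the (source kind, target kind) pairs each may connect.
# ASSUMPTION: pairs are inferred from the lifecycles in this post.
ALLOWED = {
    "maps_to": {("Π", "Φ")},
    "cites": {("Φ", "ρ"), ("P", "ρ")},
    "refers": {("P", "Φ")},
    "informs": {("W", "Φ")},
}

@dataclass
class Manifest:
    nodes: dict = field(default_factory=dict)  # id -> {"kind": ..., attributes}
    edges: list = field(default_factory=list)  # (kind, src, dst); a list allows parallel edges

    def add_node(self, node_id, kind, **attrs):
        self.nodes[node_id] = {"kind": kind, **attrs}

    def add_edge(self, kind, src, dst):
        pair = (self.nodes[src]["kind"], self.nodes[dst]["kind"])
        if pair not in ALLOWED.get(kind, set()):
            raise ValueError(f"{kind} cannot connect {pair[0]} to {pair[1]}")
        self.edges.append((kind, src, dst))

    def map_coverage(self):
        # ASSUMPTION: coverage = fraction of Π nodes that have a maps_to edge.
        pis = {n for n, a in self.nodes.items() if a["kind"] == "Π"}
        mapped = {src for kind, src, _ in self.edges if kind == "maps_to"}
        return len(mapped & pis) / len(pis) if pis else 1.0

m = Manifest()
m.add_node("Π00", "Π", path="apps/")
m.add_node("Π01", "Π", path="tools/")
m.add_node("Φ01", "Φ", title="Apps")
m.add_edge("maps_to", "Π00", "Φ01")
print(m.map_coverage())
```

Typing the edges is what lets a writer reject nonsense (say, a `cites` edge from code to a cluster) before it ever reaches the manifest.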

This reframes repo work: agents don’t “guess,” they satisfy a contract.

Naming & positioning

AgentSpine is the toolkit; SPINE is the open spec.
It complements agent memory systems (e.g., A‑MEM) rather than replacing them: memory captures evolving observations; SPINE governs pass/fail at merge time (n.d.a, n.d.b).

Provenance first: SPE, stream, and heads

Every write carries a Spine Provenance Envelope (SPE) with: rev, prev_rev (optimistic concurrency), idempotency_key, schema_hash, trace_id, and ts.
We append events to log/stream.ndjson and maintain read pointers in state/heads/* and state/acceptance_latest.json.
Strict success/error split: success returns typed result.content; errors return tool_error{code,message,data}.

Result: no stale overwrites, idempotent retries, audit by default.
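A rough sketch of how an SPE-guarded write could behave, using an in-memory stand-in for log/stream.ndjson and state/heads/*. The `Store` and `ToolError` names, the hash-derived `rev`, the `stale_prev_rev` error code, and the extra `key`/`payload` envelope fields are assumptions for illustration only.

```python
import hashlib
import json
import time
import uuid

class ToolError(Exception):
    """Error side of the success/error split: code, message, data."""
    def __init__(self, code, message, data=None):
        super().__init__(message)
        self.code, self.message, self.data = code, message, data or {}

class Store:
    """In-memory stand-in for log/stream.ndjson plus state/heads/*."""
    def __init__(self):
        self.stream = []  # append-only NDJSON lines
        self.heads = {}   # key -> latest rev
        self.seen = {}    # idempotency_key -> envelope already written

    def write(self, key, payload, prev_rev, idempotency_key, schema):
        if idempotency_key in self.seen:       # retry: replay the earlier result
            return self.seen[idempotency_key]
        if self.heads.get(key) != prev_rev:    # optimistic concurrency check
            raise ToolError("stale_prev_rev", "prev_rev does not match head",
                            {"head": self.heads.get(key)})
        body = json.dumps(payload, sort_keys=True)
        # ASSUMPTION: rev is a short hash of the previous rev plus the payload.
        rev = hashlib.sha256(f"{prev_rev}:{body}".encode()).hexdigest()[:12]
        spe = {
            "rev": rev, "prev_rev": prev_rev,
            "idempotency_key": idempotency_key,
            "schema_hash": hashlib.sha256(schema.encode()).hexdigest()[:12],
            "trace_id": uuid.uuid4().hex, "ts": time.time(),
            "key": key, "payload": payload,
        }
        self.stream.append(json.dumps(spe, sort_keys=True))
        self.heads[key] = rev
        self.seen[idempotency_key] = spe
        return spe

store = Store()
first = store.write("Φ01", {"title": "Apps"}, None, "k1", "cluster.v1")
retry = store.write("Φ01", {"title": "Apps"}, None, "k1", "cluster.v1")
print(retry["rev"] == first["rev"], len(store.stream))
```

Checking the idempotency key before the concurrency check is deliberate in this sketch: a retry of an already-applied write should replay the earlier envelope rather than fail as stale.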

Stage‑2 pivots: author‑first over heuristics

Instead of “infer everything,” we:

1) Author structure explicitly: create/rename Φ with short, token‑friendly titles (Apps, CLI, Data, Tools, Notebooks, Meta…), then set Π→Φ maps by intent.
2) Use mechanical context to see the repo, not to decide for us: roots, imports, headings, references, tokens.
3) Open Needs for gaps (e.g., “Φ03 missing cites(ρ)”) and let agents or humans resolve them deterministically.
4) Run acceptance to verify KPIs and surface orphans/broken links/determinism.

Minimal flow:

# 1) Read-only context
spine ctx roots && spine ctx headings && spine ctx imports --lang py

# 2) Author explicit structure
spine cluster new --id Φ01 --title "Apps" --order 1
spine map set --pi Π00 --phi Φ01
spine route add --phi Φ01 --canonical /kb/clusters/apps/

# 3) Track gaps and verify
spine need open --node Φ03 --missing cites --ask "Provide 1–3 bib entries for Data"
spine manifest && spine acceptance --kpis-only

Departments: knowledge + entropy (with optional packs)

knowledge (SPINE‑M): registries (ids.yml, routes.yml, xref.yml), cluster pages, manifest build, Needs queue.
entropy (SPINE‑A/D): root‑cap, path fences, canonical import guards, determinism probes (diffoscope/reprotest).
Optional packs (advisory first): testops (coverage trend), sec (secrets/SAST), supply (SBOM/CVEs).

Acceptance aggregates per‑dept status into a single decision.
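The aggregation can be sketched as a plain AND over blocking departments, with advisory departments reported but not counted. The `decide` function and its input/output shapes are hypothetical; only the AND-semantics and the advisory/blocking distinction come from the text above.

```python
def decide(departments):
    """departments: name -> {"status": "PASS" | "FAIL", "mode": "advisory" | "blocking"}."""
    failed = {n for n, d in departments.items() if d["status"] != "PASS"}
    blocked_by = sorted(n for n in failed if departments[n]["mode"] == "blocking")
    warnings = sorted(n for n in failed if departments[n]["mode"] == "advisory")
    # AND-semantics: deploy only if every blocking department passed.
    return {"deploy": not blocked_by, "blocked_by": blocked_by, "warnings": warnings}

depts = {
    "knowledge": {"status": "PASS", "mode": "blocking"},
    "entropy": {"status": "PASS", "mode": "blocking"},
    "testops": {"status": "FAIL", "mode": "advisory"},
}
print(decide(depts))
```

Flipping `testops` to `"blocking"` in this example would turn the same failure into a blocked deploy, which is the advisory → blocking promotion in miniature.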

Mechanical context (fast, deterministic)

spine ctx roots → top‑level roots + language hints.
spine ctx imports --lang py|js|java → light import graph per stack.
spine ctx headings --glob 'README.md docs/**/*.md' → headings + computed anchors (GitHub/Quarto rules).
spine ctx bib --glob 'refs*.bib references/**/*.bib CITATION.cff' → parsed bib/cff refs (keys, title, doi/url).
spine ctx tokens --n 30 → salient path/file tokens per root.

These pack context for LLMs and humans without making decisions.
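For a feel of what a headings pass involves, here is a simplified sketch that extracts ATX headings and computes GitHub-style anchors. It is an approximation under stated assumptions: the slug rule (lowercase, drop punctuation, spaces to hyphens, numeric suffix for repeats) only roughly follows GitHub's behavior, Quarto's rules are not modeled, and headings inside fenced code are not skipped. The function names are mine.

```python
import re
from collections import Counter

HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*(\{#([\w-]+)\})?\s*$")

def github_slug(title, seen):
    slug = re.sub(r"[^\w\- ]", "", title.strip().lower())  # drop punctuation
    slug = slug.replace(" ", "-")
    n = seen[slug]
    seen[slug] += 1
    return slug if n == 0 else f"{slug}-{n}"  # repeated headings get -1, -2, ...

def anchors(markdown_text):
    seen = Counter()
    out = []
    for line in markdown_text.splitlines():
        m = HEADING.match(line)
        if not m:
            continue
        hashes, title, _, explicit = m.groups()
        # An explicit {#id} wins over the computed slug.
        out.append({"level": len(hashes), "title": title,
                    "anchor": explicit or github_slug(title, seen)})
    return out

doc = "# Setup\n\n## Setup\n\n## Data {#data}\n\nbody text\n## Rollout notes & pitfalls"
for h in anchors(doc):
    print(h)
```

Computed slugs shift whenever a heading is reworded, which is why explicit `{#id}` anchors are the stable choice for anything another page links to.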

Deterministic writers (SPE‑guarded, conflict‑safe)

spine cluster new --id Φ07 --title "CLI" --order 2
spine map set --pi Π02 --phi Φ03
spine route add --phi Φ03 --canonical /kb/clusters/data/
spine xref add --from Φ02#intro --to Φ01#overview
spine agentio upsert --kind plan_card --file plan.json --prev <rev>

Each write is idempotent and refuses stale prev_rev.

Acceptance as governance (advisory → blocking)

Acceptance JSON is the one artifact everyone (and every agent) aims to make green. Typical KPIs:

Structure & hygiene: map_coverage, ready_Pi, ready_Phi, orphans, broken_links.
Determinism: D (PASS/FAIL), with probes wired to builds.
Throughput/quality (optional): idea adoption, plan coverage, decay compliance (IO lane).

Promotion flips only after a green streak, so there are no “big bang” regressions.

A tiny acceptance excerpt:

{
  "contract": "SPINE-A",
  "contractVersion": "0.1.0",
  "summary": { "pass": false, "why": ["orphans>0","needs_open>0"] },
  "kpis": {
    "map_coverage": 0.67,
    "ready_Pi": 0.50,
    "ready_Phi": 0.33,
    "orphans": { "Pi": 1, "Phi": 2 },
    "broken_links": 0,
    "D": "PASS",
    "needs_open": 3
  },
  "steps": [
    {"cmd":"spine manifest","rc":0},
    {"cmd":"spine acceptance --kpis-only","rc":0}
  ]
}
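As a sanity check on the excerpt, the summary can be derived from the KPIs with a few explicit rules. This is only a sketch: the `summarize` function, the `broken_links>0` and `D!=PASS` labels, and the choice to leave coverage/readiness ratios out of the gate are assumptions, since the post does not state their thresholds.

```python
def summarize(kpis):
    why = []
    if sum(kpis["orphans"].values()) > 0:
        why.append("orphans>0")
    if kpis["broken_links"] > 0:
        why.append("broken_links>0")   # ASSUMPTION: label not shown in the excerpt
    if kpis["D"] != "PASS":
        why.append("D!=PASS")          # ASSUMPTION: label not shown in the excerpt
    if kpis["needs_open"] > 0:
        why.append("needs_open>0")
    return {"pass": not why, "why": why}

kpis = {
    "map_coverage": 0.67, "ready_Pi": 0.50, "ready_Phi": 0.33,
    "orphans": {"Pi": 1, "Phi": 2},
    "broken_links": 0, "D": "PASS", "needs_open": 3,
}
print(summarize(kpis))
```

Run against the KPI values from the excerpt, these rules reproduce its `summary` block.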

Needs queue & lifecycles (closing gaps without guesswork)

Per type, we keep small, explicit lifecycles:

Π (code): seeded → mapped(Φ) → docs_linked? → ready
Φ (cluster): seeded → primary_page → chain order → cites(ρ) → ready
P (paper): seeded → refers(Φ) → cites(ρ) → ready
W (wave): seeded → informs(Φ) → ready
ρ (ref): seeded → resolvable(path|doi|bibkey) → ready

If a field is missing, we open a SPE Need: node, missing[], suggested_*, ask[], status. This keeps follow‑ups deterministic and auditable.
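The lifecycles translate naturally into a lookup table plus a function that opens a Need for whatever is missing. In this sketch the step names, the `facts` dictionary, and the wording of `ask` are hypothetical; the optional `docs_linked?` step is left out and `suggested_*` is omitted because the post does not describe how suggestions are produced.

```python
# Required steps per node type, taken from the lifecycles above.
LIFECYCLES = {
    "Π": ["mapped"],
    "Φ": ["primary_page", "chain_order", "cites"],
    "P": ["refers", "cites"],
    "W": ["informs"],
    "ρ": ["resolvable"],
}

def open_need(node_id, kind, facts):
    """facts: step name -> truthy if the step is satisfied for this node."""
    missing = [step for step in LIFECYCLES[kind] if not facts.get(step)]
    if not missing:
        return None  # node is ready; nothing to ask for
    return {
        "node": node_id,
        "missing": missing,
        "ask": [f"Provide {step} for {node_id}" for step in missing],
        "status": "open",
    }

print(open_need("Φ03", "Φ", {"primary_page": True, "chain_order": True}))
print(open_need("Π00", "Π", {"mapped": True}))
```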

Plugins for breadth, not brittle magic

We standardize plugin outputs so stacks can evolve independently:

Import graphs: Py (AST/pydeps), JS/TS (madge/depcruise), Java (jdeps → DOT → JSON). Output: {module/pkg/file, deps[]}.
Anchors & link health: compute GitHub/Quarto slugs and report curated xref failures; encourage explicit {#id} for stability.
References: parse .bib/.ris/CITATION.cff; emit minimal {key,title?,doi?,url?,path}.

Plugins inform authors; they aren’t authorities. Writes remain author‑intent + SPE.
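As an example of the standardized output shape, a Python import plugin could be as small as the sketch below, which uses the standard-library `ast` module. The `imports_of` name and the decision to skip relative imports are assumptions; a real plugin would also need to walk files and resolve packages.

```python
import ast

def imports_of(source, module):
    """Return {module, deps[]} for one Python source string."""
    deps = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            deps.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            deps.add(node.module)  # absolute "from x import y" only
    return {"module": module, "deps": sorted(deps)}

src = "import os\nfrom json import dumps\nfrom . import sibling\n"
print(imports_of(src, "pkg.mod"))
```

Because the output is just `{module, deps[]}`, a JS or Java plugin can emit the same shape from entirely different tooling.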

Entropy gates & determinism levels (D0–D3)

Root‑cap & path fences: shrink the search space; block dangerous edits.
Canonical imports: enforce regular import forms (per language) to enable codemods and reduce cycles.
Determinism: A/B builds must match under fixed env (SOURCE_DATE_EPOCH, TZ=UTC, stable writers); escalate to reprotest and diffoscope where warranted.
Levels: D0 no guarantees → D1 stable docs/assets → D2 reproducible libraries/apps → D3 attestation + provenance.
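A toy A/B probe shows the principle: build twice under the same pinned environment and compare digests. Everything here is a stand-in I made up for illustration (the two build functions, the `probe` helper, and the counter that simulates an unpinned clock); a real probe would run the actual build in separate processes or directories.

```python
import hashlib
import itertools
import json

_clock = itertools.count()  # stand-in for an unpinned wall-clock read

def stable_build(records, env):
    # Stable writer: pinned timestamp, sorted keys, fixed separators.
    doc = {"built_at": int(env["SOURCE_DATE_EPOCH"]), "records": records}
    return json.dumps(doc, sort_keys=True, separators=(",", ":")).encode()

def leaky_build(records, env):
    # Leaks a value that changes on every build, like an embedded build time.
    doc = {"built_at": next(_clock), "records": records}
    return json.dumps(doc).encode()

def probe(build, records, env):
    a = hashlib.sha256(build(records, env)).hexdigest()
    b = hashlib.sha256(build(records, env)).hexdigest()
    return "PASS" if a == b else "FAIL"

env = {"SOURCE_DATE_EPOCH": "1760572800", "TZ": "UTC"}
records = [{"id": "Φ01", "title": "Apps"}]
print(probe(stable_build, records, env), probe(leaky_build, records, env))
```

A digest mismatch only says that the builds differ; tools like diffoscope are what tell you where.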

MCP parity & interop

We mirror CLI tools as MCP tools:

spine.manifest.read({select?}) → {content, meta}
spine.acceptance.read({kpisOnly?}) → {content, meta}
spine.agentio.upsert({kind,id?,prev_rev?,payload}) → {content:SPE, meta}
spine.stream.tail({after_ts?,limit?}) → {content:SPE[], meta}

Strict schemas, idempotency, and success/error separation make AgentSpine easy to plug into LangGraph, OpenHands, or RepoAgent orchestrators. Memory systems (e.g., A‑MEM) can enrich context; SPINE stays the merge gate (n.d.a, n.d.b).
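To illustrate the `{content, meta}` result shape, here is a sketch of what a `spine.stream.tail` handler might do over NDJSON text. The `stream_tail` function, the "strictly after `after_ts`, oldest first" ordering, and the contents of `meta` are assumptions, not the tool's documented behavior.

```python
import json

def stream_tail(ndjson_text, after_ts=None, limit=None):
    """Return envelopes newer than after_ts, oldest first, capped at limit."""
    events = [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]
    if after_ts is not None:
        events = [e for e in events if e["ts"] > after_ts]
    events.sort(key=lambda e: e["ts"])
    if limit is not None:
        events = events[:limit]
    return {"content": events, "meta": {"count": len(events)}}

log = "\n".join(json.dumps({"rev": f"r{i}", "ts": float(i)}) for i in range(1, 6))
page = stream_tail(log, after_ts=2.0, limit=2)
print([e["rev"] for e in page["content"]])
```

A caller can page through the stream by passing the last returned `ts` as the next `after_ts`.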

Repo‑agent vs “AGENTS.md + contracts”

Guidelines are advisory; contracts are passive. A repo‑agent with contract injection closes the loop: read SPINE‑M/A, propose a plan_card, apply deterministic repairs (routes/xrefs/anchors/import rules), re‑check acceptance, and open a PR with pr_report. Objective = “green acceptance”, not “best effort.”
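The loop can be sketched with the SPINE-facing pieces injected as callables, so the control flow is visible without pretending to know the real tool signatures. The `repair_loop` function, its parameters, and the `max_rounds` cap are all hypothetical.

```python
def repair_loop(read_acceptance, propose_plan, apply_repair, max_rounds=3):
    """Drive acceptance to green: read, plan, repair, re-check."""
    report = read_acceptance()
    rounds = 0
    while not report["summary"]["pass"] and rounds < max_rounds:
        for step in propose_plan(report):  # the plan_card, as a list of steps
            apply_repair(step)             # deterministic writer
        report = read_acceptance()         # re-check acceptance
        rounds += 1
    # The caller would attach this to the PR as its pr_report.
    return {"green": report["summary"]["pass"], "rounds": rounds}

# Fake repo state: two orphans, one repaired per round.
state = {"orphans": 2}

def read_acceptance():
    return {"summary": {"pass": state["orphans"] == 0}}

def propose_plan(report):
    return ["map_one_orphan"]

def apply_repair(step):
    state["orphans"] -= 1

print(repair_loop(read_acceptance, propose_plan, apply_repair))
```

The round cap matters: an agent that cannot reach green should stop and report, not keep editing.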

Rollout notes & pitfalls

Start advisory. Never flip to blocking until you’ve had a green streak; publish promotion dates.
Author‑first titles. Short titles (“Apps”, “CLI”, “Data”…) beat clever scores: better retrieval, less churn.
No destructive edits. Stage‑2 never moves user code; all changes live under agentspine/.
Explicit anchors. Freeze important headings with {#id}; keep curated xrefs green.
SPE everywhere. If a write can’t explain prev_rev, it shouldn’t land.
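The green-streak rule from the first note is easy to check mechanically. A minimal sketch, assuming acceptance summaries are kept oldest first and that a streak of 5 is the bar (the number is my placeholder, not something the spec sets):

```python
def may_promote(history, streak=5):
    """history: acceptance summaries, oldest first, each with a "pass" flag."""
    recent = history[-streak:]
    # Require a full window of consecutive green runs, not just "mostly green".
    return len(recent) == streak and all(run["pass"] for run in recent)

history = [{"pass": False}] + [{"pass": True}] * 5
print(may_promote(history))
```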

What changed for me

I stopped trying to coerce titles into pleasing a score.
I treat the repo as a contract and use mechanical tools to write facts with provenance.
The result is faster onboarding, fewer regressions, and a single artifact—Acceptance—that everyone (humans and agents) can aim to make green.

References

n.d.a. A-mem-sys (GitHub repository, WujiangXu). https://github.com/WujiangXu/A-mem-sys.

n.d.b. arXiv:2502.12110. https://arxiv.org/pdf/2502.12110.