Cost-aware model tiering, Guild Stack

Use this page to predict why a Guild Stack lane runs at cheap, mid, or powerful, where the decision is recorded, and how to override it when the cost or risk posture is wrong.

Try it: /guild:stats shows the tier distribution across recorded runs. The dispatch receipt for an individual lane records the score, chosen tier, and resolved host model.

Figure: Model routing. Stable tier keys: cheap (routine, low-risk work), mid (multi-file reasoning), powerful (high ambiguity or risk). Per-run resolution: lane score (from task signals), host model id (from the configured mapping), resolved settings (frozen evidence).

Tier keys stay portable. Exact model ids belong to host configuration.

Tiers are routing keys. The exact model id is resolved later from the host configuration in .guild/settings.json — model names are never hard-coded here.

Guild Stack separates three decisions that are easy to confuse:

Axis	What it decides	When it resolves
Mode	Which execution backend carries the lane.	Once at run start.
Tier	How much model capability the lane can spend.	Per lane at dispatch.
Host	Which host adapter runs the attempt.	Per dispatch attempt.

Changing a lane to powerful does not change its backend. Adding a host does not redefine the tier ladder. A tier only says what class of model the lane needs; .guild/settings.json maps that tier to an exact model id for each configured host.

Lane-scoring example

Suppose a plan has two lanes:

Lane	Signals	Score posture	Tier
`T1-doc-summary`	Read and summarize one approved doc.	0	`cheap`
`T2-auth-change`	Draft implementation (+1), touches multiple modules (blast-radius weight), depends on architecture (+1), security-sensitive (+1).	3 plus blast-radius weight	`powerful`

The first lane can run cheaply because it is pure I/O. The second lane earns a high tier because mistakes would affect correctness and security across multiple modules. The score is deterministic: no LLM decides that the work “feels hard.”

Score bands:

Score	Tier
0	`cheap`
1-2	`mid`
3+	`powerful`
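
The band table can be read as a small lookup. The following is a minimal sketch, not Guild Stack's implementation: the function name `tier_for_score` is hypothetical, and the handling of fractional scores (which a non-integer blast-radius weight could produce) is an assumption, since this page only lists the bands 0, 1-2, and 3+.

```python
def tier_for_score(score: float) -> str:
    """Map a lane score to a tier key using the score bands above.

    Assumption: a fractional score below 1 is treated as cheap and a
    fractional score below 3 as mid. The page does not specify this.
    """
    if score < 0:
        raise ValueError("score must be non-negative")
    if score < 1:
        return "cheap"
    if score < 3:
        return "mid"
    return "powerful"


if __name__ == "__main__":
    for s in (0, 1, 2, 3, 5):
        print(s, tier_for_score(s))
```

Because the mapping is a pure function of the score, the same lane signals always produce the same tier.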

What each tier means

Tier	Current model posture	Typical work
`cheap`	Exact host-configured model id	File read, tokenize, chunk, summarize, classify, tag; pure I/O with low ambiguity.
`mid`	Exact host-configured model id	Draft, reason, plan subtasks, extract relationships across files; the normal implementation and validation budget.
`powerful`	Exact host-configured model id	Architecture decisions, security review, graph schema or topology work, advisor and critic passes; high-stakes and lower frequency.

The map lives in .guild/settings.json under models.tiers as:

{
  "models": {
    "tiers": {
      "cheap": { "<canonical-host-id>": "<model-or-options>" },
      "mid": { "<canonical-host-id>": "<model-or-options>" },
      "powerful": { "<canonical-host-id>": "<model-or-options>" }
    }
  }
}

Each host slot can be a model string or an options object with host-supported reasoning hints. See the configuration reference for the exact accepted shape.
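
As an illustration of how a tier key might be resolved against this map, here is a minimal sketch. The helper `resolve_host_slot`, the host id `example-host`, and the model names are all hypothetical placeholders. The sketch returns the slot untouched, so an options object would pass through without this code assuming its shape.

```python
import json

# Hypothetical settings content; host id and model names are placeholders.
SETTINGS_JSON = """
{
  "models": {
    "tiers": {
      "cheap": {"example-host": "example-small-model"},
      "mid": {"example-host": "example-medium-model"},
      "powerful": {"example-host": "example-large-model"}
    }
  }
}
"""


def resolve_host_slot(settings: dict, tier: str, host_id: str):
    """Return the configured slot (string or options object) for tier and host."""
    tiers = settings.get("models", {}).get("tiers", {})
    if tier not in tiers:
        raise KeyError(f"no mapping for tier {tier!r}")
    hosts = tiers[tier]
    if host_id not in hosts:
        raise KeyError(f"tier {tier!r} has no slot for host {host_id!r}")
    return hosts[host_id]


if __name__ == "__main__":
    settings = json.loads(SETTINGS_JSON)
    print(resolve_host_slot(settings, "mid", "example-host"))
```

Failing loudly on a missing tier or host is a design choice of this sketch; the actual fallback behavior is described by the precedence rules and the configuration reference.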

Auto-scoring a lane

The orchestrator computes the score from deterministic signals:

Signal	Score contribution
`workType` verb: read/summarize	0
`workType` verb: draft/extract	+1
`workType` verb: architect/review/schema	+2
Blast radius: affected file or module count (each unit ≥1)	+ count × `scoreWeights.blastRadius`
Upstream `depends-on:` contract present	+1
Security or correctness sensitivity flag	+1
Prior-attempt escalation on this lane	+1 (sticky for the run)

Score and resolved tier are printed at dispatch. Signal weights are tunable through models.scoreWeights.
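
The signal table can be sketched as a plain sum. This is an illustration under stated assumptions, not the orchestrator's code: the names `LaneSignals` and `score_lane` are hypothetical, the blast-radius weight is passed in explicitly because its default is not given here, and every other signal uses the fixed contribution from the table even though `models.scoreWeights` can tune it.

```python
from dataclasses import dataclass

# Contributions for the workType verb, taken from the signal table.
WORK_TYPE_SCORES = {
    "read": 0, "summarize": 0,
    "draft": 1, "extract": 1,
    "architect": 2, "review": 2, "schema": 2,
}


@dataclass
class LaneSignals:  # hypothetical container for the table's signals
    work_type: str
    blast_radius_units: int = 0      # affected file/module count
    has_upstream_contract: bool = False
    sensitive: bool = False          # security or correctness flag
    prior_escalation: bool = False   # sticky for the run


def score_lane(signals: LaneSignals, blast_radius_weight: float) -> float:
    """Sum the deterministic contributions for one lane."""
    score = float(WORK_TYPE_SCORES[signals.work_type])
    score += signals.blast_radius_units * blast_radius_weight
    score += 1 if signals.has_upstream_contract else 0
    score += 1 if signals.sensitive else 0
    score += 1 if signals.prior_escalation else 0
    return score


if __name__ == "__main__":
    doc = LaneSignals("summarize")
    auth = LaneSignals("draft", blast_radius_units=3,
                       has_upstream_contract=True, sensitive=True)
    # The weight 1.0 is an illustrative value, not a documented default.
    print(score_lane(doc, 1.0), score_lane(auth, 1.0))
```

No step in the sum consults a model, which is what makes the score reproducible from the recorded signals.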

Precedence

--model-tier=<tier>   CLI flag           (top, run-level escape hatch)
  > tier: <tier> in plan lane            (per-lane override in .guild/plan/*.md)
    > settings.json models:              (repo config)
      > built-in default                 (cheap-biased tier-map)

Use --model-tier only as a one-off override. Permanent adjustments belong in settings.json or the plan lane. See the configuration reference for all models.* keys.
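
The precedence ladder amounts to "first source that supplies a value wins". The sketch below is hypothetical: `resolve_tier` and its parameter names are not part of Guild Stack, and it simplifies the settings.json and built-in levels to a single tier value each, whereas in practice those levels supply the tier map and weights that scoring uses.

```python
from typing import Optional, Tuple

VALID_TIERS = ("cheap", "mid", "powerful")


def resolve_tier(cli_tier: Optional[str],
                 plan_lane_tier: Optional[str],
                 settings_tier: Optional[str],
                 builtin_tier: str) -> Tuple[str, str]:
    """Return (tier, source) from the highest-precedence source that is set."""
    candidates = (
        ("cli", cli_tier),            # --model-tier=<tier>
        ("plan", plan_lane_tier),     # tier: <tier> in the plan lane
        ("settings", settings_tier),  # result under settings.json models
        ("builtin", builtin_tier),    # built-in default
    )
    for source, tier in candidates:
        if tier is not None:
            if tier not in VALID_TIERS:
                raise ValueError(f"unknown tier {tier!r} from {source}")
            return tier, source
    raise ValueError("no tier available")


if __name__ == "__main__":
    print(resolve_tier(None, "mid", "cheap", "cheap"))
```

Returning the source alongside the tier makes it easy to record why a lane ran at a given tier.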

Advisor escalation

When a cheap or mid specialist hits a sub-question above its tier, Guild Stack can ask a powerful advisor for that specific question. The original specialist continues with the answer folded in. This is not a wholesale re-run.

The advisor sees the draft, the escalated question, and a compact critique instruction. It does not receive the raw file context.

Escalation can happen three ways:

Trigger	What fires it
Explicit signal	The specialist emits `status: "escalate"` with an `escalate_reason` in its handoff envelope.
Uncertainty marker	The output matches configured uncertainty phrases in `models.escalationMarkers`.
Short-output heuristic	`models.shortOutputThreshold[task_type][tier]` exists and the lane output falls below it.

The short-output key is empty by default. Nothing auto-writes it; calibration can propose thresholds after enough run samples, and a human lands the setting.
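
The three triggers can be sketched as one check over a lane's handoff. Several details here are assumptions for illustration: the function name `escalation_trigger`, the order in which triggers are checked, case-insensitive substring matching for markers, and measuring output length in characters. None of these is specified on this page.

```python
from typing import Optional


def escalation_trigger(envelope: dict, output: str, task_type: str, tier: str,
                       markers: list, short_output_threshold: dict) -> Optional[str]:
    """Return the name of the trigger that fires, or None."""
    # 1. Explicit signal in the handoff envelope.
    if envelope.get("status") == "escalate":
        return "explicit"
    # 2. Uncertainty marker (assumed: case-insensitive substring match).
    lowered = output.lower()
    if any(marker.lower() in lowered for marker in markers):
        return "uncertainty-marker"
    # 3. Short-output heuristic, only when a threshold is configured.
    threshold = short_output_threshold.get(task_type, {}).get(tier)
    if threshold is not None and len(output) < threshold:
        return "short-output"
    return None


if __name__ == "__main__":
    # "done" is a placeholder status value; the marker phrase is made up.
    print(escalation_trigger({"status": "done"}, "I am not sure about this.",
                             "draft", "mid", ["not sure"], {}))
```

With an empty threshold map, which is the default described above, the third branch never fires.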

Advisor consults are capped per lane by models.advisorRounds. Exhausting the cap records an inconclusive advisor budget result instead of silently spending more.
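
The cap behaves like a per-lane counter. A minimal sketch, assuming a hypothetical `AdvisorBudget` class and the result labels `"consult"` and `"inconclusive"`, which are illustrative strings rather than Guild Stack's recorded values:

```python
class AdvisorBudget:
    """Per-lane counter bounded by the configured advisor round cap."""

    def __init__(self, advisor_rounds: int):
        if advisor_rounds < 0:
            raise ValueError("advisor_rounds must be non-negative")
        self.advisor_rounds = advisor_rounds
        self.used = 0

    def request_consult(self) -> str:
        # Once the cap is reached, report an inconclusive result
        # instead of spending another advisor call.
        if self.used >= self.advisor_rounds:
            return "inconclusive"
        self.used += 1
        return "consult"


if __name__ == "__main__":
    budget = AdvisorBudget(advisor_rounds=2)
    print([budget.request_consult() for _ in range(3)])
```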

Task-agent lifecycle

The target task-cell contract applies tiering to each fresh task agent:

Spawn: a new task-scoped agent starts at the resolved tier with a context bundle pointer.
Work: the agent executes the lane and escalates only the sub-question if needed.
Extract: the handoff records output, assumptions, learnings, and escalation evidence.
Dismiss: the agent ends. No idle agent carries over to the next lane.
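
The four steps above can be sketched as a single function that always dismisses the agent it spawned. All names here (`TaskAgent`, `Handoff`, `run_task_cell`) are hypothetical and the `work` method is a stand-in. This illustrates the target contract, not any shipped backend.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    output: str
    assumptions: list = field(default_factory=list)
    learnings: list = field(default_factory=list)
    escalations: list = field(default_factory=list)


class TaskAgent:
    def __init__(self, lane_id: str, tier: str, context_pointer: str):
        self.lane_id = lane_id
        self.tier = tier
        self.context_pointer = context_pointer
        self.alive = True

    def work(self) -> str:
        # Stand-in for executing the lane.
        return f"result for {self.lane_id} at tier {self.tier}"

    def dismiss(self) -> None:
        self.alive = False


def run_task_cell(lane_id: str, tier: str, context_pointer: str):
    agent = TaskAgent(lane_id, tier, context_pointer)  # Spawn
    try:
        output = agent.work()                           # Work
        handoff = Handoff(output=output)                # Extract
    finally:
        agent.dismiss()                                 # Dismiss, even on error
    return handoff, agent


if __name__ == "__main__":
    handoff, agent = run_task_cell("T1-doc-summary", "cheap", "bundle-pointer")
    print(handoff.output, agent.alive)
```

Putting the dismissal in `finally` reflects the rule that no idle agent carries over to the next lane, whether or not the work succeeded.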

That isolation is not fully shipped on the tmux backend. Today tmux opens one pane per specialist, so multiple tasks owned by the same specialist can share a long-lived pane and the first task’s assignment. The assignment file is written and exported but has no production worker reader, and a live completed pane is not forcibly terminated after handoff acceptance. The task-cell runtime initiative owns the move to one acknowledged assignment and one disposable agent instance per attempt.

Tier scoring itself is still per lane. It remains orthogonal to agent_mode: agent_mode chooses the dispatch substrate, while the tier chooses the cost class requested for that lane on that substrate.
