Docs — StackChan tech reference

Curated reference for the StackChan voice robot stack. The top-level README.md covers how to deploy it; these docs cover what it is underneath — hardware, protocols, upstream model facts, and capabilities we aren't yet using.

Every file here cites upstream sources so a future agent (or human) can re-verify claims against the canonical specs rather than trusting our paraphrase.

Start here if you want…

| If you're trying to… | Read |
|---|---|
| Understand the overall shape | architecture.md |
| Know what the physical robot can do | hardware.md |
| Understand the voice pipeline (ASR/TTS/VAD) | voice-pipeline.md |
| Understand the default voice LLM (Tier1Slim + escalation) | tier1slim.md |
| Understand the brain (model matrix + ZeroClaw) | brain.md |
| Run different models on voice vs. Discord | multi-daemon-split.md |
| Know what's on the wire between components | protocols.md |
| See every cross-layer signal at a glance | interaction-map.md |
| Know what mode the robot is in (and what the LEDs mean) | modes.md |
| Find features we aren't using yet | latent-capabilities.md |
| Pick an LLM backend | llm-backends.md |
| Jump to an upstream repo or spec | references.md |

File map

docs/
├── README.md                ← you are here (index)
├── architecture.md          ← high-level data flow, actor responsibilities
├── hardware.md              ← M5Stack StackChan body + firmware lineage + MCP tool catalog
├── voice-pipeline.md        ← xiaozhi-esp32-server, FunASR/Whisper, VAD, Piper/EdgeTTS
├── tier1slim.md             ← two-tier voice LLM provider + escalation contract
├── brain.md                 ← model matrix (Tier1Slim + ZeroClaw), bridge, OpenRouter
├── multi-daemon-split.md    ← split voice + Discord across two ZeroClaw daemons
├── protocols.md             ← Xiaozhi WebSocket, MCP-over-WS, ACP JSON-RPC, emotion
├── interaction-map.md       ← every cross-layer signal: source, dest, protocol, notes
├── modes.md                 ← behavioural mode taxonomy + LED contract + transitions
├── latent-capabilities.md   ← upstream features we could wire up (cross-refs ROADMAP.md)
├── llm-backends.md          ← side-by-side comparison of LLM backend options
└── references.md            ← canonical URLs, licenses, model cards, spec docs

Conventions these docs follow

  • TL;DR at the top of each file — 3-6 bullets, scannable in the first 40 lines.
  • Tables over prose for dense facts — specs, tunables, method signatures.
  • Grep-bait headers — e.g. ## MCP tool handshake, ## session/prompt — so you can navigate by header search.
  • Relative links only, e.g. [voice-pipeline.md](./voice-pipeline.md); never absolute paths.
  • Freshness footer — every non-index file ends with Last verified: YYYY-MM-DD.
  • Placeholders for per-deployment values: <XIAOZHI_HOST>, <ZEROCLAW_HOST>, etc. (mapping lives with the deployer, not in this repo).
  • Soft claims where unverified — if a fact came from a secondary source or we couldn't verify, the text says so rather than pretending to cite upstream.
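
Two of the conventions above (the freshness footer and relative-only links) are mechanical enough to check with a script. The following is a minimal sketch, not part of this repo: the function name `check_doc`, the regexes, and the choice to treat only link targets starting with `/` as "absolute paths" are all assumptions made for illustration. External `https://` URLs are deliberately not flagged, since references.md is expected to contain them.

```python
import re

# Assumed footer shape: "Last verified: YYYY-MM-DD", optionally ending in a period.
FOOTER_RE = re.compile(r"^Last verified: \d{4}-\d{2}-\d{2}\.?$")
# Captures the target of a markdown link such as [text](target).
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def check_doc(text: str) -> list[str]:
    """Return a list of convention problems found in one docs file (hypothetical helper)."""
    problems = []

    # Freshness footer: the last non-blank line must be the "Last verified" stamp.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not FOOTER_RE.match(lines[-1]):
        problems.append("missing 'Last verified: YYYY-MM-DD' footer")

    # Relative links only: flag link targets that are absolute filesystem paths.
    for target in LINK_RE.findall(text):
        if target.startswith("/"):
            problems.append(f"absolute path in link: {target}")

    return problems
```

Running `check_doc` over each file under docs/ (skipping this index, which the footer convention exempts) would give a quick pass/fail list; how that is wired into CI, if at all, is left open.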

Relationship to the rest of the repo

  • ../README.md — deployment & ops (commands, layout, troubleshooting).
  • ../CLAUDE.md — agent orientation for this repo specifically.
  • ../bridge.py, ../zeroclaw.py, ../edge_stream.py, ../fun_local.py, ../piper_local.py — canonical source for the custom provider patches.
  • These docs/ — the why and the what else is possible behind the above.

When docs here are stale

Each sub-file has a Last verified: date. Freshness decays roughly as follows:

| Topic | Half-life | Why |
|---|---|---|
| Hardware spec | Years | M5Stack CoreS3 revisions are slow |
| Protocol spec | Months | xiaozhi is actively evolving |
| Model facts (Qwen3) | Weeks-months | OpenRouter pricing and model revisions churn |
| Latent capabilities | Months | Upstream adds features regularly |

If you're reading this a year from now, treat the protocol + model claims as starting points for re-verification, not ground truth.
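
As a rough illustration of how the `Last verified:` date and the half-lives could be combined into a re-verification check, here is a minimal sketch. The day thresholds in `REVIEW_AFTER_DAYS` are hypothetical numbers picked to match the qualitative half-lives (years, months, weeks), and the topic keys and function names are likewise assumptions rather than anything defined in this repo.

```python
import re
from datetime import date

# Hypothetical review thresholds in days; the docs only give rough half-lives.
REVIEW_AFTER_DAYS = {
    "hardware": 730,   # "Years"
    "protocol": 90,    # "Months"
    "model": 30,       # "Weeks-months"
    "latent": 90,      # "Months"
}

DATE_RE = re.compile(r"Last verified: (\d{4})-(\d{2})-(\d{2})")


def days_since_verified(text: str, today: date):
    """Age in days of the last 'Last verified:' stamp in text, or None if absent."""
    matches = DATE_RE.findall(text)
    if not matches:
        return None
    year, month, day = map(int, matches[-1])
    return (today - date(year, month, day)).days


def needs_review(text: str, topic: str, today: date) -> bool:
    """True when the stamp is missing or older than the assumed threshold for topic."""
    age = days_since_verified(text, today)
    return age is None or age > REVIEW_AFTER_DAYS[topic]
```

For example, a file stamped `Last verified: 2026-05-17` and checked on 2026-06-30 is 44 days old, so under these assumed thresholds it would be flagged as a "model" doc but not as a "protocol" doc.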

Last verified: 2026-05-17.