States, Toggles & LED Contract
This document is the source of truth for Dotty's high-level modes. The model has two axes:
- STATE: what Dotty is doing right now. States are mutually exclusive, so exactly one is active at a time. Six values: idle, talk, story_time, security, sleep, dance.
- TOGGLES: orthogonal modifiers that can be on regardless of state. Two values today: kid_mode, smart_mode. Toggles compose freely with state.
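The two-axis model can be sketched in a few lines of Python. This is an illustrative sketch only, not the firmware's C++ StateManager: the `DottyMode` class and `TOGGLE_NAMES` tuple are hypothetical names, and the boot defaults (idle, both toggles off) are taken from the TL;DR section.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    """The six mutually exclusive states named in this document."""
    IDLE = "idle"
    TALK = "talk"
    STORY_TIME = "story_time"
    SECURITY = "security"
    SLEEP = "sleep"
    DANCE = "dance"


# Hypothetical helper: the two toggle names that exist today.
TOGGLE_NAMES = ("kid_mode", "smart_mode")


@dataclass
class DottyMode:
    """Hypothetical container: one state slot plus independent toggle flags."""
    state: State = State.IDLE   # exactly one state at a time
    kid_mode: bool = False      # toggles are independent booleans
    smart_mode: bool = False

    def set_toggle(self, name: str, enabled: bool) -> None:
        # Changing a toggle never touches the state axis.
        if name not in TOGGLE_NAMES:
            raise ValueError(f"unknown toggle: {name}")
        setattr(self, name, enabled)


mode = DottyMode()
mode.set_toggle("kid_mode", True)
mode.state = State.STORY_TIME   # kid_mode stays on across the state change
```

Because the state is a single enum field and each toggle is its own boolean, "exactly one state" and "toggles compose freely" both hold by construction.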
The firmware StateManager modifier (firmware/main/stackchan/modes/state_manager.{h,cpp}) owns both axes. It paints the state arc (left ring 0-5) + toggle pips at 5 Hz, drives the idle-motion profile, and emits state_changed perception events on every transition. The bridge-side perception bus consumes those events and runs six per-device consumer tasks against them (face greeter, sound turner, face-lost aborter, wake-word turner, face-identified refresher, purr player) — see architecture.md.
Submodule pin caveat. Phase 4 shipped to the active firmware fork (BrettKinny/StackChan @ dotty, commit d78118b) on 2026-04-27. The firmware/firmware/ submodule pin in this repo deliberately lags upstream: it's a release pointer, not the active development tree. A user who flashes from the submodule will get a pre-Phase-4 firmware. Bump the submodule (or build from the active fork) to get the StateManager. Visual / interactive bench checks are tracked in issue #38.
Pair this with hardware.md (the physical LED ring + servos) and interaction-map.md (the underlying signals).
TL;DR
| Axis | Cardinality | Examples | Owner |
|---|---|---|---|
| State | mutex (1 of 6) | idle, talk, story_time | firmware StateManager |
| Toggle | compose freely | kid_mode, smart_mode | firmware StateManager + bridge state files |
| Chat sub-state | nested under talk / story_time | listening (LED) / thinking + speaking (face only) | xiaozhi-server |
The firmware boots into idle with both toggles off. The bridge resyncs toggles from disk on the first turn after each reconnect. State transitions land via voice phrases, camera edges (face_detected → talk), head-pet hold (sleep → idle), or /xiaozhi/admin/set-state from the dashboard.
Speech sub-states are conveyed by face animations (eye gestures, talking mouth) and the dedicated listening pixel at right-ring index 11. thinking and speaking have no LED — they live on the face. listening lights pixel 11 red so the user knows when their voice is being captured as a turn.
Smart-mode flip is in-process when DOTTY_VOICE_PROVIDER=tier1slim (the recommended default): the bridge calls /xiaozhi/admin/set-tier1slim-model to mutate the live Tier1Slim provider's model/url/api_key with no docker restart. Legacy =zeroclaw still rewrites ZeroClaw's config.toml and restarts the daemon.
States (mutually exclusive)
| State | LED arc (left ring 0-5) | Idle profile | Behaviour | Backing path |
|---|---|---|---|---|
| idle | off (0,0,0) | NORMAL | Ambient awareness, gentle idle motion. Default. | n/a (no chat in flight) |
| talk | dim green (0,60,0) | NORMAL (face_tracking overlay active) | Conversation engaged. Listening pixel (right 11) lights red while the user has the turn; thinking and speaking are face-animation only. | xiaozhi → Tier1Slim → llama-swap (default), or xiaozhi → ZeroClawLLM → bridge → ACP (legacy) |
| story_time | warm (100,40,0) | NORMAL | Long-running interactive story. Bridge bypasses ZeroClaw, calls OpenRouter directly with story persona + rolling context. | bridge → direct OpenRouter (Phase 7 pending) |
| security | white (80,80,80) flashing 1 Hz across all 6 left pixels (kSecurityFlashHalfMs = 500) | SURVEILLANCE | Wide deliberate scan, serious face, periodic photo + audio capture. No proactive greet. | bridge ambient task (Phase 6 partial) |
| sleep | very dim blue (0,0,16) | SLEEPY | Head face-down + centred, servo torque off (with kSleepTorqueReleaseTimeoutMs = 3000 fallback), sleeping emoji on screen, ambient awareness paused. Wakes on face / voice / head-pet. | firmware-only quiescence (Phase 5) |
| dance | rainbow sweep (left ring) | NORMAL | Transient performance: choreography + audio. Pre-existing dance handler. | receiveAudioHandle.py::_handle_dance |
The idle → talk trigger is the firmware face_detected event (any face, family or stranger) or onVoiceListening (the WS opens for a wake-word / inject-text / head-pet hold). The bridge runs VLM recognition (bridge.py::_capture_room_view) in parallel and feeds the resulting identity into the speaker resolver / persona — recognition does not gate the state transition.
Mutex rules
- Exactly one state is current. setState(S) to the same state is a no-op.
- State transitions are explicit: there is no implicit "fallback" to idle from other states; each non-idle state has its own exit triggers.
- Camera edges only auto-transition between idle ↔ talk. Sticky states (story_time, security, sleep, dance) ignore face_detected / face_lost for that auto-transition; a sticky state's own exit triggers, such as the face-detected wake from sleep, are handled separately.
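The mutex rules above can be sketched as two small functions. This is a Python illustration under stated assumptions, not the firmware code: `set_state` and `on_camera_edge` are hypothetical names, states are plain strings, and `face_lost` here stands for the edge after any firmware-side grace period has expired. Sleep-exit triggers are deliberately not modelled.

```python
from typing import Tuple

# States that never take the idle <-> talk camera auto-transition.
STICKY_STATES = {"story_time", "security", "sleep", "dance"}


def set_state(current: str, target: str) -> Tuple[str, bool]:
    """Return (new_state, changed). Setting the same state is a no-op."""
    return target, target != current


def on_camera_edge(current: str, edge: str) -> str:
    """Apply the idle <-> talk auto-transition for a camera edge.

    Any other combination, including every sticky state, leaves the
    state unchanged.
    """
    if edge == "face_detected" and current == "idle":
        return "talk"
    if edge == "face_lost" and current == "talk":
        return "idle"
    return current


print(on_camera_edge("idle", "face_detected"))      # talk
print(on_camera_edge("security", "face_detected"))  # security
print(set_state("talk", "talk"))                    # ('talk', False)
```

Keeping the `changed` flag separate makes it easy for a caller to skip emitting a state_changed event when nothing actually changed.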
Wake-from-sleep edges
StateManager accepts three sleep-exit triggers:
- Face detected (onFaceDetected): wakes if _state == State::Sleep.
- Voice listening (onVoiceListening): wakes on wake-word, inject-text, or any other path that opens the WS.
- Head pet (onHeadPet): the dark-room-friendly path; a capacitive head touch wakes Dotty without line of sight or speech. See voice-mode-entry.md.
Toggles (compose freely)
| Toggle | Toggle pip (right ring) | What it does | Persistence |
|---|---|---|---|
| kid_mode | salmon pink (220, 80, 80) at index 8 (G == B so PY32 RGB565 quantization stays warm) | Guardrails only: content sandwich, camera tools denied, kid-safe persona. Does not pick the model. Bridge-side hot-reload via _apply_kid_mode() (no daemon restart). | /root/zeroclaw-bridge/state/kid-mode |
| smart_mode | orange (168, 80, 0) at index 9 | Voice-LLM model selector. ON → SMART_MODEL (claude-sonnet-4-6 by default) via OpenRouter; OFF → local default. Flip is in-process when DOTTY_VOICE_PROVIDER=tier1slim; daemon-restart when =zeroclaw. | /root/zeroclaw-bridge/state/smart-mode |
The two toggles are orthogonal — they compose freely. kid_mode = on AND smart_mode = on runs the smart model behind the kid-safe sandwich. Both toggles are sticky across turns, daemon restarts, and reboots.
smart_mode is dashboard- and admin-endpoint-only — there is no voice trigger. Kids reach Dotty by voice but not the web dashboard, so dashboard-only is the access-control gate that keeps the more capable (and more expensive) model under household-head control.
LED contract (12-pixel ring)
LEFT RING (global 0–5) RIGHT RING (global 6–11)
┌───────────────────┐ ┌────────────────────────────┐
│ 0 state arc │ │ 6 face state (TOP) │
│ 1 state arc │ │ 7 reserved (locked off) │
│ 2 state arc │ │ 8 kid_mode toggle │
│ 3 state arc │ │ 9 smart_mode toggle │
│ 4 state arc │ │ 10 reserved (locked off) │
│ 5 state arc │ │ 11 listening (BOTTOM) │
└───────────────────┘ └────────────────────────────┘
| Index | Half | Owner | Behaviour |
|---|---|---|---|
| 0–5 | left | StateManager (state arc) | All six paint the current mutex-state colour. Dance suppresses and lets the rainbow animation own the ring. |
| 6 | right | StateManager (face state pip) | Yellow (168, 140, 0) when a face is detected; green (0, 140, 30) when the bridge has identified the face via room-view VLM + roster match (mutex on the same pixel). Identified state has a kFaceIdentifiedTimeoutMs = 4000 firmware-side timeout, with kFaceIdentifiedFlickerGraceMs = 1500 to ride out brief detection hiccups — bridge refreshes by calling /xiaozhi/admin/set-face-identified on each successful match. |
| 7, 10 | right | StateManager (locked off) | Reserved for future indicators (low-battery is a known candidate). Re-asserted to (0,0,0) every kReassertIntervalMs = 200 ms as defense-in-depth. |
| 8 | right | StateManager (kid_mode pip) | Salmon pink (220, 80, 80) when kid_mode = on; off otherwise. G == B keeps the warm hue surviving RGB565 quantization (the prior (168, 80, 100) hue had B > G and read cool/magenta). |
| 9 | right | StateManager (smart_mode pip) | Orange (168, 80, 0) when smart_mode = on; off otherwise. |
| 11 | right | StateManager (listening pip) | Lit while xiaozhi is in LISTENING (mic open, ASR active, user's turn); off otherwise. Driven by StateManager::setListening(bool). Bottom of the right ring; spatially separated from the toggle pips. |
LED quirks
- 5 Hz tick. StateManager re-paints the state arc AND the entire right ring (face / kid / smart / listening / reserved 7 / reserved 10) every kReassertIntervalMs = 200 ms. The tick drives the SECURITY 1 Hz flash (kSecurityFlashHalfMs = 500) and the face-identified 4 s timeout, and acts as a defense-in-depth re-assert across all status indicators: MCP writes, dance keyframes, and future writers cannot persistently clobber any pixel (worst case: 200 ms flicker).
- PY32 IO expander quantises to RGB565. Brightness deltas crush: (40,40,40) reads almost identical to (200,200,200). Use distinct hues, not brightness levels, for any indicator that needs to read across a room.
- MCP tools are contract-aware. self.robot.set_led_color and self.robot.set_led_multi are restricted to the LEFT ring only (indices 0-5). Attempts to write right-ring indices via these tools are rejected with a warn log. Use /xiaozhi/admin/set-face-identified to light the face pixel green for ~4 seconds.
- Dance choreography only animates the left ring. Custom JSON dances that set rightRgbColor will see that field preserved on the Keyframe struct but not applied to hardware.
- RightNeonLight uses local indices 0–5 internally, mapped to global 6–11 via +6. StateManager constants: kFacePipRightLocal=0, kReservedPipRightLocal_7=1, kKidModePipRightLocal=2, kSmartModePipRightLocal=3, kReservedPipRightLocal_10=4, kListeningPipRightLocal=5.
- Dashboard mirror. The bridge dashboard at /ui/led-ring-mirror shows all four indicators in the same colours as the physical ring, updated via 2 s HTMX polling + dotty-refresh event nudges fired by SSE perception events (face_detected, face_lost, face_recognized, chat_status).
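To make the ring layout concrete, here is a small Python sketch that computes one 12-pixel frame from the contract: state arc on 0-5, pips on the right ring through the local +6 mapping, and the 1 Hz security flash. It is a model for reasoning about the layout, not the firmware's paint loop. `paint_frame` and `right_global` are hypothetical names, the exact RGB of the red listening pixel is an assumption (the document only says "red"), the flash is assumed to be lit during the first half-period, and `None` on the left ring is this sketch's way of saying "owned by the dance animation".

```python
from typing import List, Optional, Tuple

RGB = Tuple[int, int, int]
OFF: RGB = (0, 0, 0)

# State-arc colours from the States table (dance is handled separately).
STATE_ARC = {
    "idle": (0, 0, 0),
    "talk": (0, 60, 0),
    "story_time": (100, 40, 0),
    "security": (80, 80, 80),
    "sleep": (0, 0, 16),
}

RIGHT_RING_OFFSET = 6            # local 0-5 maps to global 6-11
FACE_PIP_LOCAL = 0               # kFacePipRightLocal
KID_MODE_PIP_LOCAL = 2           # kKidModePipRightLocal
SMART_MODE_PIP_LOCAL = 3         # kSmartModePipRightLocal
LISTENING_PIP_LOCAL = 5          # kListeningPipRightLocal
SECURITY_FLASH_HALF_MS = 500     # kSecurityFlashHalfMs
LISTENING_RED: RGB = (255, 0, 0)  # assumption: exact red value is not documented


def right_global(local_index: int) -> int:
    """Map a right-ring local index (0-5) to its global index (6-11)."""
    if not 0 <= local_index <= 5:
        raise ValueError("right-ring local index must be 0..5")
    return local_index + RIGHT_RING_OFFSET


def paint_frame(state: str, now_ms: int, kid_mode: bool = False,
                smart_mode: bool = False, face: Optional[str] = None,
                listening: bool = False) -> List[Optional[RGB]]:
    """Return the 12 pixel values for one tick; reserved pixels stay off."""
    frame: List[Optional[RGB]] = [OFF] * 12
    if state == "dance":
        left: Optional[RGB] = None   # rainbow animation owns the left ring
    elif state == "security" and (now_ms // SECURITY_FLASH_HALF_MS) % 2 == 1:
        left = OFF                   # dark half of the 1 Hz flash
    else:
        left = STATE_ARC[state]
    for i in range(6):
        frame[i] = left
    if face == "detected":
        frame[right_global(FACE_PIP_LOCAL)] = (168, 140, 0)
    elif face == "identified":
        frame[right_global(FACE_PIP_LOCAL)] = (0, 140, 30)
    if kid_mode:
        frame[right_global(KID_MODE_PIP_LOCAL)] = (220, 80, 80)
    if smart_mode:
        frame[right_global(SMART_MODE_PIP_LOCAL)] = (168, 80, 0)
    if listening:
        frame[right_global(LISTENING_PIP_LOCAL)] = LISTENING_RED
    return frame


frame = paint_frame("talk", now_ms=0, kid_mode=True, listening=True)
```

Recomputing the whole frame from state on every tick, rather than patching individual pixels, mirrors the re-assert idea: a stray write to any pixel is overwritten on the next tick.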
State transitions
stateDiagram-v2
[*] --> idle
idle --> talk: face_detected (firmware) / onVoiceListening
talk --> idle: face_lost grace expired (firmware) / onVoiceStandby
idle --> sleep: voice "go to sleep" / "goodnight Dotty"
sleep --> idle: face_detected / voice / head_pet
idle --> security: voice "keep watch" / "security mode"
security --> idle: voice "wake up" / face_detected (Phase 6)
idle --> story_time: voice "tell me a story"
story_time --> idle: voice "the end" / "stop story" / 90 s silence (Phase 7)
idle --> dance: voice "dance" / song name
dance --> idle: choreography ends
talk --> sleep: voice "goodnight Dotty"
talk --> story_time: voice "tell me a story"
talk --> dance: voice "dance"
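The edges in the diagram above can also be written down as a lookup table, which is handy for tests or dashboard validation. This Python sketch only transcribes the diagram; whether the firmware rejects transitions outside this set is not stated in this document, and `DIAGRAM_EDGES`, `is_diagram_edge`, and `targets_from` are hypothetical names.

```python
from typing import List

# (source, target) pairs transcribed one-for-one from the state diagram.
DIAGRAM_EDGES = {
    ("idle", "talk"), ("talk", "idle"),
    ("idle", "sleep"), ("sleep", "idle"),
    ("idle", "security"), ("security", "idle"),
    ("idle", "story_time"), ("story_time", "idle"),
    ("idle", "dance"), ("dance", "idle"),
    ("talk", "sleep"), ("talk", "story_time"), ("talk", "dance"),
}


def is_diagram_edge(source: str, target: str) -> bool:
    """True if the diagram draws an arrow from source to target."""
    return (source, target) in DIAGRAM_EDGES


def targets_from(source: str) -> List[str]:
    """All states the diagram reaches directly from source, sorted by name."""
    return sorted(dst for src, dst in DIAGRAM_EDGES if src == source)


print(targets_from("talk"))   # ['dance', 'idle', 'sleep', 'story_time']
print(is_diagram_edge("sleep", "talk"))   # False: sleep only exits to idle
```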
Voice triggers
| Phrase (substring, case-insensitive) | Target state |
|---|---|
| goodnight dotty / good night dotty / go to sleep | sleep |
| keep watch / security mode / watch the room | security |
| tell me a story / story time | story_time |
| wake up / come back / are you there (only when state ∈ {sleep, security, story_time}) | idle |
Both kid_mode and smart_mode are voice-untoggleable — they are guardian-controlled axes driven from the bridge's /admin/kid-mode and /admin/smart-mode endpoints (or the dashboard cards that wrap them).
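The case-insensitive substring matching described in the voice trigger table might look like the following Python sketch. It is not the code in receiveAudioHandle.py: `match_voice_trigger` is a hypothetical name, and checking the wake phrases before the other rows is an assumption about precedence that the table does not specify.

```python
from typing import Optional

# Rows of the voice trigger table: (phrases, target state).
VOICE_TRIGGERS = [
    (("goodnight dotty", "good night dotty", "go to sleep"), "sleep"),
    (("keep watch", "security mode", "watch the room"), "security"),
    (("tell me a story", "story time"), "story_time"),
]

# The wake phrases only apply while Dotty is in one of these states.
WAKE_PHRASES = ("wake up", "come back", "are you there")
WAKE_FROM_STATES = {"sleep", "security", "story_time"}


def match_voice_trigger(text: str, current_state: str) -> Optional[str]:
    """Return the target state for an utterance, or None if nothing matches."""
    lowered = text.lower()
    if current_state in WAKE_FROM_STATES:
        if any(phrase in lowered for phrase in WAKE_PHRASES):
            return "idle"
    for phrases, target in VOICE_TRIGGERS:
        if any(phrase in lowered for phrase in phrases):
            return target
    return None


print(match_voice_trigger("Goodnight Dotty!", "talk"))    # sleep
print(match_voice_trigger("Hey, WAKE UP please", "sleep"))  # idle
print(match_voice_trigger("wake up", "idle"))             # None
```

Substring matching is forgiving of filler words around the phrase, at the cost of occasional false positives when a trigger phrase appears inside an unrelated sentence.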
Admin endpoints
| Endpoint | Body | Effect | Where |
|---|---|---|---|
| POST /admin/kid-mode | {"enabled": bool} | Persists + hot-reloads kid-mode globals atomically via _apply_kid_mode(). No daemon restart. Also pushes the kid pip via xiaozhi /xiaozhi/admin/set-toggle. | bridge (localhost-only) |
| POST /admin/smart-mode | {"enabled": bool, "device_id": "<optional>"} | Persists + flips voice provider's model. When DOTTY_VOICE_PROVIDER=tier1slim: in-process hot-swap via /xiaozhi/admin/set-tier1slim-model. When =zeroclaw: rewrites config.toml + restarts daemon. Also pushes the smart pip. | bridge (localhost-only) |
| POST /xiaozhi/admin/set-state | {"state": "<idle\|talk\|story_time\|security\|sleep\|dance>", "device_id": "<optional>"} | Dispatches MCP self.robot.set_state onto the device WS; firmware StateManager applies it. | xiaozhi-server |
| POST /xiaozhi/admin/set-toggle | {"name": "kid_mode\|smart_mode", "enabled": bool, "device_id": "<optional>"} | Dispatches MCP self.robot.set_toggle; firmware StateManager updates the pip without disturbing the active state. | xiaozhi-server |
| POST /xiaozhi/admin/set-face-identified | {"device_id": "<optional>"} | Lights the face-identified pixel green; refresh required every < kFaceIdentifiedTimeoutMs (4 s) to hold. | xiaozhi-server |
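As a rough illustration of the request bodies in the table above, the following Python sketch builds the path and JSON body for the two xiaozhi-server state endpoints and validates the arguments first. The helper names `set_state_request` and `set_toggle_request` are hypothetical, and the sketch stops short of sending anything because the host, port, and any authentication are not given in this document.

```python
import json
from typing import Optional, Tuple

VALID_STATES = {"idle", "talk", "story_time", "security", "sleep", "dance"}
VALID_TOGGLES = {"kid_mode", "smart_mode"}


def set_state_request(state: str,
                      device_id: Optional[str] = None) -> Tuple[str, str]:
    """Return (path, JSON body) for POST /xiaozhi/admin/set-state."""
    if state not in VALID_STATES:
        raise ValueError(f"unknown state: {state}")
    body = {"state": state}
    if device_id is not None:        # device_id is optional per the table
        body["device_id"] = device_id
    return "/xiaozhi/admin/set-state", json.dumps(body)


def set_toggle_request(name: str, enabled: bool,
                       device_id: Optional[str] = None) -> Tuple[str, str]:
    """Return (path, JSON body) for POST /xiaozhi/admin/set-toggle."""
    if name not in VALID_TOGGLES:
        raise ValueError(f"unknown toggle: {name}")
    body = {"name": name, "enabled": bool(enabled)}
    if device_id is not None:
        body["device_id"] = device_id
    return "/xiaozhi/admin/set-toggle", json.dumps(body)


path, body = set_state_request("sleep")
print(path, body)   # /xiaozhi/admin/set-state {"state": "sleep"}
```

The returned pair can be handed to any HTTP client as a POST with a JSON content type, once the server address for the deployment is known.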
MCP tools (firmware)
| Tool | Arguments | Caller |
|---|---|---|
| self.robot.set_state | {"state": "<...>"} | xiaozhi-server /xiaozhi/admin/set-state relay |
| self.robot.set_toggle | {"name": "kid_mode\|smart_mode", "enabled": bool} | xiaozhi-server /xiaozhi/admin/set-toggle relay; receiveAudioHandle.py voice phrases |
| self.robot.set_face_identified | {} | xiaozhi-server /xiaozhi/admin/set-face-identified relay |
Backing architecture per state
| State | Voice path | Memory? | Tools? |
|---|---|---|---|
| idle | n/a | n/a | n/a |
| talk | xiaozhi → Tier1Slim → llama-swap (default), or xiaozhi → ZeroClawLLM → bridge → ZeroClaw ACP (legacy). Smart-mode swaps the inner-loop model. | yes (FTS via memory_lookup tool / full ZeroClaw memory) | yes (4-tool Tier1 catalogue / full ZeroClaw MCP) |
| story_time | xiaozhi → bridge → direct OpenRouter (story persona overlay + rolling context) | per-session list (Phase 7) | no |
| security | bridge ambient task (no voice path active) | logs to journal | photo + audio capture |
| sleep | mic stays on for "wake up"; no LLM round-trip | n/a | n/a |
| dance | bridge handler dispatches choreography + audio file | n/a | dance MCP |
smart_mode flips the inner-loop model and is sticky across turns. With DOTTY_VOICE_PROVIDER=tier1slim (the recommended default) the flip is instantaneous — Tier1Slim's set_runtime() mutates the live provider; no docker restart and no daemon restart. story_time is the only voice path that bypasses both ZeroClaw and Tier1Slim, with its own session memory (Phase 7).
Implementation status
| Phase | Scope | Status |
|---|---|---|
| 4 | StateManager foundation: state pip + toggle pips + state_changed event + voice phrases + admin endpoints + LED contract | ✅ shipped 2026-04-27 (firmware d78118b, bridge+xiaozhi 10cbc63). Bench checks pending: #38. |
| 5 | Sleep state behaviour (servo park + torque off + sleepy emoji + wake triggers) | ✅ shipped; bench checks: #39. |
| 6 | Security state behaviour (periodic photo + audio capture, greeter gate) | ✅ shipped; bench checks: #40. |
| 7 | Story_time state (interactive setup, OpenRouter session, choose-your-own-adventure) | pending: #26. |
| 8 | Ambient awareness loop (idle-state photo + audio scene capture, journal) | partial — bridge runs _perception_* consumers; firmware state binding pending. Tracked alongside #26. |
Phase 4 established the rails — pip, transition events, dispatch helpers, voice routing. Phases 5–6 hang behaviour off those rails and have already shipped to the active firmware fork. Phases 7–8 are the next firmware deliverables.
Sources of truth
- Firmware (active fork BrettKinny/StackChan @ dotty): firmware/main/stackchan/modes/state_manager.{h,cpp}, firmware/main/stackchan/modifiers/face_tracking.cpp (camera-edge hooks), firmware/main/hal/hal_mcp.cpp (set_state / set_toggle MCP). This repo's submodule pin lags: bump it (or maintain a parallel checkout per the firmware/ README) to flash a build that includes Phase 4+.
- Bridge: bridge.py (_dispatch_set_state, _dispatch_set_toggle, _admin_state, _admin_smart_mode, _admin_kid_mode, _apply_model_swap, _apply_tier1slim_runtime, _update_perception_state for state_changed, all _perception_* consumers), receiveAudioHandle.py (voice state phrases + per-conn toggle sync)
- xiaozhi-server patches: custom-providers/xiaozhi-patches/http_server.py (/xiaozhi/admin/set-state, /xiaozhi/admin/set-toggle, /xiaozhi/admin/set-tier1slim-model, /xiaozhi/admin/set-face-identified, /xiaozhi/admin/inject-text, /xiaozhi/admin/abort, /xiaozhi/admin/set-head-angles), custom-providers/xiaozhi-patches/textMessageHandlerRegistry.py (state_changed → conn.current_state, perception relay to bridge)
- Dashboard: bridge/dashboard.py + bridge/templates/state_card.html + bridge/templates/smart_mode.html + bridge/templates/led_ring_mirror.html
Last verified: 2026-05-17.