Voice Catalog¶

A short, curated list of TTS voices that play well with Dotty's persona — warm, cheerful, easy on the ear at low volume on a tiny speaker. The full upstream catalogues are huge; this page is the opinionated subset we've actually listened to and like.

For instructions on switching, see Swap Voice. For an automated download of any Piper voice listed below, see the install helper section at the bottom.

Quick guide¶

Piper runs locally, no cloud, no jitter. Prefer it for reliability.
EdgeTTS has more variety and naturalness but needs internet.
"Best for" is opinion only — try a couple, your room and speaker matter.
Sample rate 22050 Hz is the Piper default; the firmware resamples transparently. File sizes are approximate.

Piper voices¶

All voices live on the public mirror at huggingface.co/rhasspy/piper-voices. Each voice ships as a .onnx model plus a .onnx.json config — both needed.

Key	Lang	Quality	Character	Best for	Size
`en_US-amy-medium`	en_US	medium	Warm, friendly	Kid + Adult	~63 MB
`en_US-amy-low`	en_US	low	Warm, friendly	Kid + Adult	~28 MB
`en_US-kristin-medium`	en_US	medium	Cheerful, bright	Kid Mode	~63 MB
`en_US-hfc_female-medium`	en_US	medium	Neutral, clear	Adult	~63 MB
`en_US-lessac-medium`	en_US	medium	Neutral, articulate	Adult	~63 MB
`en_US-lessac-low`	en_US	low	Neutral, articulate	Adult	~28 MB
`en_US-libritts_r-medium`	en_US	medium	Multi-speaker	Both	~75 MB
`en_GB-cori-medium`	en_GB	medium	Soft, warm UK	Kid + Adult	~63 MB
`en_GB-jenny_dylan-medium`	en_GB	medium	Playful, lively UK	Kid Mode	~63 MB
`en_GB-southern_english_female-low`	en_GB	low	Cheerful UK	Kid + Adult	~28 MB
`en_GB-alba-medium`	en_GB	medium	Scottish, cosy	Both	~63 MB
`en_GB-semaine-medium`	en_GB	medium	Neutral UK	Adult	~63 MB

The default voice that ships with make fetch-models is en_GB-cori-medium — a safe, friendly starting point.

Notes on quality tiers¶

low (16 kHz, ~28 MB) is fine for casual chat on a small speaker. The Pi can synthesize it at well over realtime even on a Pi 4.
medium (22050 Hz, ~63 MB) is the sweet spot for desk listening.
high exists for some voices (~110 MB) but the difference is hard to hear through the StackChan's tiny driver — skip it.

EdgeTTS voices¶

EdgeTTS calls Microsoft's cloud, which means latency jitter and occasional throttling, but you get a much wider voice pool. Use the slug in the voice: field under TTS.EdgeTTS (or TTS.StreamingEdgeTTS).

Slug	Lang	Character	Best for
`en-AU-NatashaNeural`	en-AU	Warm, friendly AU	Kid + Adult
`en-AU-WilliamNeural`	en-AU	Calm, neutral AU	Adult
`en-GB-SoniaNeural`	en-GB	Warm, professional UK	Both
`en-GB-MaisieNeural`	en-GB	Young, cheerful UK	Kid Mode
`en-US-AriaNeural`	en-US	Bright, expressive US	Both
`en-US-JennyNeural`	en-US	Friendly assistant US	Both

To list every available voice yourself:

pip install edge-tts
edge-tts --list-voices | grep en-

Install helper¶

To download any Piper voice from the table above into models/piper/:

make voice-list                                  # show this catalog
make voice-install VOICE=en_US-kristin-medium    # download only
make voice-install VOICE=en_US-kristin-medium APPLY=1   # download + edit .config.yaml

The same script is at scripts/voice-install.sh if you'd rather call it directly. Run ./scripts/voice-install.sh --help for flags.

After installing a Piper voice, run make doctor to verify the file is in place, then restart the server: docker compose restart xiaozhi-server.

How to switch voices¶

See Swap Voice for the full walkthrough on editing .config.yaml for either backend. The short version:

selected_module:
  TTS: LocalPiper
TTS:
  LocalPiper:
    voice: en_US-kristin-medium
    model_path: /opt/xiaozhi-esp32-server/models/piper/en_US-kristin-medium.onnx
    config_path: /opt/xiaozhi-esp32-server/models/piper/en_US-kristin-medium.onnx.json

Then docker compose restart xiaozhi-server.

Last verified: 2026-05-17.