Self-host the powerful Chatterbox TTS model with this enhanced FastAPI server. It offers an intuitive Web UI, flexible API endpoints (including an OpenAI-compatible one), predefined voices, voice cloning, large text processing via intelligent chunking for audiobook-scale generation, and consistent, reproducible output using built-in ready-to-use voices and a generation seed feature.
🚀 Try it now! Test the full TTS server with voice cloning and audiobook generation in Google Colab - no installation required!
This server is based on the architecture and UI of our Dia-TTS-Server project but uses the distinct `chatterbox-tts` engine. It runs accelerated on NVIDIA (CUDA), AMD (ROCm), and Apple Silicon (MPS) GPUs, with a fallback to CPU.
The Chatterbox TTS model by Resemble AI provides capabilities for generating high-quality speech. This project builds upon that foundation by providing a robust FastAPI server that makes Chatterbox significantly easier to use and integrate.
The server expects plain text input for synthesis and removes the complexity of setting up and running the model yourself. It is your gateway to leveraging Chatterbox's TTS capabilities seamlessly, with enhanced stability, voice consistency, and large text support.
This server application enhances the underlying `chatterbox-tts` engine with the following:
🚀 Core Functionality:

- Predefined Voices: Use consistent, ready-to-use voices from the `./voices` directory.
- Voice Cloning: Generate speech that mimics uploaded reference audio files (`.wav` or `.mp3`).
- Consistent Generation (Seeding): A `seed` parameter in the UI and API for influencing generation results. Using a fixed integer seed in combination with Predefined Voices or Voice Cloning helps maintain consistency.
- Full Parameter Control (`/tts`): Text chunking (`split_text`, `chunk_size`), generation settings (temperature, exaggeration, CFG weight, seed, speed factor, language), and output format.
- Web UI Configuration: Edit `config.yaml` settings (server, model, paths) and save generation defaults.
- YAML Configuration: A single `config.yaml` for all runtime configuration, managed via `config.py` (`YamlConfigManager`). If `config.yaml` is missing, it's created with default values from `config.py`.
- Audio Post-Processing: Silence trimming and (if `parselmouth` is installed) unvoiced segment removal to improve audio quality. These are configurable.
- UI State Persistence: The UI remembers your last session via `config.yaml` (`ui_state` section).

🔧 General Enhancements:
- Uses `ChatterboxTTS.from_pretrained()` for robust model loading from Hugging Face Hub, utilizing the standard HF cache.
- Dependencies managed via `requirements.txt` (plus hardware-specific variants).
- Helper utilities in `utils.py` for audio processing, text handling, and file management.
- A custom endpoint (`/tts`) as the primary method for programmatic generation, exposing all key parameters.
- Interactive API documentation (`/docs`).
- A UI data endpoint (`/api/ui/initial-data`) that also serves as a comprehensive status check.
- Large text handling through chunking, controlled by `split_text` and `chunk_size`.
- Predefined voices loaded from the `./voices` directory (paths configurable in `config.yaml`).
- Text and settings presets loaded from `ui/presets.yaml`.
- Upload of `.wav`/`.mp3` files for reference audio and predefined voices.
- Editing of server configuration (`config.yaml`) and default generation parameters directly in the UI.
- UI state persisted to `config.yaml`.
- Model loading at runtime via `ChatterboxTTS.from_pretrained()`, with the model repository set in `config.yaml`.
- Optional `download_model.py` script available to pre-download specific model components to a local directory (this is separate from the main HF cache used at runtime).
- Docker support with ready-to-use compose files (`docker compose up -d`).

System prerequisites (Linux):

- `libsndfile1`: Audio library needed by `soundfile`. Install via package manager (e.g., `sudo apt install libsndfile1`).
- `ffmpeg`: For robust audio operations (optional but recommended). Install via package manager (e.g., `sudo apt install ffmpeg`).

This project uses specific dependency files to ensure a smooth, one-command installation for your hardware. Follow the path that matches your system.
1. Clone the Repository
git clone https://github.com/devnen/Chatterbox-TTS-Server.git
cd Chatterbox-TTS-Server
2. Create a Python Virtual Environment
Using a virtual environment is crucial to avoid conflicts with other projects.
Windows (PowerShell):
python -m venv venv
.\venv\Scripts\activate
Linux (Bash):
python3 -m venv venv
source venv/bin/activate
Your command prompt should now start with (venv).
3. Choose Your Installation Path
Pick one of the following commands based on your hardware. This single command will install all necessary dependencies with compatible versions.
CPU-Only: This is the most straightforward option and works on any machine without a compatible GPU.
# Make sure your (venv) is active
pip install --upgrade pip
pip install -r requirements.txt
NVIDIA GPU (CUDA): For users with NVIDIA GPUs. This provides the best performance.
Prerequisite: Ensure you have the latest NVIDIA drivers installed.
# Make sure your (venv) is active
pip install --upgrade pip
pip install -r requirements-nvidia.txt
After installation, verify that PyTorch can see your GPU:
python -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}'); print(f'Device name: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else None}')"
If CUDA available: shows True, your setup is correct!
AMD GPU (ROCm): For users with modern, ROCm-compatible AMD GPUs.
Prerequisite: Ensure you have the latest ROCm drivers installed on a Linux system.
# Make sure your (venv) is active
pip install --upgrade pip
pip install -r requirements-rocm.txt
After installation, verify that PyTorch can see your GPU:
python -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'ROCm available: {torch.cuda.is_available()}'); print(f'Device name: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else None}')"
If ROCm available: shows True, your setup is correct!
Apple Silicon (MPS): For users with Apple Silicon Macs (M1, M2, M3, etc.).
Prerequisite: Ensure you have macOS 12.3 or later for MPS support.
Step 1: Install PyTorch with MPS support first
# Make sure your (venv) is active
pip install --upgrade pip
pip install torch torchvision torchaudio
Step 2: Configure the server to use MPS
Update your config.yaml to use MPS instead of CUDA:
# The server will create config.yaml on first run, or you can create it manually
# Make sure the tts_engine device is set to 'mps'
Step 3: Install remaining dependencies
# Install chatterbox-tts without its dependencies to avoid conflicts
pip install --no-deps git+https://github.com/resemble-ai/chatterbox.git
# Install core server dependencies
pip install fastapi 'uvicorn[standard]' librosa safetensors soundfile pydub audiotsm praat-parselmouth python-multipart requests aiofiles PyYAML watchdog unidecode inflect tqdm
# Install missing chatterbox dependencies
pip install conformer==0.3.2 diffusers==0.29.0 resemble-perth==1.0.1 transformers==4.46.3
# Install s3tokenizer without its problematic dependencies
pip install --no-deps s3tokenizer
# Install a compatible version of ONNX
pip install onnx==1.16.0
Step 4: Configure MPS device
Either edit config.yaml manually or let the server create it, then modify:
tts_engine:
device: mps # Changed from 'cuda' to 'mps'
After installation, verify that PyTorch can see your GPU:
python -c "import torch; print(f'PyTorch version: {torch.__version__}'); print(f'MPS available: {torch.backends.mps.is_available()}'); print(f'Device will use: {\"mps\" if torch.backends.mps.is_available() else \"cpu\"}')"
If MPS available: shows True, your setup is correct!
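As an optional extra check, you can run a tiny tensor operation on the MPS device to confirm it works end to end, not just that it is reported as available. This is a minimal sketch using plain PyTorch:

```python
# Optional functional check for the MPS backend.
import torch

if torch.backends.mps.is_available():
    x = torch.ones(3, device="mps")                  # allocate a tensor on the Apple GPU
    print("MPS tensor sum:", (x * 2).sum().item())   # expected output: 6.0
else:
    print("MPS not available; the server will fall back to CPU.")
```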
Want to test Chatterbox TTS Server immediately without any installation?
Prefer local installation? Continue reading below for full setup instructions.
The server relies exclusively on `config.yaml` for runtime configuration.

- `config.yaml`: Located in the project root. This file stores all server settings, model paths, generation defaults, and UI state. It is created automatically on the first run (using defaults from `config.py`) if it doesn't exist. This is the main file to edit for persistent configuration changes.
- Settings saved from the Web UI are written back to `config.yaml`.

Key Configuration Areas (in `config.yaml` or UI):
- `server`: `host`, `port`, logging settings.
- `model`: `repo_id` (e.g., "ResembleAI/chatterbox").
- `tts_engine`: `device` ('auto', 'cuda', 'mps', 'cpu'), `predefined_voices_path`, `reference_audio_path`, `default_voice_id`.
- `paths`: `model_cache` (for `download_model.py`), `output`.
- `generation_defaults`: Default UI values for `temperature`, `exaggeration`, `cfg_weight`, `seed`, `speed_factor`, `language`.
- `audio_output`: `format`, `sample_rate`, `max_reference_duration_sec`.
- `ui_state`: Stores the last used text, voice mode, file selections, etc., for UI persistence.
- `ui`: `title`, `show_language_select`, `max_predefined_voices_in_dropdown`.
- `debug`: `save_intermediate_audio`.
⭐ Remember: Changes made to the `server`, `model`, `tts_engine`, or `paths` sections in `config.yaml` (or via the UI's Server Configuration section) require a server restart to take effect. Changes to `generation_defaults` or `ui_state` are applied dynamically or on the next page load.
Important Note on Model Downloads (First Run):
The very first time you start the server, it needs to download the `chatterbox-tts` model files from Hugging Face Hub. This is an automatic, one-time process (per model version, or until your Hugging Face cache is cleared). You can optionally use the `python download_model.py` script to pre-download specific model components to the `./model_cache` directory defined in `config.yaml`. However, please note that the runtime engine (`engine.py`) primarily loads the model from the main Hugging Face Hub cache directly, not this specific local `model_cache` directory.
Steps to Run:

1. Activate your virtual environment (Linux: `source venv/bin/activate`, Windows PowerShell: `.\venv\Scripts\activate`).
2. Start the server: `python server.py`
3. Open the Web UI at http://localhost:PORT (e.g., http://localhost:8004 if your configured port is 8004).
4. Visit http://localhost:PORT/docs for interactive API documentation.
5. To stop the server, press CTRL+C in the terminal where the server is running.
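While the server is running, you can confirm the API is reachable with a quick request. This is a minimal sketch that assumes the default port 8004; it calls `/api/ui/initial-data`, which also serves as a status check and returns JSON.

```python
# Quick smoke test against a locally running server.
import requests

r = requests.get("http://localhost:8004/api/ui/initial-data", timeout=30)
r.raise_for_status()
print("Server is up; top-level keys:", list(r.json().keys()))
```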
Follow these steps to update your local installation to the latest version from GitHub. This guide provides two methods: the recommended `git stash` workflow and a manual backup alternative. Both will preserve your local `config.yaml`.
First, Navigate to Your Project Directory & Activate Venv
Before starting, open your terminal, go to the project folder, and activate your virtual environment.
cd Chatterbox-TTS-Server
# On Windows (PowerShell):
.\venv\Scripts\activate
# On Linux (Bash):
source venv/bin/activate
This is the standard and safest way to update using Git. It automatically handles your local changes (like those to `config.yaml`) without needing to manually copy files.
Step 1: Stash Your Local Changes
This command safely stores your modifications on a temporary “shelf.”
git stash
Step 2: Pull the Latest Version
Now that your local changes are safely stored, you can download the latest code from GitHub.
git pull origin main
Step 3: Re-apply Your Changes
This command takes your changes from the shelf and applies them back to the updated code.
git stash pop
Your `config.yaml` will now have your settings, and the rest of the project files will be up-to-date. You can now proceed to the "Final Steps" section below.
This method involves manually backing up and restoring your configuration file.
Step 1: Backup Your Configuration
⚠️ Important: Create a backup of your `config.yaml` to preserve your custom settings.
# Create a backup of your current configuration
cp config.yaml config.yaml.backup
Step 2: Update the Repository
Choose one of the following commands based on your needs:
git pull origin main
If you encounter merge conflicts with `config.yaml`, you may need to resolve them manually. Alternatively, reset to match the remote exactly (this discards local changes to tracked files):

# Fetch latest changes and reset to match remote exactly
git fetch origin
git reset --hard origin/main
Step 3: Restore Your Configuration
# Restore your backed-up configuration
cp config.yaml.backup config.yaml
Now, proceed to the “Final Steps” section.
After you have updated the code using either method, complete these final steps.
1. Check for New Configuration Options
⭐ Recommended: Compare your restored `config.yaml` with the new default config to see if there are new options you might want to adopt. The server will add new keys with default values, but you may want to review them.
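After restarting the server once (it adds any missing options with default values), you can diff the updated config.yaml against your config.yaml.backup to see exactly which options are new. A minimal sketch, assuming the backup file from the manual method above:

```python
# List sections/keys present in the updated config.yaml but not in the backup.
import yaml

def load(path):
    with open(path, "r", encoding="utf-8") as f:
        return yaml.safe_load(f) or {}

new_cfg, old_cfg = load("config.yaml"), load("config.yaml.backup")

for section, values in new_cfg.items():
    if section not in old_cfg:
        print(f"New section: {section}")
    elif isinstance(values, dict) and isinstance(old_cfg[section], dict):
        for key in values:
            if key not in old_cfg[section]:
                print(f"New option: {section}.{key}")
```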
2. Update Dependencies
⭐ Important: After pulling new code, always update the dependencies to ensure you have the correct versions. Choose the command that matches your hardware:
pip install -r requirements.txt
pip install -r requirements-nvidia.txt
pip install -r requirements-rocm.txt
3. Restart the Server
If the server was running, stop it (CTRL+C) and restart it to apply all the updates.
python server.py
⭐ Note: Your custom settings in `config.yaml` are preserved with this method. The server will automatically add any new configuration options with default values if needed. You can safely delete `config.yaml.backup` once you've verified everything works correctly.
⭐ Docker Users: If you use Docker and have a local `config.yaml` mounted as a volume, the same backup/restore process applies before running:
docker compose down
docker compose pull # if using pre-built images
docker compose up -d --build
Web UI (http://localhost:PORT)

The most intuitive way to use the server:

- Predefined Voices: Select a curated voice from the `./voices` directory.
- Voice Cloning: Select an uploaded reference file from `./reference_audio`.
- Presets: Load example text and settings from `ui/presets.yaml`.
- Generation Defaults: Save your preferred generation parameters to `config.yaml`.
- Server Configuration: Edit `config.yaml` from the UI (requires server restart for some changes).

API Endpoints (see /docs for interactive details)
The primary endpoint for TTS generation is `/tts`, which offers detailed control over the synthesis process.

`/tts` (POST): Main endpoint for speech generation.

Request body (`CustomTTSRequest`):
- `text` (string, required): Plain text to synthesize.
- `voice_mode` (string, "predefined" or "clone", default "predefined"): Specifies the voice source.
- `predefined_voice_id` (string, optional): Filename of the predefined voice (if `voice_mode` is "predefined").
- `reference_audio_filename` (string, optional): Filename of the reference audio (if `voice_mode` is "clone").
- `output_format` (string, "wav" or "opus", default "wav").
- `split_text` (boolean, default True): Whether to chunk long text.
- `chunk_size` (integer, default 120): Target characters per chunk.
- `temperature`, `exaggeration`, `cfg_weight`, `seed`, `speed_factor`, `language`: Generation parameters overriding defaults.

Response: Streaming audio (audio/wav or audio/opus).
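For example, here is a minimal Python sketch that calls `/tts` with the fields above and saves the result. It assumes the server is running on the default port 8004 and that a predefined voice file with the hypothetical name `example_voice.wav` exists in `./voices`.

```python
# Minimal /tts request: predefined voice, fixed seed, chunked long text.
import requests

payload = {
    "text": "Hello from Chatterbox TTS Server! This text can be as long as an audiobook chapter.",
    "voice_mode": "predefined",
    "predefined_voice_id": "example_voice.wav",  # hypothetical filename in ./voices
    "output_format": "wav",
    "split_text": True,
    "chunk_size": 120,
    "seed": 42,  # fixed seed + fixed voice helps keep output consistent
}

response = requests.post("http://localhost:8004/tts", json=payload, timeout=300)
response.raise_for_status()

with open("output.wav", "wb") as f:
    f.write(response.content)
print("Saved output.wav")
```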
`/v1/audio/speech` (POST): OpenAI-compatible endpoint.

- `input`: Text to synthesize.
- `voice`: 'S1', 'S2', 'dialogue', 'predefined_voice_filename.wav', or 'reference_filename.wav'.
- `response_format`: 'opus' or 'wav'.
- `speed`: Playback speed factor (0.5-2.0).
- `seed`: (Optional) Integer seed; -1 for random.
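A matching sketch for the OpenAI-compatible endpoint, again assuming the default port 8004 and a hypothetical voice filename:

```python
# Minimal /v1/audio/speech request using the fields listed above.
import requests

payload = {
    "input": "This request uses the OpenAI-compatible endpoint.",
    "voice": "example_voice.wav",  # hypothetical predefined voice filename
    "response_format": "wav",
    "speed": 1.0,
    "seed": -1,  # -1 selects a random seed
}

response = requests.post("http://localhost:8004/v1/audio/speech", json=payload, timeout=300)
response.raise_for_status()

with open("speech.wav", "wb") as f:
    f.write(response.content)
```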
UI & Helper Endpoints:

- GET `/api/ui/initial-data`: Fetches all initial configuration, file lists, and presets for the UI.
- POST `/save_settings`: Saves partial updates to `config.yaml`.
- POST `/reset_settings`: Resets `config.yaml` to defaults.
- GET `/get_reference_files`: Lists files in `reference_audio/`.
- GET `/get_predefined_voices`: Lists formatted voices from `voices/`.
- POST `/upload_reference`: Uploads reference audio files.
- POST `/upload_predefined_voice`: Uploads predefined voice files.
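As a rough illustration of the upload endpoints, the sketch below posts a reference clip to `/upload_reference`. The multipart field name ("files") and the response shape are assumptions; check the interactive docs at `/docs` for the authoritative schema.

```python
# Hypothetical upload of a reference audio file for voice cloning.
import requests

with open("my_voice_sample.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:8004/upload_reference",
        files={"files": ("my_voice_sample.wav", f, "audio/wav")},  # field name assumed
        timeout=60,
    )
print(resp.status_code, resp.text)
```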
Run Chatterbox TTS Server easily using Docker. The recommended method uses Docker Compose, which is pre-configured for different GPU types.
For AMD GPU usage, ensure your user is in the video and render groups (see the ROCm steps below).

This method uses the provided docker-compose.yml files to manage the container, volumes, and configuration easily.
git clone https://github.com/devnen/Chatterbox-TTS-Server.git
cd Chatterbox-TTS-Server
The default docker-compose.yml is configured for NVIDIA GPUs.
docker compose up -d --build
Prerequisites: Ensure you have ROCm drivers installed on your host system and your user is in the required groups:
# Add your user to required groups (one-time setup)
sudo usermod -a -G video,render $USER
# Log out and back in for changes to take effect
Start the container:
docker compose -f docker-compose-rocm.yml up -d --build
A dedicated compose file is now provided for CPU-only users to avoid GPU driver errors.
docker compose -f docker-compose-cpu.yml up -d --build
⭐ Note: The first time you run this, Docker will build the image and download model files, which can take some time. Subsequent starts will be much faster.
Open your web browser to http://localhost:PORT (e.g., http://localhost:8004, or the host port you configured).
# Check if container can see NVIDIA GPU
docker compose exec chatterbox-tts-server nvidia-smi
# Verify PyTorch can access the GPU
docker compose exec chatterbox-tts-server python3 -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'GPU count: {torch.cuda.device_count()}')"
# Check if container can see AMD GPU
docker compose -f docker-compose-rocm.yml exec chatterbox-tts-server rocm-smi
# Verify PyTorch can access the GPU
docker compose -f docker-compose-rocm.yml exec chatterbox-tts-server python3 -c "import torch; print(f'ROCm available: {torch.cuda.is_available()}'); print(f'Device name: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else \"No GPU detected\"}')"
docker compose logs -f # For NVIDIA
docker compose -f docker-compose-rocm.yml logs -f # For AMD
docker compose -f docker-compose-cpu.yml logs -f # For CPU
docker compose down # For NVIDIA
docker compose -f docker-compose-rocm.yml down # For AMD
docker compose -f docker-compose-cpu.yml down # For CPU
docker compose restart chatterbox-tts-server # For NVIDIA
docker compose -f docker-compose-rocm.yml restart chatterbox-tts-server # For AMD
docker compose -f docker-compose-cpu.yml restart chatterbox-tts-server # For CPU
## AMD ROCm Support Details
### **GPU Architecture Override (Advanced Users)**
If your AMD GPU is not officially supported by ROCm but is similar to a supported architecture, you can override the detected architecture:
```bash
# For RX 5000/6000 series (gfx10xx) - override to gfx1030
HSA_OVERRIDE_GFX_VERSION=10.3.0 docker compose -f docker-compose-rocm.yml up -d
# For RX 7000 series (gfx11xx) - override to gfx1100
HSA_OVERRIDE_GFX_VERSION=11.0.0 docker compose -f docker-compose-rocm.yml up -d
# For Vega cards - override to gfx906
HSA_OVERRIDE_GFX_VERSION=9.0.6 docker compose -f docker-compose-rocm.yml up -d
```
Check your GPU architecture:
# Method 1: Using rocminfo (if ROCm installed on host)
rocminfo | grep "Name:"
# Method 2: Using lspci
lspci | grep VGA
Common GPU Architecture Mappings:

- RX 7000 series (gfx11xx): HSA_OVERRIDE_GFX_VERSION=11.0.0
- RX 6000 series (gfx10xx): HSA_OVERRIDE_GFX_VERSION=10.3.0
- RX 5000 series (gfx10xx): HSA_OVERRIDE_GFX_VERSION=10.3.0
- Vega cards (gfx906): HSA_OVERRIDE_GFX_VERSION=9.0.6
Docker Troubleshooting:

- NVIDIA GPU not detected: Verify nvidia-smi works on the host and ensure the NVIDIA Container Toolkit is installed.
- Alternative NVIDIA runtime: Edit docker-compose.yml, comment out the deploy section, and uncomment the runtime: nvidia line as shown in the file's comments.
- GPU out-of-memory: Reduce chunk_size in the UI for long texts.
- AMD ROCm missing on host: sudo apt install rocm-dkms rocm-libs
- AMD device permissions: groups $USER should include video and render. If not:
  sudo usermod -a -G video,render $USER
  # Log out and back in
- Unsupported AMD architecture: Try the HSA_OVERRIDE_GFX_VERSION override as shown above.
- Persistent AMD access problems: As a last resort, add the following to docker-compose-rocm.yml:
  privileged: true
  cap_add:
    - SYS_PTRACE
  devices:
    - /dev/mem
- Port conflict: Change the host port via the PORT environment variable: PORT=8005 docker compose up -d
- Permission errors running Docker: Ensure your user is in the docker group.
The container reads config.yaml for settings. The docker-compose files mount your local config.yaml to /app/config.yaml inside the container. If config.yaml doesn't exist locally, the application will create a default one with sensible defaults. You can edit config.yaml directly; changes to server/model/path settings require a container restart:

docker compose restart chatterbox-tts-server
Persistent data is stored on your host machine via volume mounts:
- ./config.yaml:/app/config.yaml - Main application configuration
- ./voices:/app/voices - Predefined voice audio files
- ./reference_audio:/app/reference_audio - Your uploaded reference audio files for cloning
- ./outputs:/app/outputs - Generated audio files saved from UI/API
- ./logs:/app/logs - Server log files
- hf_cache:/app/hf_cache - Named volume for Hugging Face model cache (persists downloads)

Managing volumes:
# Remove all data (including downloaded models)
docker compose down -v
# Remove only application data (keep model cache)
docker compose down
sudo rm -rf voices/ reference_audio/ outputs/ logs/ config.yaml
# View volume usage
docker system df
Troubleshooting:

- MPS not detected (macOS): Verify with python -c "import torch; print(torch.backends.mps.is_available())".
- ONNX conflicts (macOS): Run pip install onnx==1.16.0 as shown in the installation steps.
- Server falls back to CPU on macOS: Confirm config.yaml has device: mps in the tts_engine section.
- NVIDIA GPU not used: Check drivers (nvidia-smi) and ensure the correct CUDA-enabled PyTorch is installed (Installation Step 4).
- GPU out-of-memory: Reduce chunk_size (e.g., 100-150).
- Import errors (chatterbox-tts, librosa): Ensure the virtual environment is active and pip install -r requirements.txt completed successfully.
- libsndfile Error (Linux): Run sudo apt install libsndfile1.
- Model download problems: ChatterboxTTS.from_pretrained() will attempt to download from Hugging Face Hub. Ensure model.repo_id in config.yaml is correct.
- Missing voice files: Check the audio file locations (./reference_audio, ./voices).
- Permission errors: The server needs write access to ./config.yaml, ./logs, ./outputs, ./reference_audio, ./voices, and the Hugging Face cache directory if using Docker volumes. In particular, ensure config.yaml is writable by the server process.
- Port conflict (Address already in use): Another process is using the port. Stop it or change server.port in config.yaml (requires server restart).

Selecting a specific GPU: Set the CUDA_VISIBLE_DEVICES environment variable before running python server.py to specify which GPU(s) PyTorch should see. The server uses the first visible one (effectively cuda:0 from PyTorch's perspective).
Example (Use only physical GPU 1):
# Linux (Bash):
CUDA_VISIBLE_DEVICES="1" python server.py
# Windows (CMD):
set CUDA_VISIBLE_DEVICES=1 && python server.py
# Windows (PowerShell):
$env:CUDA_VISIBLE_DEVICES="1"; python server.py
Example (Use physical GPUs 6 and 7 - server uses GPU 6):
# Linux (Bash):
CUDA_VISIBLE_DEVICES="6,7" python server.py
# Windows (CMD):
set CUDA_VISIBLE_DEVICES=6,7 && python server.py
# Windows (PowerShell):
$env:CUDA_VISIBLE_DEVICES="6,7"; python server.py
Note: CUDA_VISIBLE_DEVICES selects GPUs; it does not fix OOM errors if the chosen GPU lacks sufficient memory.
Contributions are welcome! Please feel free to open an issue to report bugs or suggest features, or submit a Pull Request for improvements.
This project is licensed under the MIT License.
You can find it here: https://opensource.org/licenses/MIT