AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
We provide our implementation and pretrained models as open source in this repository.
Please refer to run.md
Here we list the capability of AudioGPT at this time. More supported models and tasks are coming soon. For prompt examples, refer to asset.
Currently not every model has repository.
Task | Supported Foundation Models | Status |
---|---|---|
Text-to-Speech | FastSpeech, SyntaSpeech, VITS | Yes (WIP) |
Style Transfer | GenerSpeech | Yes |
Speech Recognition | whisper, Conformer | Yes |
Speech Enhancement | ConvTasNet | Yes (WIP) |
Speech Separation | TF-GridNet | Yes (WIP) |
Speech Translation | Multi-decoder | WIP |
Mono-to-Binaural | NeuralWarp | Yes |
Task | Supported Foundation Models | Status |
---|---|---|
Text-to-Sing | DiffSinger, VISinger | Yes (WIP) |
Task | Supported Foundation Models | Status |
---|---|---|
Text-to-Audio | Make-An-Audio | Yes |
Audio Inpainting | Make-An-Audio | Yes |
Image-to-Audio | Make-An-Audio | Yes |
Sound Detection | Audio-transformer | Yes |
Target Sound Detection | TSDNet | Yes |
Sound Extraction | LASSNet | Yes |
Task | Supported Foundation Models | Status |
---|---|---|
Talking Head Synthesis | GeneFace | Yes (WIP) |
We appreciate the open source of the following projects:
ESPNet
NATSpeech
Visual ChatGPT
Hugging Face
LangChain
Stable Diffusion