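Top Python Frameworks & Libraries for Audio

Each entry gives the project description, followed by its GitHub star count, fork count, and primary language.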

Awesome pre-trained models toolkit based on PaddlePaddle. (400+ models including Image, Text, Audio, Video and Cross-Modal with Easy Inference & Serving)...

12833
2067
Python

Speech recognition module for Python, supporting several engines and APIs, online and offline.

8670
2415
Python

Manipulate audio with a simple and easy high level interface

9269
1075
Python

Audio fingerprinting and recognition in Python

6519
1449
Python

Automagically synchronize subtitles with video.

7078
293
Python

Python library for audio and music analysis

7511
983
Python

Code for the paper "Jukebox: A Generative Model for Music"

7958
1443
Python

Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications

6012
1214
Python

A data augmentations library for audio, image, text, and video.

4996
304
Python

A PyTorch-based Speech Toolkit

9599
1459
Python

End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow

2751
546
Python

Defeating Google's audio reCAPTCHA with 85% accuracy.

2735
335
Python

Picard is a cross-platform music tagger powered by the MusicBrainz database

3994
400
Python

🎵 🌈 Real-time LED strip music visualization using Python and the ESP8266 or Raspberry Pi...

2278
581
Python

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)...

1965
198
Python

Cast macOS and Linux Audio/Video to your Google Cast and Sonos Devices

1874
128
Python

A GUI frontend for @werman's PulseAudio real-time noise suppression plugin

1835
47
Python

🔓 Lip Reading - Cross Audio-Visual Recognition using 3D Architectures

1664
310
Python

Data manipulation and transformation for audio signal processing, powered by PyTorch

1608
398
Python

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding...

1471
313
Python

Specifications and tools for 360° video and spatial audio.

1387
360
Python

The PyTorch-based audio source separation toolkit for researchers

1327
312
Python

WaveGAN: Learn to synthesize raw audio with generative adversarial networks

1092
250
Python

Finding the genre of a song with Deep Learning

1007
212
Python

Python module for handling audio metadata

1667
165
Python

Creates audio supercuts.

924
67
Python

🎧 Listen to audio books 📚 on Linux

921
56
Python

SincNet is a neural architecture for efficiently processing raw audio samples.

908
226
Python

Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in ind...

884
350
Python

Python audio and music signal processing library

872
151
Python

A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning....

900
115
Python

A feature-rich command-line audio/video downloader

106089
8328
Python

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX....

28328
5801
Python

LabelImg is now part of the Label Studio community. The popular image annotation tool created by Tzutalin is no longer actively being developed, but you can check...

23464
6408
Python

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with M...

21299
2207
Python

GUI for a Vocal Remover that uses Deep Neural Networks.

20030
1471
Python

Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper....

14044
1036
Python

[CVPR 2023] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation...

12529
2334
Python

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

10116
862
Python

Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation

8341
1095
Python

A sound cloning tool with a web interface, using your voice or any sound to record audio...

8364
876
Python

[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild...

6958
1024
Python

AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

4907
609
Python

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineer...

8869
694
Python

Use PEFT or Full-parameter to finetune 500+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Q...

6643
568
Python

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate....

3763
312
Python

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio....

3588
313
Python

A lightweight yet powerful audio-to-MIDI converter with pitch bend detection

3719
304
Python

Stable diffusion for real-time music generation

3447
395
Python

An unofficial PyTorch implementation of the audio LM VALL-E

2950
418
Python

Open-source multimodal large language model that can hear and talk while thinking. Featuring real-time end-to-end speech input and streaming audio output conversation...

3113
278
Python

Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation

4499
647
Python

Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), Audio Languag...

4475
623
Python

Convert ebooks to audiobooks with chapters and metadata using dynamic AI models and voice cloning. Supports 1,107+ languages!...

9359
670
Python

Instant voice cloning by MIT and MyShell. Audio foundation model.

31545
3195
Python

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec....

7953
652
Python