Top Python Frameworks & Libraries for Audio

PaddlePaddle/PaddleHub

Awesome pre-trained models toolkit based on PaddlePaddle. (400+ models including Image, Text, Audio, Video and Cross-Modal with Easy Inference & Serving)...

12896

2072

Python

Uberi/speech_recognition

Speech recognition module for Python, supporting several engines and APIs, online and offline.

8813

2425

Python

jiaaro/pydub

Manipulate audio with a simple and easy high-level interface

9485

1101

Python

worldveil/dejavu

Audio fingerprinting and recognition in Python

6598

1457

Python

smacke/ffsubsync

Automagically synchronize subtitles with video.

7267

296

Python

librosa/librosa

Python library for audio and music analysis

7790

997

Python

openai/jukebox

Code for the paper "Jukebox: A Generative Model for Music"

8013

1449

Python

tyiannak/pyAudioAnalysis

Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications

6097

1219

Python

facebookresearch/AugLy

A data augmentations library for audio, image, text, and video.

5029

307

Python

speechbrain/speechbrain

A PyTorch-based Speech Toolkit

10195

1533

Python

zzw922cn/Automatic_Speech_Recognition

End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow

2751

546

Python

ecthros/uncaptcha

Defeating Google's audio reCaptcha with 85% accuracy.

2735

335

Python

metabrainz/picard

Picard is a cross-platform music tagger powered by the MusicBrainz database

4181

411

Python

scottlawsonbc/audio-reactive-led-strip

🎵 🌈 Real-time LED strip music visualization using Python and the ESP8266 or Raspberry Pi...

2278

581

Python

readbeyond/aeneas

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)...

1965

198

Python

muammar/mkchromecast

Cast macOS and Linux Audio/Video to your Google Cast and Sonos Devices

1874

128

Python

josh-richardson/cadmus

A GUI frontend for @werman's PulseAudio real-time noise suppression plugin

1835

Python

astorfi/lip-reading-deeplearning

🔓 Lip Reading - Cross Audio-Visual Recognition using 3D Architectures

1664

310

Python

pytorch/audio

Data manipulation and transformation for audio signal processing, powered by PyTorch

1608

398

Python

pyannote/pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding...

1471

313

Python

google/spatial-media

Specifications and tools for 360° video and spatial audio.

1387

360

Python

asteroid-team/asteroid

The PyTorch-based audio source separation toolkit for researchers

1327

312

Python

chrisdonahue/wavegan

WaveGAN: Learn to synthesize raw audio with generative adversarial networks

1092

250

Python

despoisj/DeepAudioClassification

Finding the genre of a song with Deep Learning

1007

212

Python

quodlibet/mutagen

Python module for handling audio metadata

1742

175

Python

antiboredom/audiogrep

Creates audio supercuts.

924

Python

geigi/cozy

🎧 Listen to audio books 📚 on Linux

921

Python

mravanelli/SincNet

SincNet is a neural architecture for efficiently processing raw audio samples.

908

226

Python

LCAV/pyroomacoustics

Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in ind...

884

350

Python

CPJKU/madmom

Python audio and music signal processing library

872

151

Python

iver56/audiomentations

A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning....

900

115

Python

yt-dlp/yt-dlp

A feature-rich command-line audio/video downloader

120636

9577

Python

huggingface/diffusers

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX....

30005

6155

Python

HumanSignal/labelImg

LabelImg is now part of the Label Studio community. The popular image annotation tool created by Tzutalin is no longer actively being developed, but you can check...

24086

6510

Python

facebookresearch/audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with M...

21299

2207

Python

Anjok07/ultimatevocalremovergui

GUI for a Vocal Remover that uses Deep Neural Networks.

21369

1578

Python

chidiwilliams/buzz

Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper....

14937

1105

Python

OpenTalker/SadTalker

[CVPR 2023] SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation...

13044

2452

Python

AIGC-Audio/AudioGPT

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

10184

864

Python

fudan-generative-vision/hallo

Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation

8539

1115

Python

jianchang512/clone-voice

A sound cloning tool with a web interface, using your voice or any sound to record audio...

8666

924

Python

OpenTalker/video-retalking

[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild...

7114

1047

Python

Zejun-Yang/AniPortrait

AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

4989

617

Python

open-mmlab/Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineer...

9267

733

Python

modelscope/ms-swift

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni,...

8993

784

Python

huggingface/distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate....

3763

312

Python

facebookresearch/encodec

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio....

3588

313

Python

spotify/basic-pitch

A lightweight yet powerful audio-to-MIDI converter with pitch bend detection

3719

304

Python

riffusion/riffusion-hobby

Stable diffusion for real-time music generation

3447

395

Python

enhuiz/vall-e

An unofficial PyTorch implementation of the audio LM VALL-E

2950

418

Python

gpt-omni/mini-omni

Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output conversation...

3113

278

Python

fudan-generative-vision/hallo2

Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation

4499

647

Python

NexaAI/nexa-sdk

Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), Audio Languag...

4617

642

Python

DrewThomasson/ebook2audiobook

Generate audiobooks from e-books, voice cloning & 1107+ languages!

10956

816

Python

myshell-ai/OpenVoice

Instant voice cloning by MIT and MyShell. Audio foundation model.

33717

3601

Python

kyutai-labs/moshi

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec....

8704

761

Python

huggingface/transformers

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and...

147713

29860

Python

boson-ai/higgs-audio

Text-audio foundation model from Boson AI

6053

391

Python