This is a Cog wrapper around pyannote.audio that lets you easily run speaker diarization on Replicate and saves you the trouble of dependency hell.
The model takes an input audio file via the `audio` parameter, runs speaker diarization, and returns a list of audio files containing the individual speaker turns, split by speaker and indexed in order. Each output URL encodes information in its file name to make working with the outputs easier. The file name format is `{index}_{speaker}_{duration}`, which resolves to e.g. `0_SPEAKER_01_16`. `index` is the order of the speaker turn, `speaker` is the diarized speaker label, and `duration` is the length of the turn in seconds.
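If you want to recover these fields programmatically, here is a minimal sketch (the `SpeakerTurn` type and `parse_turn_filename` helper are hypothetical, and it assumes the file extension has already been stripped, e.g. with `pathlib.Path(url).stem`). Since the speaker label itself contains underscores (`SPEAKER_01`), the name is split once from the left and once from the right rather than naively on every `_`:

```python
from dataclasses import dataclass


@dataclass
class SpeakerTurn:
    index: int     # order of the speaker turn within the audio
    speaker: str   # diarized speaker label, e.g. "SPEAKER_01"
    duration: int  # length of the turn in seconds


def parse_turn_filename(name: str) -> SpeakerTurn:
    """Parse an output file name of the form {index}_{speaker}_{duration}."""
    index, rest = name.split("_", 1)         # "0", "SPEAKER_01_16"
    speaker, duration = rest.rsplit("_", 1)  # "SPEAKER_01", "16"
    return SpeakerTurn(int(index), speaker, int(duration))


print(parse_turn_filename("0_SPEAKER_01_16"))
# SpeakerTurn(index=0, speaker='SPEAKER_01', duration=16)
```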
- SSH into a Linux environment with a GPU
- Install Cog (using replicate/codespaces if you're using GitHub Codespaces)
- Create a HuggingFace token and add it to `predict.py` as `HUGGINGFACE_TOKEN` (TODO: move it out of `predict.py` somehow, maybe into a script that caches the weights); a sketch of how the token is typically used follows this list
- Accept the license agreements for these two models on HuggingFace:
  - pyannote/speaker-diarization
  - pyannote/segmentation
- Run `cog predict -i [email protected]`
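For context, here is a minimal sketch of the diarization call that `predict.py` wraps, based on the standard pyannote.audio API; the exact model version and token handling are assumptions, not necessarily what this repo does:

```python
from pyannote.audio import Pipeline

HUGGINGFACE_TOKEN = "hf_..."  # your HuggingFace access token (see TODO above)

# Loading this gated pipeline requires the license agreements above.
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization",
    use_auth_token=HUGGINGFACE_TOKEN,
)

# Run diarization and iterate over speaker turns with start/end times.
diarization = pipeline("audio.wav")
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: {turn.start:.1f}s - {turn.end:.1f}s")
```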
Then, to deploy it to Replicate:
- Create a new model at replicate.com/create
- Run `cog push r8.im/your-username/your-model-name`
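Once pushed, you can run the model from Python with the `replicate` client. A minimal sketch, assuming `REPLICATE_API_TOKEN` is set in your environment and using a placeholder model name:

```python
import replicate

# Placeholder model name; substitute your own username/model-name.
output = replicate.run(
    "your-username/your-model-name",
    input={"audio": open("audio.wav", "rb")},
)

# The output is a list of URLs, one per speaker turn, named
# {index}_{speaker}_{duration} as described above.
for url in output:
    print(url)
```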