πŸ”‰ This is an implementation of pyannote.audio and cog to host on replicate.

Pyannote.audio + Replicate = πŸ’›

This is an implementation of pyannote.audio in a Cog wrapper, so you can easily run speaker diarization via Replicate and skip the trouble of dependency hell 😇.

How this model works

The model takes an input audio file via the `audio` parameter. It then runs speaker diarization and returns a list of audio files, one per speaker turn, split by speaker and ordered by index. The output URLs encode information in the file name to make the outputs easier to work with. The file name format is `{index}_{speaker}_{duration}`, e.g. `0_SPEAKER_01_16`. Duration is in seconds; index is the order of the speaker turn within the audio file.
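Because the speaker label itself contains an underscore (e.g. `SPEAKER_01`, following pyannote's labeling convention), the file name can't be split naively on `_`. A minimal Python sketch for decoding an output file name, assuming names of the form `{index}_{speaker}_{duration}` as described above:

```python
# Sketch: decode an output file name like "0_SPEAKER_01_16"
# ({index}_{speaker}_{duration}). The speaker label contains
# underscores, so we peel the index off the front and the
# duration off the back, leaving the label in the middle.
def parse_turn_filename(name: str) -> dict:
    stem = name.rsplit(".", 1)[0]            # drop any file extension
    index, rest = stem.split("_", 1)         # leading turn index
    speaker, duration = rest.rsplit("_", 1)  # trailing duration (seconds)
    return {"index": int(index), "speaker": speaker, "duration": int(duration)}

print(parse_turn_filename("0_SPEAKER_01_16"))
# {'index': 0, 'speaker': 'SPEAKER_01', 'duration': 16}
```

The `parse_turn_filename` helper is hypothetical (not part of this repo); it just illustrates one safe way to recover the three fields.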

Building this model with Cog

  1. SSH into a Linux environment with a GPU
  2. Install Cog (use replicate/codespaces if you're in GitHub Codespaces)
  3. Create a HuggingFace token and add it to `predict.py` as `HUGGINGFACE_TOKEN` (TODO: move it out of `predict.py` somehow, maybe into a script that caches the weights)
  4. Accept the license agreements for these two models on HuggingFace:
  5. Run `cog predict -i [email protected]`

Then:

  1. Create a new model at replicate.com/create
  2. Run `cog push r8.im/your-username/your-model-name`
