A little late to the party, but I finally got there. Yesterday I ran my first AI model locally.
It's a speaker diarization model: it takes a WAV file as input and segments the audio by speaker, telling you who spoke when, with timestamps.
Here’s the link: https://huggingface.co/pyannote/speaker-diarization-3.1
Here’s how you can too.
First, you need to sign up at huggingface.co. It's like GitHub for AI models.
You can simply clone the model into a directory. Most models are run through Python; I think you can connect any wrapper, but Python is easy, so let's do that.
Go into the project root directory and install conda. Conda is an open-source package and environment manager for Python. Think of it as npm for the Python world.
After conda is installed, use the following commands to create and activate an environment.
conda create -n diarization python=3.10
conda activate diarization
After that you need the following libraries to run the model.
conda install pytorch torchvision torchaudio -c pytorch -c conda-forge
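This step isn't in the original instructions, but it's a quick way to sanity-check that PyTorch installed correctly before going further. Run this inside the activated diarization environment:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

If it prints a version number, you're set. True means the model can run on your GPU; False just means it will fall back to CPU, which is slower but works.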
Once those are installed, you'll need pyannote.audio itself. Make sure you install version 3.1 or later; older versions won't work with this model.
pip install pyannote.audio==3.1.1
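Likewise, you can confirm you got the right pyannote.audio version (again, just an optional check, not one of the original steps):

python -c "import pyannote.audio; print(pyannote.audio.__version__)"

This should print 3.1.1.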
Then we need to write a little Python code of our own to run the model on our file. Before we do that, we need to generate an access token, which you can do here: hf.co/settings/tokens. You'll also have to accept the model's user conditions on its Hugging Face page, since it's a gated model.
Here's the link to the code. It includes a WAV file you can use for testing. Put start.py in your root folder and then try the following command.
python start.py
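I won't reproduce the full start.py here, but a minimal version looks roughly like this, following the example on the model card. The token string is a placeholder for your own token, and "test.wav" stands in for whatever file you're testing with:

from pyannote.audio import Pipeline

# load the gated pipeline from Hugging Face (requires your access token)
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="YOUR_HF_TOKEN",  # placeholder: paste your token here
)

# run diarization on the audio file
diarization = pipeline("test.wav")  # hypothetical filename: use your own WAV file

# print who spoke when
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s {speaker}")

The output is one line per speech turn, something like start=0.2s stop=1.5s SPEAKER_00. If you have a CUDA GPU, you can speed things up by adding import torch and pipeline.to(torch.device("cuda")) after loading the pipeline.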