GitHub - len-sla/ffmpeg-whisper: preparing set of tools for video/pictures preprocessing based on ffmpeg, bash, gradio

FFmpeg and Docker plus the transcription power of Whisper

"Whisper is an open-source Python package developed by OpenAI that provides speech recognition capabilities. More information about Whisper can be found on their GitHub page (https://github.com/openai/whisper). I was interested in using Whisper to transcribe my own MP3 files in a private environment.

To avoid polluting my own OS, I decided to build a Docker image and perform all my trials within the container. I shared a directory with the Docker container and prepared Python or Bash scripts locally to execute on specific directories or files.

The reported performance for the most common languages is impressive, judging by the chart on their GitHub page.

To create the Docker image, I started with a lightweight Python 3.9 slim-buster image with FFMPEG content and JupyterLab, which was around 800MB in size. I then followed the instructions for installing Whisper on a standard Linux environment and translated them into the content for my Dockerfile.

While creating the Docker image, I encountered some issues, such as missing libraries that required updates one by one. Initially, I did not address security issues, such as allowing root in the Docker container, which could threaten the host OS.

The final Dockerfile is provided below, and the size is about 8GB, mostly due to the PyTorch framework."

FROM python:3.9-slim-buster

# Install dependencies (git is needed to install Whisper from GitHub)
RUN apt-get update && \
    apt-get install -y ffmpeg git

# Install JupyterLab, FFMPEG-Python, and PyTorch
RUN pip install --upgrade  jupyterlab ffmpeg-python torch==1.10.1 torchvision torchaudio -f https://download.pytorch.org/whl/cu111/torch_stable.html tqdm tiktoken numba
 
# https://github.com/openai/whisper
# Install whisper
RUN pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git

# Set the working directory
WORKDIR /app
ENV JUPYTER_PORT=8888


# Launch JupyterLab
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--allow-root"]

Build the Docker image:

Once you have created the Dockerfile, you can build the Docker image using the following command:

docker build -t my_image_name .

Then run the container:

docker run --rm -p 8888:8888 -v "$(pwd)":"$(pwd)" -w "$(pwd)" --cpus 2 --name whispi my_image_name

whispi is a temporary name for the container.

I limited the container to 2 CPUs to avoid crashing the host system.

It was tested with the base model on a 2-minute English file, stress.mp3 from Hubermanlab, and on a Polish file, stan-pl.mp3 from onet.pl, to check performance in another language. The results, as expected, are stunning.
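A minimal sketch of such a test run inside the container, assuming the openai-whisper package from the Dockerfile is installed. The helper names `transcribe` and `transcript_path` are illustrative and not part of this repository; `whisper.load_model` and `model.transcribe` are the calls documented in the Whisper README.

```python
from pathlib import Path


def transcript_path(audio_path):
    """Return the .txt path that sits next to the audio file."""
    return Path(audio_path).with_suffix(".txt")


def transcribe(audio_path, model_name="base"):
    """Transcribe one file, save the text, and return (language, text)."""
    # Imported lazily because it pulls in PyTorch, which is slow to load.
    import whisper

    model = whisper.load_model(model_name)
    # Whisper detects the spoken language itself when none is given.
    result = model.transcribe(str(audio_path))
    text = result["text"].strip()
    transcript_path(audio_path).write_text(text, encoding="utf-8")
    return result["language"], text
```

Calling `transcribe("stress.mp3")` and `transcribe("stan-pl.mp3")` from a notebook cell would then write stress.txt and stan-pl.txt next to the audio files.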

Things to improve: removing security issues from the image (running as root, etc.) and optimising the RUN instructions in the Dockerfile to avoid creating extra layers.

You could use a Gradio interface to get a simple and elegant GUI front end for all kinds of applications, including this one, and including processing mp3 files in batch. The mp3 file was chopped into 30 s pieces with ffmpeg so that audio.shape equals (480000,), that is, 30 s of audio at Whisper's 16 kHz sample rate.
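The arithmetic behind that shape, and one possible way to do the chopping, can be sketched as follows. The function names and the output pattern `chunk_%03d.mp3` are assumptions for illustration; the ffmpeg options used (`-f segment`, `-segment_time`, `-c copy`) belong to ffmpeg's segment muxer, and the notebook may chop the file differently.

```python
SAMPLE_RATE = 16_000   # Whisper works on 16 kHz mono audio
CHUNK_SECONDS = 30     # length of one Whisper input window


def samples_per_chunk(seconds=CHUNK_SECONDS, rate=SAMPLE_RATE):
    """Number of samples in one chunk: seconds * samples per second."""
    return seconds * rate


def chunk_command(src, out_pattern="chunk_%03d.mp3", seconds=CHUNK_SECONDS):
    """Build an ffmpeg command that splits src into pieces of `seconds` each."""
    return [
        "ffmpeg", "-i", src,
        "-f", "segment", "-segment_time", str(seconds),
        "-c", "copy",          # copy the audio stream, no re-encoding
        out_pattern,
    ]
```

Here `samples_per_chunk()` gives 30 * 16000 = 480000, which matches the shape above. The last piece of a file is usually shorter than 30 s; Whisper's `whisper.pad_or_trim` helper pads or trims an array to exactly that length.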

An example is included in whisper-in-colab-batch-microphe-mp3-conversion-with-gradio.ipynb

If you need a quick overview of what a particular conversation, mp3 or mp4 file is about, there is a ready-to-use notebook. As an example, a 2.5 h mp3 file from the well-known Dr. Huberman (Stanford) was taken from https://hubermanlab.com/how-to-breathe-correctly-for-optimal-health-mood-learning-and-performance/, then transcribed and summarised with the Google Pegasus summariser:

https://github.com/len-sla/ffmpeg-whisper/blob/main/All_in_one_mp3_chop_transcribe_summarise.ipynb
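A long transcript does not fit into a summariser's input in one go, so it has to be summarised piece by piece. The sketch below shows one way to do that with the Hugging Face `transformers` summarization pipeline. The checkpoint name `google/pegasus-xsum`, the 300-word chunk size, and both function names are assumptions for illustration; the notebook may use a different Pegasus checkpoint and a different splitting strategy.

```python
def split_words(text, max_words=300):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def summarise(text, model_name="google/pegasus-xsum", max_words=300):
    """Summarise each chunk separately and join the partial summaries."""
    # Imported lazily: transformers and the model weights are heavy.
    from transformers import pipeline

    summariser = pipeline("summarization", model=model_name)
    parts = [
        # truncation=True guards against chunks that exceed the model limit
        summariser(chunk, truncation=True)[0]["summary_text"]
        for chunk in split_words(text, max_words)
    ]
    return " ".join(parts)
```

Splitting on word counts is crude because it can cut a sentence in half; splitting on sentence boundaries would likely give cleaner partial summaries.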

Of course you could use another summariser, but the others do not give such spectacular results in a so-called zero-shot attempt. My comparison notebook is here: https://github.com/len-sla/NLP-BERT/blob/master/Ssummarise--T5_GPT-Graph-Longformer_Pegasus.ipynb

Colab

Colab is convenient when you don't care about privacy. For operations on private files, such as converting your private videos, I recommend using ffmpeg on a local machine. Someone could say that installing the whole environment takes a while and is not so simple. That is what Docker is for: once you have it, a ready-to-use image and a temporary container can do the job.

Processing mp4 files using a bash script and ffmpeg installed in a Docker environment to avoid polluting the OS

I need to mention the excellent work of Julien Rottenberg's team:
https://github.com/jrottenberg/ffmpeg

You can install the latest build of this image by running:

docker pull jrottenberg/ffmpeg:${VERSION}-${VARIANT} or docker pull ghcr.io/jrottenberg/ffmpeg:${VERSION}-${VARIANT}.

The example below rescales a high-resolution video from a mobile phone to an mp4 that is 640 pixels wide, with the height chosen by ffmpeg to keep the aspect ratio (works like a charm):

docker run --rm -d -v "$(pwd)":"$(pwd)" -w "$(pwd)" --name mp4-converter jrottenberg/ffmpeg:4.4-ubuntu -i /mnt/c/docker_out/ffmpeg/dzia/po3.mp4 -vf scale=640:-1 /mnt/c/docker_out/ffmpeg/dzia/_po3.mp4

Excellent guides are available there, so here is only a short note on my example: the -i (input) flag is followed by the path of the input file, /mnt/c/docker_out/ffmpeg/dzia/po3.mp4, and the last argument is the output file, /mnt/c/docker_out/ffmpeg/dzia/_po3.mp4.
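To make the parts of that command easier to see, here is a small sketch that assembles the same kind of call as a Python argument list and works out the height that `scale=640:-1` leads to. The function names are illustrative only, and the rounding in `scaled_height` is an approximation of what ffmpeg computes.

```python
import os


def scaled_height(src_w, src_h, dst_w=640):
    """Height that keeps the source aspect ratio at the target width."""
    return round(src_h * dst_w / src_w)


def scale_command(src, dst, width=640,
                  image="jrottenberg/ffmpeg:4.4-ubuntu", workdir=None):
    """Build a `docker run` call that rescales src to the given width."""
    workdir = workdir or os.getcwd()
    return [
        "docker", "run", "--rm",
        "-v", f"{workdir}:{workdir}",   # share the directory with the container
        "-w", workdir,                  # and make it the working directory
        image,                          # everything after the image goes to ffmpeg
        "-i", src,                      # -i marks the input file
        "-vf", f"scale={width}:-1",     # -1: derive the height from the width
        dst,                            # the last argument is the output file
    ]
```

For a 1920x1080 source, `scaled_height(1920, 1080)` gives 360, so the output is 640x360. Some encoders need even dimensions; in that case `scale=640:-2` asks ffmpeg for a height divisible by 2.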

If a whole directory needs converting (changing the bitrate of every mp4, or, to change the subject a bit, converting mp3 to wav), a small bash script that wraps the Docker converter does the job. The script below re-encodes every mp4 in the current directory at a 1 Mbit/s video bitrate:

for i in *.mp4; do
	# start one detached converter container per file, limited to 3 CPUs
	docker run --rm -d -v "$(pwd)":"$(pwd)" -w "$(pwd)" --cpus 3 --name "$i" jrottenberg/ffmpeg:4.4-ubuntu -i "$i" -b:v 1M "$(basename "$i" .mp4)_re.mp4"
	# print the file name without the .mp4 extension
	name="${i%.mp4}"
	echo "$name"
	sleep 1
done

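You need to be careful when allocating resources, in this case CPU: I limited each container to 3 CPUs (--cpus 3). The process is not particularly memory intensive; converting a 200 MB file to a 1 Mbit/s bitrate uses about 300 MB of memory. Once resources are allocated, the batch takes care of the whole process, using machine resources within the given limit. Note that --cpus applies per container, and with -d the containers start in the background one second apart, so several can run at once.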
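If the total load should stay within one container's limit, the files can be converted one after another instead. Below is a hedged Python sketch of that variant; all function names are illustrative and not part of this repository. It also replaces characters that Docker does not accept in container names (only letters, digits, `_`, `.` and `-` are allowed), which matters when a file name contains spaces.

```python
import re
import subprocess
from pathlib import Path

IMAGE = "jrottenberg/ffmpeg:4.4-ubuntu"


def container_name(filename):
    """Turn a file name into a valid Docker container name."""
    name = re.sub(r"[^a-zA-Z0-9_.-]", "_", filename)
    # The first character must be a letter or a digit.
    return name if name[:1].isalnum() else "c" + name


def output_name(filename, suffix="_re"):
    """po3.mp4 -> po3_re.mp4"""
    p = Path(filename)
    return f"{p.stem}{suffix}{p.suffix}"


def bitrate_command(filename, workdir, cpus=3, bitrate="1M"):
    """Build the `docker run` call that re-encodes one file."""
    return [
        "docker", "run", "--rm",
        "-v", f"{workdir}:{workdir}", "-w", workdir,
        "--cpus", str(cpus), "--name", container_name(filename),
        IMAGE, "-i", filename, "-b:v", bitrate, output_name(filename),
    ]


def convert_all(directory=".", cpus=3):
    """Convert every mp4 in the directory, one container at a time."""
    workdir = str(Path(directory).resolve())
    for path in sorted(Path(workdir).glob("*.mp4")):
        if path.stem.endswith("_re"):
            continue  # skip files produced by an earlier run
        # No -d here: each conversion finishes before the next one starts.
        subprocess.run(bitrate_command(path.name, workdir, cpus), check=True)
```

Running without `-d` trades speed for predictability: only one container is active at any time, so the CPU limit holds for the whole batch.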

Technologies

Python
FFmpeg
Docker
Bash

Setup

The easiest way is to install or update the libraries according to the install section in the notebook.

Status

Project is: in progress

Other information

The notebook is divided into universal functions which can easily be used elsewhere.

Contact

Created by: [email protected]
