A repository for transcribing audio files using Whisper. It provides guidelines for creating chapters at logical topic transitions, with concise headings and respect for timestamps. It performs the transcription and exports the results as `chapters.csv` and `transcript.csv` files.
See the original repository for more information.
- Install `symbolicai`: `pip install symbolicai[whisper]`
- Use the builtin `sympkg` to install the package: `sympkg i ExtensityAI/symscribe`
- Create an alias for the `symscribe` command: `symrun c symscribe ExtensityAI/symscribe`
Supported features:
- Whisper models: all models supported by `symbolicai` (default: `"base"`)
  - To change the model, run `export SPEECH_ENGINE_MODEL="..."` before you run `symscribe`, where `...` is the name of an available model.
- Language: `"language=..."` (default: `"language=en"`)
- Export directory: `"export_dir=..."` (default: `"export_dir=."`)
- Transcription format: `"transcript_only=..."` (default: `"transcript_only=True"`)
  - If `transcript_only=True`, the script exports only the transcript and does not create chapters.
- Bin size: `"bin_size_s=..."` (default: `"bin_size_s=300"`)
  - The bin size is the duration, in seconds, of each chunk when the audio file is split into smaller pieces.
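
To make the bin size concrete, the sketch below shows how a recording of a given length could be divided into consecutive chunks of `bin_size_s` seconds. It is an illustration of the arithmetic only, not the code `symscribe` uses; the function name `bin_ranges` is hypothetical, and the assumption that the final chunk is simply shorter than the others is ours.

```python
def bin_ranges(duration_s: float, bin_size_s: float = 300) -> list[tuple[float, float]]:
    """Return (start, end) offsets in seconds for consecutive chunks.

    Hypothetical helper: assumes chunks are contiguous and the last one
    is truncated at the end of the recording.
    """
    if bin_size_s <= 0:
        raise ValueError("bin_size_s must be positive")
    duration_s = float(duration_s)
    ranges = []
    start = 0.0
    while start < duration_s:
        # Each chunk ends one bin later, or at the end of the audio.
        end = min(start + bin_size_s, duration_s)
        ranges.append((start, end))
        start = end
    return ranges


# A 12-minute (720 s) recording with the default bin size of 300 s.
print(bin_ranges(720))
# → [(0.0, 300.0), (300.0, 600.0), (600.0, 720.0)]
```

Under these assumptions, a smaller `bin_size_s` produces more, shorter chunks, and a larger one produces fewer, longer chunks.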
Example:
symrun symscribe "path_to_file.mp3/mp4/..." "language=en" "export_dir=/tmp" "bin_size_s=300" "transcript_only=True"
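
If you want to call the command from a Python script, one possible approach is to assemble the same `key=value` arguments programmatically. The helper `build_symscribe_command` below is hypothetical and not part of `symscribe` or `symbolicai`; it only builds the argument list shown in the example above and assumes the `symscribe` alias has already been created.

```python
import shlex


def build_symscribe_command(audio_path: str, **options) -> list[str]:
    """Build the argument list for `symrun symscribe` (hypothetical helper).

    Each keyword argument becomes one "key=value" argument, in the order given.
    """
    args = ["symrun", "symscribe", audio_path]
    for key, value in options.items():
        args.append(f"{key}={value}")
    return args


cmd = build_symscribe_command(
    "path_to_file.mp3",
    language="en",
    export_dir="/tmp",
    bin_size_s=300,
    transcript_only=True,
)

# Print the command as a single shell-ready string for inspection.
print(shlex.join(cmd))

# To actually run it (requires the installed package and alias):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues when the file path contains spaces.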