Whisper-based transcription tool with chapter segmentation and timestamp handling.

ExtensityAI/symscribe

symscribe

A repository for transcribing audio files using Whisper. It provides guidelines for creating chapters at logical topic transitions, with concise headings and respect for timestamps. It performs the transcription and exports the results as chapters.csv and transcript.csv files.

Installation

See the original repository for more information.

  1. Install symbolicai
pip install symbolicai[whisper]
  2. Use the built-in sympkg to install the package
sympkg i ExtensityAI/symscribe
  3. Create an alias for the symscribe command
symrun c symscribe ExtensityAI/symscribe

Usage

Supported features:

  • Whisper models: all models supported by symbolicai (default: "base")
    • To change the model, run export SPEECH_ENGINE_MODEL="..." before running symscribe, where ... is the name of an available model.
  • Language: "language=..." (default: "language=en")
  • Export directory: "export_dir=..." (default: "export_dir=.")
  • Transcription format: "transcript_only=..." (default: "transcript_only=True")
    • If transcript_only=True, the script will only export the transcript without creating chapters.
  • Bin size: "bin_size_s=..." (default: "bin_size_s=300")
    • The bin size is the duration, in seconds, of each chunk when the audio file is split into smaller pieces.
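
To illustrate what the bin size means, the sketch below computes the chunk boundaries for a recording of a given length. It is only an illustration of fixed-size binning; the function name chunk_bounds and its behavior for the last, shorter chunk are assumptions and not taken from the symscribe source.

```python
def chunk_bounds(total_s, bin_size_s=300):
    """Return (start, end) offsets in seconds for consecutive chunks.

    Hypothetical helper: symscribe's real splitting logic may differ.
    """
    if bin_size_s <= 0:
        raise ValueError("bin_size_s must be positive")
    total_s = float(total_s)
    bounds = []
    start = 0.0
    while start < total_s:
        # The final chunk is shorter when the length is not a multiple of the bin size.
        end = min(start + bin_size_s, total_s)
        bounds.append((start, end))
        start = end
    return bounds


# A 700-second recording with the default bin size of 300 seconds.
print(chunk_bounds(700))
# → [(0.0, 300.0), (300.0, 600.0), (600.0, 700.0)]
```

With the default bin_size_s=300, a 700-second file would be handled as two full 5-minute chunks plus one 100-second remainder under this assumption.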

Example:

symrun symscribe "path_to_file.mp3/mp4/..." "language=en" "export_dir=/tmp" "bin_size_s=300" "transcript_only=True"
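
The "key=value" arguments in the command above can be read as overrides of the defaults listed under Usage. The following sketch shows one way such arguments might be merged with those defaults. The names DEFAULTS and parse_options are hypothetical, and this is not symscribe's actual argument handling.

```python
# Defaults as documented in the Usage list above.
DEFAULTS = {
    "language": "en",
    "export_dir": ".",
    "transcript_only": True,
    "bin_size_s": 300,
}


def parse_options(args):
    """Merge "key=value" strings into a copy of DEFAULTS (illustrative only)."""
    options = dict(DEFAULTS)
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep or key not in DEFAULTS:
            raise ValueError(f"unrecognized option: {arg!r}")
        if key == "transcript_only":
            # Treat "True"/"true" as True, anything else as False.
            options[key] = value.strip().lower() == "true"
        elif key == "bin_size_s":
            options[key] = int(value)
        else:
            options[key] = value
    return options


opts = parse_options(["language=en", "export_dir=/tmp", "bin_size_s=300", "transcript_only=True"])
print(opts["export_dir"], opts["bin_size_s"], opts["transcript_only"])
```

Any option left out of the argument list keeps its documented default in this sketch.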
