Toolchain that helps convert a large library of videos and produce searchable transcripts, using Watson Text-to-Speech (TTS). Output can be transformed, for example to the raw ingestion format expected by Watson Multimedia Search (MMS).
Soundjot is currently designed for fault tolerance, to work with varying network conditions, media formats, etc. Toward that end, it writes files to local disk at each step. By default, these files will live in your user directory, with the following structure.
$HOME/tmp-soundjot/
./in
./out
Soundjot relies on FFmpeg to transcode media.
FFmpeg download options are described
here. The fewest issues have been
observed by installing via git clone
and hand-tailoring the elaborate
configuration options. A bash file is included to illustrate options that
have worked best.
./ffmpeg_install.bash
npm install
You may provide a path to a line-delimited file or a URL to the -f option. It must contain paths or URLs to video resources with dialog.
node soundjot.js -f /path/to/streams.txt --CLOBBER=false
Usage: soundjot node soundjot.js [options] -f <file>
Options:
-h, --help output usage information
-V, --version output the version number
-d, --dir <path> Working directory
-f, --inputfile <filename> Specify input
-o, --outputdir <path> Specify destination directory
-u, --username [string] Watson service username
-p, --password [string] Watson service password
-s, --segment_time [hh:mm:ss] Output chunk duration
-x, --max_size [number] Stop transcoding at max_size bytes
-c, --contenttype [string] Content-Type of input
-t, --transform [path] Transforms results to desired format
-m, --metabundler [path] Metadata bundler
--uploader [path] Uploader
Soundjot is intended to be semi-automated and will likely become more automatic with continued development.
- Data entry
- stream url(s)
- filepath(s)
- Stream Info (HEAD?)
- Configuration - based on data entered and what is known about stream
- TTS Auth - get token from Watson services
- Get stream
- Downsample stream
- Send stream to TTS
- Listen for result events and termination
- Write JSON output (may be appends or one write operation)
- Bundle metadata
- Transform TTS transcript + metadata => desired output
- Upload completed transcript
To learn more about Watson Text-to-Speech, please visit its API documentation. SDKs in popular languages are also provided in the Watson Developer Cloud. Soundjot is built using the Developer Cloud Node SDK.
The modules below are listed in priority order, from critical to nice-to have.
Accept inputs and submits audio bytes to Watson Speech-to-Text, and records the resulting text or JSON response.
- Options
- Keywords
Accepts urls or file paths, and retrieves video as a binary stream. Downsamples the video stream to to 16-bit FLAC, and outputs the results.
For a given video, collects a handful of metadata fields that are usually absent from transcript data.
Accept one or more TTS outputs, metadata output, and converts to a desired output format.
Submits finished transcripts to desired endpoint, e.g. Cloudant DB.
Runs all the modules in stepwise fashion.
Exposes endpoints for transcription services. Ideally resides on a machine that handles all the stream data. Ideally pipes stream data instead of saving on disk. Acts as a Controller module, and composer of other modules listed. It may take charge of looping over collections, and threading.
Present a a page of controls that allows a user to submit videos or collections of videos.
Accepts a domain and path, and collects probably video source URLs from it.