Customize Transcription Output #161

clpStephen · 2024-02-22T15:47:25Z

Is your feature request related to a problem? Please describe.
We submit transcriptions of interviews and clips to studio producers, and they want a very specific format. I could achieve this by taking the SRT file and having another AI reformat it for me, but that's a lot of tedious extra steps. Is it possible to reconfigure the transcription output provided by StoryToolkit?

Describe the solution you'd like
I would like a config file or settings within the app. Here's an example of what I mean:
In the app, the transcription looks like this:

Speaker 2
It's fine.
Speaker 3
Pick a roll, baby girl.

I need to be able to output that to something like this:

[011824_JOE INTV]
13:51:51
WOMAN 2: It's fine.

[011824_JOE INTV]
13:51:55
INTERVIEWER: Pick a roll, baby girl.

Every segment has the name of the file, the timecode, and the transcript.

If I output to text for Avid, it doesn't include timecodes or anything, just the transcript with line breaks, not even speakers. Also, I would like to be able to rename identified speakers for use in this output.

Describe alternatives you've considered
The workaround would be to feed the file to an AI text tool and have it reformat the text into what I need.

Additional context
Sorry if this is addressed elsewhere already. Also maybe the assistant could do this. I can't get it to work, it just says it is having trouble connecting. I do have an API key.

Thanks!

octimot · 2024-02-22T16:57:51Z

Hey there!

First, since this is pretty much standard reformatting, I think a custom transcription output is the way to go. I'll look into how best to approach this since, as you also pointed out, many studios / post houses have their own preferences here, so coding a single format for everybody wouldn't make much sense. Letting anyone create their own custom export templates might make more sense...

To address some of the issues that you also mentioned:

If I output to text for Avid, it doesn't include timecodes or anything, just the transcript with line breaks, not even speakers.

If you're referring to Avid DS exports, I think this is due to the standard format itself. Apart from adding the speaker in front of every line, I don't think there's much we can change.

Also, I would like to be able to rename identified speakers for use in this output.

You can OPTION/ALT + click on the transcript or right-click -> Edit and rename the detected speakers. You can also use CMD/CTRL+F to find and replace speaker names in bulk. Or, is there something not working correctly on your end?
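Outside the app, the same bulk rename boils down to a lookup table applied to every segment. The sketch below is a hypothetical illustration only: the segment dictionaries, their `speaker` key and the `rename_speakers` helper are assumptions, not StoryToolkitAI's actual transcription format or code. The mapping reuses the names from the example at the top of this issue.

```python
from typing import Dict, List

# Hypothetical segment shape (assumption): a dict with "speaker" and "text" keys.
Segment = Dict[str, str]


def rename_speakers(segments: List[Segment], mapping: Dict[str, str]) -> List[Segment]:
    """Return new segments with speaker names replaced via `mapping`.

    Speakers missing from `mapping` keep their original name.
    The input list is left unmodified.
    """
    renamed = []
    for segment in segments:
        updated = dict(segment)  # shallow copy so the caller's data stays intact
        updated["speaker"] = mapping.get(segment["speaker"], segment["speaker"])
        renamed.append(updated)
    return renamed


if __name__ == "__main__":
    segments = [
        {"speaker": "Speaker 2", "text": "It's fine."},
        {"speaker": "Speaker 3", "text": "Pick a roll, baby girl."},
    ]
    mapping = {"Speaker 2": "WOMAN 2", "Speaker 3": "INTERVIEWER"}
    for segment in rename_speakers(segments, mapping):
        print(f"{segment['speaker']}: {segment['text']}")
```

Unmapped speakers are passed through unchanged, so a partial mapping never drops a segment.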


The Assistant should be able to do this, especially when using GPT-4, but since this is solvable with a simple formatting algorithm, I think using AI for the task is overkill (and costly, depending on how many transcripts you deal with)...

Also maybe the assistant could do this. I can't get it to work, it just says it is having trouble connecting. I do have an API key.

Just to make sure: if you have an OpenAI key, it needs to be entered in Preferences -> Assistant -> OpenAI API Key.
More details here.

Cheers!

clpStephen · 2024-02-23T14:35:33Z

Thanks, Octimot! I'll look for a way to get it reformatted quickly in the meantime. GPT can definitely do it, but the character limit gets in my way. I'll re-enter my API key to see if that fixes the issue. I'm starting with the transcription features since they can cure a lot of pain. We're primarily an Avid house, although we also have some Resolve and one show that uses Premiere. I look forward to seeing how else we can benefit from your tool.

octimot · 2024-02-23T15:08:37Z

I'll push an update to GitHub that allows the creation of custom transcription exports sometime next week, or maybe sooner...

For the particular use case that you mentioned, the template you'd need to create would probably look like this:

name: Custom Export Template
extension: txt
segment_template: |
  [{transcription_name}]
  {segment_start_tc}
  {segment_speaker_name}: {segment_text}
segment_separator: "\n\n"

Once this is saved as a .yaml file in templates/transcription_export, you'll be able to export in exactly the format you need.
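To preview what that template produces, the placeholder substitution can be approximated with Python's `str.format`. This is a rough sketch, not StoryToolkitAI's export code: it assumes each segment is available as a dict whose keys match the placeholder names, and the `render_export` function is hypothetical. The template text is hard-coded without the trailing newline that a YAML `|` block keeps by default.

```python
from typing import Dict, List

# Mirrors segment_template above (assumed to use {placeholder} fields like str.format).
SEGMENT_TEMPLATE = "[{transcription_name}]\n{segment_start_tc}\n{segment_speaker_name}: {segment_text}"
# Mirrors segment_separator: "\n\n", i.e. one blank line between segments.
SEGMENT_SEPARATOR = "\n\n"


def render_export(transcription_name: str, segments: List[Dict[str, str]]) -> str:
    """Fill the template once per segment and join the results with the separator."""
    rendered = [
        SEGMENT_TEMPLATE.format(transcription_name=transcription_name, **segment)
        for segment in segments
    ]
    return SEGMENT_SEPARATOR.join(rendered)


if __name__ == "__main__":
    segments = [
        {"segment_start_tc": "13:51:51", "segment_speaker_name": "WOMAN 2",
         "segment_text": "It's fine."},
        {"segment_start_tc": "13:51:55", "segment_speaker_name": "INTERVIEWER",
         "segment_text": "Pick a roll, baby girl."},
    ]
    print(render_export("011824_JOE INTV", segments))
```

Run with the example data, it prints the two blocks requested at the top of this issue: file name in brackets, start timecode, then speaker and text.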

Question: are you using the git version of the tool, or the standalone?

clpStephen · 2024-02-23T15:31:41Z

Thanks so much for that! I'm using the standalone version on a Windows 10 workstation. The version that plugs into Resolve may be beneficial down the line; I'll have a better answer on that once I start working with my AEs on this and see what they think. I want to get it functional for us first, though. I'll try your template!

octimot · 2024-02-23T15:37:44Z

Both the git and the standalone versions should connect to Resolve Studio.

It would be great if you could try a git installation, because you'll get access to the update I mentioned sooner (as soon as I push it to GitHub)!

I'll come back on this issue when it's up and ready.

Cheers

clpStephen · 2024-02-27T15:31:58Z

I'm working on the git version, but I'm having some issues. I guess I should be on Python 3.10 rather than 3.12?

octimot · 2024-02-27T15:54:58Z

Yes, some of the packages that the tool is using are not tested or not compatible with anything newer than 3.10.
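If you want a git install script to warn early about an unsupported interpreter, a small guard like the one below can help. It is only a sketch based on the 3.10 ceiling mentioned above; `is_supported_python` and `MAX_SUPPORTED` are hypothetical names, it does not check a minimum version, and it is not part of StoryToolkitAI.

```python
import sys
from typing import Sequence

# Newest Python version the dependencies are said to work with (per this thread).
MAX_SUPPORTED = (3, 10)


def is_supported_python(version_info: Sequence = sys.version_info) -> bool:
    """True if the interpreter's major.minor version is not newer than MAX_SUPPORTED."""
    return tuple(version_info[:2]) <= MAX_SUPPORTED


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if is_supported_python():
        print(f"Python {current} should be fine for the git install.")
    else:
        print(f"Warning: Python {current} is newer than 3.10; consider using Python 3.10.")
```

Comparing `(major, minor)` tuples avoids string comparison pitfalls such as "3.9" sorting after "3.10".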

clpStephen · 2024-02-27T17:35:28Z

Ok, I have StoryToolkitAI GIT launchable.

octimot · 2024-02-29T12:50:53Z

I just pushed version 0.24.1, which includes custom transcription export templates (commit bb2011a).

Just update the tool and add the custom template I recommended above to the templates/transcription_export folder inside your StoryToolkitAI configuration folder. To find the configuration folder on your machine, open StoryToolkitAI and go to File -> Open Configuration Folder.
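If you prefer to script this step, the sketch below writes the template suggested earlier into that folder. The configuration folder path is whatever File -> Open Configuration Folder shows on your machine; the helper `install_export_template` and the file name `custom_export.yaml` are hypothetical, so any `.yaml` name should do.

```python
from pathlib import Path

# The template text suggested earlier in this thread.
TEMPLATE_YAML = """\
name: Custom Export Template
extension: txt
segment_template: |
  [{transcription_name}]
  {segment_start_tc}
  {segment_speaker_name}: {segment_text}
segment_separator: "\\n\\n"
"""


def install_export_template(config_dir: str, file_name: str = "custom_export.yaml") -> Path:
    """Write TEMPLATE_YAML into <config_dir>/templates/transcription_export/.

    Creates the folder if it does not exist yet and returns the written file's path.
    """
    target_dir = Path(config_dir) / "templates" / "transcription_export"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / file_name
    target.write_text(TEMPLATE_YAML, encoding="utf-8")
    return target
```

Call it with the configuration folder path from your machine, then restart or refresh StoryToolkitAI so it picks up the new template. The doubled backslashes in the Python string make the file contain the literal `"\n\n"` escape that the YAML template expects.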

Full instructions for how to work with custom export templates here.

Please let me know if the templates work on your end.

Cheers!

clpStephen · 2024-02-29T15:37:33Z

Will do, Thanks!

clpStephen · 2024-03-01T17:10:13Z

Thanks so much for your attention, Octimot! I have it working pretty well now. It's still struggling with speaker detection, so I'm experimenting with models and settings to fine-tune that. One issue I've come across: if I generate a transcription and detect speakers via Ingest, my custom template YAML produces a valid export. But if I change some settings and run Detect Speakers on the JSON file that was already built, the content is missing when I export a new transcription; I only get the header. I'll attach both files here, but please let me know if this should be an entirely new issue or if this thread is fine. For now I can work around it, since the export works the first time the transcription is generated, although Speaker ID occasionally fails for some reason.
012124.txt
CLP_trans.txt
Note: it wouldn't let me attach YAML files, so I changed the extension to .txt. The YAML file is CLP_trans.

octimot · 2024-03-01T17:51:16Z

As far as I can tell, you should remove the conditional with the speaker, see below:

segment_condition: |
  not {segment_meta}
  not {segment_meta_speaker}
  not {segment_meta_other}
  '{segment_speaker_name}' == 'Speaker 1'.   <------- REMOVE THIS ENTIRE LINE

That line tells the export function to only export segments whose speaker name is "Speaker 1".
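The effect of that line can be sketched as a per-segment filter. This is an approximation that assumes the condition is checked against each segment after its placeholder is filled in; it is not StoryToolkitAI's evaluation code, and the segment dicts and helper names are hypothetical. It also shows how an export can end up with no segments: once speakers are named anything other than exactly "Speaker 1", nothing passes.

```python
from typing import Dict, List

Segment = Dict[str, str]


def speaker_condition(segment: Segment) -> bool:
    # Mirrors the removed line: '{segment_speaker_name}' == 'Speaker 1'
    return segment["segment_speaker_name"] == "Speaker 1"


def filter_segments(segments: List[Segment]) -> List[Segment]:
    """Keep only the segments for which the condition holds."""
    return [segment for segment in segments if speaker_condition(segment)]


if __name__ == "__main__":
    detected = [
        {"segment_speaker_name": "Speaker 1", "segment_text": "It's fine."},
        {"segment_speaker_name": "Speaker 2", "segment_text": "Pick a roll, baby girl."},
    ]
    renamed = [
        {"segment_speaker_name": "WOMAN 2", "segment_text": "It's fine."},
        {"segment_speaker_name": "INTERVIEWER", "segment_text": "Pick a roll, baby girl."},
    ]
    print(len(filter_segments(detected)))  # 1: only the "Speaker 1" segment is kept
    print(len(filter_segments(renamed)))   # 0: no segment matches, so none are exported
```

Removing the condition line is equivalent to making `speaker_condition` always return `True`, so every segment reaches the template again.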

Cheers!

BristolBEAT · 2024-04-18T13:43:19Z

I've created a WAV with timecode in Avid, then imported it into Avid and Resolve to check that the timecode exists, which it does.

However, the timecode doesn't seem to carry across to StoryToolkitAI. H.264 files are fine.

clpStephen added the "enhancement" label (New feature or request) on Feb 22, 2024.
