Customize Transcription Output #161

Open
clpStephen opened this issue Feb 22, 2024 · 13 comments
Labels
enhancement New feature or request

Comments

@clpStephen

Is your feature request related to a problem? Please describe.
We submit transcriptions of interviews and clips to studio producers, who want a very specific format. I can potentially achieve this by taking the SRT file and having another AI reformat it for me, but that's a lot of tedious extra steps. Is it possible to reconfigure the transcription output provided by StoryToolkit?

Describe the solution you'd like
I would like a config file or settings within the app. Here's an example of what I mean:
In app, the transcription looks like this:

Speaker 2
It's fine.
Speaker 3
Pick a roll, baby girl.

I need to be able to output that to something like this:

[011824_JOE INTV]
13:51:51
WOMAN 2: It's fine.

[011824_JOE INTV]
13:51:55
INTERVIEWER: Pick a roll, baby girl.

Every segment has the name of the file, the timecode, and the transcript.

If I output to text for Avid, it doesn't include timecodes or anything, just the transcript with line breaks, not even speakers. Also, I would like to be able to rename identified speakers for use in this output.

Describe alternatives you've considered
The workaround would be to feed the file to an AI text tool and have it reformat the transcript into what I need.

Additional context
Sorry if this is addressed elsewhere already. Also maybe the assistant could do this. I can't get it to work, it just says it is having trouble connecting. I do have an API key.

Thanks!

clpStephen added the enhancement label Feb 22, 2024
@octimot
Owner

octimot commented Feb 22, 2024

Hey there!

First, since this is pretty much standard reformatting, I think a custom transcription output is the way to go. I'm going to try to see how to best approach this since - as you also pointed out - a lot of studios / post houses actually have unique preferences when it comes to this, and coding it universally for everybody wouldn't make too much sense. But allowing anyone to make up their own custom export templates might make more sense...

To address some of the issues that you also mentioned:

> If I output to text for Avid, it doesn't include timecodes or anything, just the transcript with line breaks, not even speakers.

If you're referring to AVID DS exports, I think this is due to the standard format. Apart from adding the speaker in front of every line, there's not much we can change I think.

> Also, I would like to be able to rename identified speakers for use in this output.

You can OPTION/ALT + click on the transcript or right-click -> Edit and rename the detected speakers. You can also use CMD/CTRL+F to find and replace speaker names in bulk. Or, is there something not working correctly on your end?

> Also, I would like to be able to rename identified speakers for use in this output.

The Assistant should be able to do this, especially when using GPT-4, but since this is solvable by a simple formatting algorithm, I think using AI for the task is overkill (and costly, depending on the number of transcripts you deal with)...
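To give a sense of how small such a formatting pass can be, here is a minimal Python sketch that reshapes SRT text into the layout requested in the original post. It is not part of StoryToolkitAI: the helper name `srt_to_studio_format` and its parameters are hypothetical. It also assumes the SRT contains no speaker labels, and it uses SRT timestamps, which are relative to the start of the file rather than source timecode.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000"
# and captures the start time without milliseconds.
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2}),\d{3}\s*-->\s*\d{2}:\d{2}:\d{2},\d{3}")


def srt_to_studio_format(srt_text, clip_name):
    """Turn SRT cues into blocks of clip name, start time, and text (hypothetical helper)."""
    blocks = []
    # SRT cues are separated by blank lines.
    for cue in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [line.strip() for line in cue.splitlines() if line.strip()]
        # Locate the timing line; anything without one is skipped.
        for i, line in enumerate(lines):
            match = TIMING.fullmatch(line)
            if match:
                # Join multi-line cue text into a single line.
                text = " ".join(lines[i + 1:])
                blocks.append(f"[{clip_name}]\n{match.group(1)}\n{text}")
                break
    return "\n\n".join(blocks)
```

Calling `srt_to_studio_format(srt_text, "011824_JOE INTV")` returns one block per cue, separated by blank lines. Speaker names would still have to be added separately, which is one reason a built-in export template is the better long-term route.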

> Also maybe the assistant could do this. I can't get it to work, it just says it is having trouble connecting. I do have an API key.

Just to make sure, if you have an OpenAI key, it needs to be entered in Preferences -> Assistant -> OpenAI API Key
More details here.

Cheers!

@clpStephen
Author

Thanks Octimot! I'll look for a solution to get it quickly reformatted in the meantime. GPT definitely can do it, but the character limit impedes me. I'll re-enter my API key to see if that corrects the issue. I'm beginning with the transcription features first, as that can cure a lot of pain. We are primarily an Avid house, although we do have some Resolve and one show that uses Premiere. I look forward to seeing how else we can benefit from your tool.

@octimot
Owner

octimot commented Feb 23, 2024

I'll push an update on Github that allows the creation of custom transcription exports sometime next week or maybe sooner...

For the particular use case that you mentioned, the template you'd need to create would probably look like this:

name: Custom Export Template

extension: txt

segment_template: |
  [{transcription_name}]
  {segment_start_tc}
  {segment_speaker_name}: {segment_text}

segment_separator: "\n\n"

Once this is saved in a .yaml file in templates/transcription_export you'll be able to export exactly in the format you need.
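For readers who want to see what the template amounts to, here is a rough Python sketch of the idea: fill the segment template once per segment, then join the results with the separator. This is only an illustration, not StoryToolkitAI's actual export code. The function `render_export` and the segment dictionary keys (`start_tc`, `speaker`, `text`) are hypothetical, and the sketch assumes the template is used without the trailing newline that a YAML `|` block normally carries.

```python
# The same template and separator as in the YAML above, written as Python strings.
SEGMENT_TEMPLATE = (
    "[{transcription_name}]\n"
    "{segment_start_tc}\n"
    "{segment_speaker_name}: {segment_text}"
)
SEGMENT_SEPARATOR = "\n\n"


def render_export(transcription_name, segments):
    """Fill the template once per segment and join the results (illustrative only)."""
    rendered = [
        SEGMENT_TEMPLATE.format(
            transcription_name=transcription_name,
            segment_start_tc=seg["start_tc"],
            segment_speaker_name=seg["speaker"],
            segment_text=seg["text"],
        )
        for seg in segments
    ]
    return SEGMENT_SEPARATOR.join(rendered)


segments = [
    {"start_tc": "13:51:51", "speaker": "WOMAN 2", "text": "It's fine."},
    {"start_tc": "13:51:55", "speaker": "INTERVIEWER", "text": "Pick a roll, baby girl."},
]
print(render_export("011824_JOE INTV", segments))
```

With these two sample segments, the printed text has the same shape as the example in the original request: file name in brackets, timecode, then speaker and text, with a blank line between segments.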

Question: are you using the git version of the tool, or the standalone?

@clpStephen
Author

Thanks so much for that! I am using the standalone on a Windows 10 workstation. The version that plugs into Resolve may be beneficial down the line. I'll answer that question once I start working with my AEs on this to see what they think. I want to get it functional for us first though. I'll try your template!

@octimot
Owner

octimot commented Feb 23, 2024

Both the git and the standalone versions should connect to Resolve Studio.

It would be great if you could attempt a git installation because you'll be able to access the update I was mentioning faster (as soon as I push it to Github)!

I'll come back on this issue when it's up and ready.

Cheers

@clpStephen
Author

I'm working on the git installation and am having some issues. I guess I should be on Python 3.10 rather than 3.12?

@octimot
Owner

octimot commented Feb 27, 2024

Yes, some of the packages that the tool is using are not tested or not compatible with anything newer than 3.10.
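A quick way to confirm which interpreter a virtual environment is actually using is a check like the sketch below. The helper `is_tested_python` and the `MAX_TESTED_MINOR` constant are hypothetical; the 3.10 ceiling is taken from this comment only, not from an official compatibility list, and no lower bound is checked.

```python
import sys

# Assumption taken from this thread: nothing newer than Python 3.10 is tested.
MAX_TESTED_MINOR = 10


def is_tested_python(version=None):
    """Return True for Python 3.x where x is no newer than 3.10."""
    major, minor = (version or sys.version_info)[:2]
    return major == 3 and minor <= MAX_TESTED_MINOR


# Report on the interpreter running this script.
print(sys.version.split()[0], "tested range:", is_tested_python())
```

Running it with the interpreter from the environment used for the git installation shows whether that environment picked up 3.12 by accident.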

@clpStephen
Author

OK, I have the git version of StoryToolkitAI launching.

@octimot
Owner

octimot commented Feb 29, 2024

I just pushed version 0.24.1 which includes custom transcription export templates (commit bb2011a)

Just update the tool and try to add the custom template that I recommended above in the templates/transcription_export folder in your StoryToolkitAI configuration folder. To find the configuration folder on your machine, open StoryToolkitAI and go to File -> Open Configuration Folder
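If it is easier to script than to create the file by hand, the sketch below writes the template recommended earlier in this thread into the expected subfolder. It is only a convenience sketch: `install_template` and the file name `custom_export.yaml` are hypothetical, and the configuration folder path must be supplied by you (copy it from File -> Open Configuration Folder), since it differs per machine.

```python
from pathlib import Path

# The template recommended earlier in this thread, as YAML text.
# The doubled backslashes keep the separator as the literal characters \n\n in the file.
TEMPLATE_YAML = """\
name: Custom Export Template
extension: txt
segment_template: |
  [{transcription_name}]
  {segment_start_tc}
  {segment_speaker_name}: {segment_text}
segment_separator: "\\n\\n"
"""


def install_template(config_dir, file_name="custom_export.yaml"):
    """Write the template into <config_dir>/templates/transcription_export."""
    target_dir = Path(config_dir) / "templates" / "transcription_export"
    # Create the folder structure if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / file_name
    target.write_text(TEMPLATE_YAML, encoding="utf-8")
    return target
```

Pass the configuration folder path as `config_dir`; the function returns the path of the file it wrote.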

Full instructions for how to work with custom export templates here.

Please let me know if the templates work on your end.

Cheers!

@clpStephen
Author

Will do, Thanks!

@clpStephen
Author

Thanks so much for your attention, Octimot! I have it working pretty well now. It is still struggling with speaker detection, so I'm playing with models and settings to try to fine-tune that. One issue I've come across: if I generate a transcription and detect speakers via Ingest, I get a valid export from my custom template YAML. But if I change some settings and run Detect Speakers on the JSON file that was already built, I lose the content when exporting a new transcription; it just gives me the header. I'll attach the files here, but please let me know if this should be an entirely new issue or if this is a good thread for it. I do have the workaround that it works the first time it's generated, although it does occasionally fail the speaker ID for some reason.
012124.txt
CLP_trans.txt
Note: it wouldn't allow me to attach YAML, so I changed the extension to txt. The YAML is the CLP_trans file.

@octimot
Owner

octimot commented Mar 1, 2024

As far as I can tell, you should remove the conditional with the speaker, see below:

segment_condition: |
  not {segment_meta}
  not {segment_meta_speaker}
  not {segment_meta_other}
  '{segment_speaker_name}' == 'Speaker 1'.   <------- REMOVE THIS ENTIRE LINE

What that does is it tells the export function to only export segments that have "Speaker 1" as the speaker name.
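A small Python sketch shows why the export ends up with only the header. It does not reproduce how StoryToolkitAI evaluates `segment_condition` internally (that is not shown in this thread); it only assumes a segment is exported when every condition holds. The names `export_segments` and `required_speaker` and the dictionary keys are hypothetical.

```python
def export_segments(segments, required_speaker=None):
    """Keep non-meta segments; optionally also require an exact speaker name."""
    kept = []
    for seg in segments:
        if seg.get("meta"):
            continue  # stands in for the "not {segment_meta}" style lines
        if required_speaker is not None and seg["speaker"] != required_speaker:
            continue  # stands in for "'{segment_speaker_name}' == 'Speaker 1'"
        kept.append(seg["text"])
    return kept


segments = [
    {"speaker": "WOMAN 2", "text": "It's fine."},
    {"speaker": "INTERVIEWER", "text": "Pick a roll, baby girl."},
]

# With the speaker condition, nothing matches, so only a header would remain.
print(export_segments(segments, required_speaker="Speaker 1"))
# Without it, both segments are exported.
print(export_segments(segments))
```

If no segment carries exactly the name "Speaker 1" after speakers are re-detected or renamed, every segment is filtered out, which matches the header-only file described above.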

Cheers!

@BristolBEAT

I've created a WAV in Avid with timecode and imported it into Avid and Resolve to check that the timecode exists, which it does.

However, the timecode doesn't seem to carry across to StoryToolkitAI. H264s are fine.
