Smarter subtitles cutting optimization for automatically and not automatically generated subtitles #23

Open
stanislaw opened this issue Feb 13, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@stanislaw
Contributor

stanislaw commented Feb 13, 2023

This issue is a follow-up to the other subtitles-related issue: #22.

I have found that with both automatically and manually created subtitles, a subtitle is very often cut in the middle of a sentence, so the resulting mp3 sounds incomplete. Two optimizations could be applied to cut the cards sentence by sentence. Both work by merging subtitles that follow each other closely.

The original description can be found here: #7.

When the subtitles are manually created and well written, they can be merged into full sentences so that the audio files are cut sentence by sentence rather than fragment by fragment. This can give very precise cuts and therefore a much better user experience out of the box. For automatically generated subtitles, at least a distance check can be made: if two subtitles are very close to each other in time, they can also be merged, because they are likely part of the same sentence spoken in the video. Both algorithms could be enabled with dedicated option checkboxes in the UI.

Not automatically generated subtitles

This is a very good example of well-crafted subtitles: https://www.youtube.com/watch?v=GfF2e0vyGM4. Each subtitle forms part of a sentence, and at some point the sentence ends with a terminating punctuation mark such as . or ?. In this case, it can be optimal to concatenate the subtitles until the sentence is fully finished. This should make ideal Anki cards, one card per sentence.
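
A minimal sketch of the punctuation-based merge described above. The `Subtitle` type and the `merge_by_punctuation` function are hypothetical names used only for illustration, not existing code from this project. Treating `!` as a sentence end is also an assumption; the description above only mentions `.` and `?`.

```python
from dataclasses import dataclass


@dataclass
class Subtitle:
    # Hypothetical container for one subtitle range; times are in seconds.
    start: float
    end: float
    text: str


# "." and "?" come from the description above; "!" is an added assumption.
SENTENCE_END = (".", "?", "!")


def merge_by_punctuation(subtitles):
    """Concatenate consecutive subtitles until one ends a sentence."""
    merged = []
    pending = None
    for sub in subtitles:
        text = sub.text.strip()
        if pending is None:
            pending = Subtitle(sub.start, sub.end, text)
        else:
            # Extend the pending sentence to cover this subtitle as well.
            pending = Subtitle(pending.start, sub.end, f"{pending.text} {text}")
        if pending.text.endswith(SENTENCE_END):
            merged.append(pending)
            pending = None
    # Keep a trailing fragment that never reached sentence-ending punctuation.
    if pending is not None:
        merged.append(pending)
    return merged
```

Each merged item would then map to one audio cut and one card. A trailing fragment without closing punctuation is kept as its own item rather than dropped.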

Automatically generated subtitles

In this case, it is not possible to rely on punctuation. Instead, each subtitle range can be compared with the previous one:

If the gap between two consecutive subtitle ranges is less than 0.1s, they can be concatenated as parts of the same sentence. Otherwise, they belong to two different cards and should be kept separate.

Using this technique, I could achieve a decent cutting of the sentences as spoken in this video: https://www.youtube.com/watch?v=7I4J4vy2Deo&t=12s.
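
The distance check could be sketched roughly as follows. The `Subtitle` type and `merge_by_gap` are hypothetical names for illustration, and the sketch assumes the subtitles are already sorted by start time. The 0.1s threshold is the value mentioned above, exposed here as a parameter.

```python
from dataclasses import dataclass


@dataclass
class Subtitle:
    # Hypothetical container for one subtitle range; times are in seconds.
    start: float
    end: float
    text: str


def merge_by_gap(subtitles, max_gap=0.1):
    """Merge a subtitle into the previous one when the gap is below max_gap."""
    merged = []
    for sub in subtitles:
        if merged and sub.start - merged[-1].end < max_gap:
            prev = merged[-1]
            # Overlapping ranges give a negative gap and are merged too.
            merged[-1] = Subtitle(
                prev.start, max(prev.end, sub.end), f"{prev.text} {sub.text}"
            )
        else:
            merged.append(Subtitle(sub.start, sub.end, sub.text))
    return merged
```

Because each subtitle is compared with the already merged range, a long run of closely spaced subtitles collapses into a single card. A maximum card length might be worth adding if that turns out to be too aggressive.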


Both techniques could be activated with a corresponding option in the UI.

@kamui-fin kamui-fin added the enhancement New feature or request label Feb 14, 2023
@stanislaw
Contributor Author

I would be happy to work on the one for Not automatically generated subtitles.

@kamui-fin
Owner

Great, I'll take care of the rest.
