Suggestion: Time-based cutting utility #36

Open
johnpyp opened this issue Oct 28, 2020 · 1 comment
Comments

johnpyp commented Oct 28, 2020

For stuff like cleaning audio transcript datasets, it's necessary to cut out segments of the corresponding subtitles when cutting out bad parts of the training audio. This is partially doable by merging the subtitles into an mkv container with the audio, and then using ffmpeg on it and splitting them apart again, but is far from ideal.

Having an easy way to operate on the subtitles directly, with an API like subs.cut(start="30:30", end="40:20") that removes the offending section and shifts everything after it earlier by the length of the cut, would be really nice for this use case.

Owner

tkarabela commented Oct 29, 2020

That sounds like a useful feature! :) In terms of API, pysubs2 represents time in milliseconds. When a method (like SSAFile.shift()) takes just one time, it can be "sugared" to keyword arguments for hours, minutes, seconds, etc., so it looks pretty short: subs.shift(m=1, s=30). Unfortunately, this doesn't work when a method takes several times. For those, the call could look like this:

from pysubs2 import load, make_time

subs = load("subtitles.srt")
subs.cut(start=make_time(m=30, s=30), end=make_time(m=40, s=20))
subs.save("subtitles-cut.srt")

...which is a bit ugly/verbose, though pretty unambiguous and robust for scripted use.

I imagine you may have multiple segments to cut out, in which case it would be nice to be able to specify them all at once, so that all times have the same reference (otherwise you may have to compensate for time shift from previous cuts):

subs.cut([[make_time(m=1, s=30), make_time(m=2, s=0)],
          [make_time(m=15, s=45), make_time(m=16, s=10)]])
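To make the "same reference" point concrete, here is a minimal sketch of how a multi-range cut could remap times. It is not pysubs2 code: Event, merge_ranges(), remap() and cut() are hypothetical names used only for illustration, and times are plain integer milliseconds, which is what make_time() returns. All ranges refer to the original timeline, so no manual compensation for earlier cuts is needed.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical stand-in for a subtitle line; times are in milliseconds."""
    start: int
    end: int
    text: str


def merge_ranges(ranges):
    """Sort the cut ranges and merge any that overlap or touch."""
    merged = []
    for start, end in sorted(ranges):
        if end <= start:
            continue  # ignore empty or inverted ranges
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def remap(t, merged):
    """Map a time on the original timeline to the timeline after cutting.

    A time inside a cut range collapses onto the point where that cut begins.
    """
    removed = sum(min(t, end) - start for start, end in merged if t > start)
    return t - removed


def cut(events, ranges):
    """Return new events with the given ranges removed and later lines shifted.

    Lines that fall entirely inside a cut are dropped; lines that straddle
    a cut boundary are trimmed.
    """
    merged = merge_ranges(ranges)
    result = []
    for ev in events:
        new_start = remap(ev.start, merged)
        new_end = remap(ev.end, merged)
        if new_end > new_start:
            result.append(Event(new_start, new_end, ev.text))
    return result


# Same two ranges as above: 1:30-2:00 and 15:45-16:10, in milliseconds.
events = [
    Event(0, 60_000, "kept as-is"),
    Event(95_000, 100_000, "inside first cut"),
    Event(110_000, 130_000, "straddles end of first cut"),
    Event(1_000_000, 1_005_000, "after both cuts"),
]
shortened = cut(events, [(90_000, 120_000), (945_000, 970_000)])
```

Dropping fully covered lines and trimming straddling ones is only one possible policy; a real implementation might prefer to keep or drop straddling lines whole.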

Finally, for quick-and-dirty use, this would be a nice addition to the command-line interface, e.g.:

$ pysubs2 --cut 30m30s 40m20s subtitles.srt >subtitles-cut.srt

Or perhaps even the more usual (though slightly more ambiguous):

$ pysubs2 --cut 0:30:30 0:40:20 subtitles.srt >subtitles-cut.srt
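Both spellings could be accepted by one small parser. The following is only a sketch: parse_time() is a hypothetical helper, not part of pysubs2, and it returns milliseconds. The colon form is where the ambiguity comes from, so the sketch assumes the last field is always seconds (so "30:30" means 30 minutes 30 seconds).

```python
import re

# Milliseconds per unit for the "30m30s" style.
_UNIT_MS = {"h": 3_600_000, "m": 60_000, "s": 1_000, "ms": 1}
# "ms" must be tried before "m" so that "500ms" is not read as minutes.
_TOKEN = re.compile(r"(\d+)(ms|h|m|s)")


def parse_time(text):
    """Parse '30m30s' or '0:30:30' style strings into milliseconds."""
    text = text.strip()
    if ":" in text:
        fields = text.split(":")
        if not 2 <= len(fields) <= 3:
            raise ValueError(f"unrecognized time: {text!r}")
        # Assumption: fields are right-aligned, i.e. [h:]m:s.
        seconds = float(fields[-1])
        minutes = int(fields[-2])
        hours = int(fields[0]) if len(fields) == 3 else 0
        return round(((hours * 60 + minutes) * 60 + seconds) * 1000)

    total = 0
    pos = 0
    for match in _TOKEN.finditer(text):
        if match.start() != pos:
            raise ValueError(f"unrecognized time: {text!r}")
        total += int(match.group(1)) * _UNIT_MS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"unrecognized time: {text!r}")
    return total


start_ms = parse_time("30m30s")
end_ms = parse_time("0:40:20")
```

With the unit-suffixed form there is nothing to guess, which is why it is the less ambiguous of the two.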
