Read and convert a subtitle (.srt) file to csv or List
libraryDependencies += "io.github.mdauthentic" % "sous-title_2.13" % "0.3.0"
import io.github.mdauthentic.core._
Calling the open or readInLine method returns a list of SRT values, each containing id, startTime, endTime and sub (the subtitle itself).
scala> val reader = SRTReader.open("file.srt")
reader: List(SRT(1, 00:00:33.599, 00:00:35.270, List(Soy Amelia Folch.)))
The inline reader takes the .srt content as a string and returns a list of SRT values:
scala> val srt =
"""1
|00:00:33,599 --> 00:00:35,270
|(NARRA) Soy Amelia Folch.
|
|2
|00:00:36,199 --> 00:00:39,870
|Tengo 23 años y sin embargo
|he salvado la vida del Empecinado.""".stripMargin
scala> val inlineReader = SRTReader.readInLine(srt)
inlineReader: List(SRT(1,00:00:33.599,00:00:35.270,List((NARRA) Soy Amelia Folch.)), SRT(2,00:00:36.199,00:00:39.870,List(Tengo 23 años y sin embargo, he salvado la vida del Empecinado.)))
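To make the shape of that result concrete, here is a minimal, self-contained sketch of how .srt text could be turned into such values. It is not the library's implementation: `SrtSketch`, `Cue` and `parse` are hypothetical names, and representing timestamps as `java.time.LocalTime` is an assumption that merely matches the printed output above.

```scala
import java.time.LocalTime

object SrtSketch {
  // Hypothetical stand-in for the library's SRT type (same four fields).
  final case class Cue(id: Int, startTime: LocalTime, endTime: LocalTime, sub: List[String])

  // Matches a timing line such as "00:00:33,599 --> 00:00:35,270".
  private val Timing = """(\S+)\s*-->\s*(\S+).*""".r

  // .srt uses a comma before the milliseconds; LocalTime.parse expects a dot.
  private def toTime(s: String): LocalTime = LocalTime.parse(s.replace(',', '.'))

  // Split the input on blank lines, then read the id, the time range and the
  // text lines from each block. Blocks without a timing line are skipped;
  // a non-numeric id or an unparsable timestamp throws (kept simple on purpose).
  def parse(content: String): List[Cue] =
    content.trim
      .split("\\r?\\n\\s*\\r?\\n")
      .toList
      .map(_.linesIterator.map(_.trim).toList)
      .collect {
        case idLine :: Timing(start, end) :: text =>
          Cue(idLine.toInt, toTime(start), toTime(end), text)
      }
}
```

Calling `SrtSketch.parse(srt)` on the string defined above should yield two `Cue` values, the second of which carries two lines of text in `sub`.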
If you are interested in only part of the result returned by the reader, for instance the subtitle text and not the id, start time and end time, you can extract just the subtitles like this:
scala> inlineReader.sub
List(List((NARRA) Soy Amelia Folch.), List(Tengo 23 años y sin embargo, he salvado la vida del Empecinado.))
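The same extraction can also be written with ordinary collection operations, assuming each element exposes a `sub` field as described earlier. The sketch below uses a hypothetical `Entry` case class as a stand-in for the library's `SRT` type so that it runs on its own; `SubtitleExtraction` and its methods are illustrative names, not part of the library.

```scala
object SubtitleExtraction {
  // Hypothetical stand-in for the library's SRT type; times are kept as plain strings here.
  final case class Entry(id: Int, startTime: String, endTime: String, sub: List[String])

  // One inner list of text lines per subtitle entry.
  def subtitles(entries: List[Entry]): List[List[String]] =
    entries.map(_.sub)

  // One string per entry, with multi-line subtitles joined by a single space.
  def subtitleText(entries: List[Entry]): List[String] =
    entries.map(_.sub.mkString(" "))
}
```

The second helper is often the more convenient form for text analysis, since each subtitle becomes a single string.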
There are two ways to write to a file:
- writing without header
scala> val reader = SRTReader.open("file.srt")
reader: List[SRT] = List(SRT(1, 00:00:33.599, 00:00:35.270, List(Soy Amelia Folch.)))
scala> SRTWriter.write(reader, "output.csv")
or by passing the input and output file paths directly:
scala> SRTWriter.write("inputFileName.srt", "outputFileName.csv")
- with user-defined header
scala> val header = List("id", "start_time", "end_time", "subtitle")
header: List[String] = List(id, start_time, end_time, subtitle)
scala> SRTWriter.write("input.srt", "output.csv", header)
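The exact layout of the CSV produced by the writer is not shown here, so the following is only a sketch of what writing with an optional header could look like. It assumes one row per subtitle, multi-line subtitles joined with a space, and RFC 4180 style quoting; `CsvSketch`, `Row`, `toCsv` and `write` are hypothetical names and do not describe the library's internals.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

object CsvSketch {
  // Hypothetical stand-in for the library's SRT type; times are kept as plain strings here.
  final case class Row(id: Int, startTime: String, endTime: String, sub: List[String])

  // Quote a field only when needed, doubling any embedded quotes (RFC 4180 style).
  def escape(field: String): String =
    if (field.exists(c => c == ',' || c == '"' || c == '\n' || c == '\r'))
      "\"" + field.replace("\"", "\"\"") + "\""
    else field

  // Render the rows as CSV text, with the header line first when one is given.
  def toCsv(rows: List[Row], header: Option[List[String]] = None): String = {
    val body = rows.map { r =>
      List(r.id.toString, r.startTime, r.endTime, r.sub.mkString(" "))
        .map(escape)
        .mkString(",")
    }
    (header.map(_.map(escape).mkString(",")).toList ++ body).mkString("\n")
  }

  // Write the rendered CSV to `path` as UTF-8, replacing any existing file.
  def write(rows: List[Row], path: String, header: Option[List[String]] = None): Unit = {
    Files.write(Paths.get(path), toCsv(rows, header).getBytes(StandardCharsets.UTF_8))
    ()
  }
}
```

Quoting matters for subtitles in particular, because dialogue frequently contains commas and quotation marks that would otherwise break the column layout.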
In Scandal (a TV series), wine was mentioned several times, and I was curious how many times the word was used across the entire series (seasons 1 to 7). I used this library to convert all of the subtitle files for the series to csv format for further analysis.
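The counting step of such an analysis can be sketched as follows, assuming the subtitle text has already been loaded as a list of strings. `WordCount` and `countWord` are hypothetical names used for illustration and are not part of the library.

```scala
object WordCount {
  // Count case-insensitive, whole-word occurrences of `word` across all lines.
  // Text is split on anything that is not a letter, a digit or an apostrophe,
  // so "wine," and "Wine!" both count while "wines" does not.
  def countWord(lines: List[String], word: String): Int = {
    val target = word.toLowerCase
    lines
      .flatMap(_.toLowerCase.split("[^\\p{L}\\p{N}']+"))
      .count(_ == target)
  }
}
```

For example, `WordCount.countWord(subtitleLines, "wine")` would give the number of times the word appears, where `subtitleLines` is whatever list of subtitle strings you extracted beforehand.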
This library will come in handy in data analysis projects for parsing and extracting the contents of subtitle files.