Parse fails when files have multiple timestamps for single line of text #40

1hitsong · 2023-03-07T21:31:10Z

When attempting to parse a lyric file that has the following line, the parser failed.

[00:51.00][01:29.99][01:48.29][02:31.00][02:41.99]You gotta fight !

The expected outcome was for the parser to use the text "You gotta fight !" at each of the five time stamps.

The text was updated successfully, but these errors were encountered:

andy840119 · 2023-03-08T02:00:50Z

Thanks for report the issue!
This issue will be fixed in the #41 and make a new release soon 🎉 .

1hitsong · 2023-03-08T02:16:35Z

Awesome! Thanks for hopping on this so quickly

Looking at the PR for the fix, would this result in the output being option 1 or option 2?

Option 1
[00:51.00]You gotta fight !

or Option 2
[00:51.00]You gotta fight !
[01:29.99]You gotta fight !
[01:48.29]You gotta fight !
[02:31.00]You gotta fight !
[02:41.99]You gotta fight !

andy840119 · 2023-03-08T02:19:26Z

The result will be the option 1.

andy840119 · 2023-03-08T02:23:19Z

And note that new NuGet package has been released!

see:
https://www.nuget.org/packages/LrcParser

1hitsong · 2023-03-08T02:25:31Z

The result will be the option 1.

Cool. Just making sure I'm following.

I've seen a lot of sites that use the multiple timestamps in a way that means option 2, but If that's not to the LRC standard, then option 1 makes the most sense.

andy840119 · 2023-10-17T15:54:56Z

Guess this issue is fixed 🚀

Shadowghost · 2024-05-03T13:02:46Z

@andy840119 in my opinion this is not the correct fix for the issue.
Implementing Option 2 mentioned above would make the lib adhere to the basic LRC format.
Multiple timestamps on the same line should result in that line appearing on those timestamp. Dropping all except the first timestamp removes quite any chorus lyrics from songs after the first occurrence.

andy840119 · 2024-05-03T14:36:34Z

https://en.wikipedia.org/wiki/LRC_(file_format)
Seems there's two usage for .lrc format

One is for sceeen text:

[00:21.10][00:45.10]Repeating lyrics (e.g. chorus)

Another one is for karaoke format:

[mm:ss.xx] <mm:ss.xx> line 1 word 1 <mm:ss.xx> line 1 word 2 <mm:ss.xx> ... line 1 last word <mm:ss.xx>

But (as i remember) some software might export the .lrc with those format as well:

[mm:ss.xx] [mm:ss.xx] line 1 word 1 [mm:ss.xx] line 1 word 2 [mm:ss.xx] ... line 1 last word [mm:ss.xx]

Also, did you have idea for handling those cases?
And will you willing to make the PR for that because currently I'm focusing on another project😢
Might have no enough time to solve this issue.

https://suwa.pupu.jp/RhythmicaLyrics.html
This is the software I made the .lrc file before.

Shadowghost · 2024-05-03T14:41:53Z

I already took a look at the codebase and it likely requires some API changes. I'll try to come up with a PR over the weekend.

andy840119 · 2024-05-03T15:20:03Z

I already took a look at the codebase and it likely requires some API changes. I'll try to come up with a PR over the weekend.

I think in this case it should not repeat:

[mm:ss.xx][mm:ss.xx] line 1 word 1 [mm:ss.xx] line 1 word 2 [mm:ss.xx] ... line 1 last word [mm:ss.xx]

Did you have any idea for handle this case?
Or can you describe how will you do for this case in order to reduce the review time?

Api change is OK for me, but would be better to explain why 👌🏼

Shadowghost · 2024-05-09T10:41:24Z

This is taking abit longer than expected, but what I did up until now:

Move the timestamp into TextIndex since the current structure does not allow multipletimestamps for the same starting index (we are using a dict)
Add handling for multi occurrence.

Looking at your post above I've got a question:
According to wikipedia karaoke format is like this (duration infront of word)

[mm:ss.xx] <mm:ss.xx> line 1 word 1 <mm:ss.xx> line 1 word 2 <mm:ss.xx> ... line 1 last word

But according to your example it can also be this:

[mm:ss.xx] <mm:ss.xx> line 1 word 1 <mm:ss.xx> line 1 word 2 <mm:ss.xx> ... line 1 last word <mm:ss.xx>

What are the timestamps in this case for?

Chaphasilor · 2024-05-09T13:52:40Z

Just wanted to chime it that I had a user whose files contained lines in the format of [ms:ss.xxx]text, so with three decimals for milliseconds and no space between timestamp and text. They were not parsed correctly with Jellyfin 10.9, and getting rid of the third decimal place fixed the issue.
I'm guessing this is also an issue with this library, and could easily be fixed as part of these improvements :)

andy840119 · 2024-05-09T17:07:44Z

https://github.com/karaoke-dev/sample-beatmap/blob/master/v1/%E6%9E%AF%E3%82%8C%E3%81%AA%E3%81%84%E8%8A%B1/lyric.lrc
@Shadowghost this is the sample .lrc created by RhythmicaLyrics, a tool to create karaoke format .lrc file.

I' not sure if official .lrc format allows [mm:ss.xx] at the end of the text, but you can see the japanese karoake song and last time at the end of the lyric might means it's the time blue area covers the whole lyric.

Shadowghost · 2024-05-09T18:22:45Z

@Chaphasilor already took care of that by extending the regex.

@andy840119 seems like we need to differentiate between these layouts:

Core LRC: Single timestamp at the beginning

[mm:ss.xx] text

Core LRC: Multiple timestamps at the beginning (resulting in multiple appearances of the line)

[mm:ss.xx][mm:ss.xx][mm:ss.xx] text

Enhanced LRC (A2): Single timestamp at the beginning, followed by others denoted by <...> - within them is the DURATION, therefore NO need for a timestamp at the end

[mm:ss.xx] <mm:ss.xx> line 1 word 1 <mm:ss.xx> line 1 word 2 <mm:ss.xx> ... line 1 last word

Enhanced LRC (RythmicaLyrics): Single timestamp at the beginning, followed by others denoted by [...] - within them are ABSOLUTE timestamps, therfore we need an end timestamp to know where the displaying of the last character ends

[mm:ss.xx]line 1 word 1 [mm:ss.xx] line 1 word 2 [mm:ss.xx] ... line 1 last word [mm:ss.xx]

My plan is to convert all of these into the following object:

TextIndex:
    public int Index { get; }
    public int? Time { get; }
    public IndexState State { get; }

IndexState:
    Start
    End

Always using the full timestamp for time and IndexState Start, for cases 1 and 2 we have no entry with IndexState End, for cases 3 and 4 we have

Shadowghost · 2024-05-09T18:28:23Z

Logic for knowing which of the above we are dealing with (based on regexing one line a time):

Regex only returns one result -> case 1
Regex returns multiple results and their concatenated length is the same as the index of the end of the last match -> case 2
Regex returns multiple results and the above does not apply -> case 4
Case 1 applies and we check the resulting text for <> timestamps -> case 3

andy840119 · 2024-05-11T15:04:29Z

@Shadowghost

Finally I got some times to check what's happening in the RythmicaLyrics
This tool have two way to generate the file:

save as(A)
export(O)

If press the save button, it can support save as different type of format, but the contents are the same:

[mm:ss.xx]line 1 word 1 [mm:ss.xx] line 1 word 2 [mm:ss.xx] ... line 1 last word [mm:ss.xx]

And the save format match to the export as text2ass with .kar extension.

The conclusion:

RythmicaLyrics does not actually support export the .lrc format，they only support another file format (.kar)
I think should be better to rename the origin LrcParser into KarParser, and make new parser for the Core LRC and Enhanced LRC (A2)

andy840119 · 2024-05-11T15:08:34Z

Maybe it's a good start:

Create a first PR to create the KarParser(just copy the conten/tests in the LrcParser)
Create another PR to adjust the LrcParser, let it fully matched the Core LRC and Enhanced LRC (A2) spec.

Shadowghost · 2024-05-11T17:12:33Z

Fine for me, will do so.
One last question: is the ruby tag stuff also KOR specific?

andy840119 · 2024-05-11T17:22:51Z

There's not much documentation about .kar file format and maybe it's not as popular as .lrc file

I think is's OK to treat it as official spec
And ruby tag might not supported in the .lrc file

Because we need to adjust the lrc file format see: karaoke-dev#40

Reapor-Yurnero · 2024-07-14T21:32:31Z

Maybe it's a good start:

Create a first PR to create the KarParser(just copy the conten/tests in the LrcParser)

Create another PR to adjust the LrcParser, let it fully matched the Core LRC and Enhanced LRC (A2) spec.

Since quite a few other projects are relying on this project to parse LRC (as its name indicates), it makes sense to maintain the functionality of LRC parsing in this lib and move the kar part to a seperate one like .

BTW I was not able to find a PR on this created by @Shadowghost and wondering what's the current status of it. A service I'm using is relying on the above discussed items to be incorporated and would love to help!

andy840119 · 2024-07-15T02:23:02Z

hmmmm....
Guess i need to reserve some time on this library.

@Shadowghost are you still interested on fixing this issue? I'll take over this issue if you have no time to fix it ;_;

Shadowghost · 2024-07-15T05:41:26Z

Sorry, I've been busy over the last weeks, I can push what I have if you want to take over.

andy840119 · 2024-07-15T05:44:54Z

Sorry, I've been busy over the last weeks, I can push what I have if you want to take over.

Nice, you can give me the branch and i can cherry-pick if possible to use 🙏🏼

Reapor-Yurnero · 2024-07-15T19:14:43Z

Great! Being able to handle A2 extensions (word by word timestamps) and support millesecond timestamps [mm:ss.xxx] or <mm:ss.xxx> in A2 (as mentioned also by @Chaphasilor) is the most desired feature from the downstream projects. Appreciate your help.

Shadowghost · 2024-07-15T21:03:34Z

@andy840119 there you go: https://github.com/Shadowghost/LrcParser/tree/extend
Sadly it is very WIP, lmk if you have questions.

Because we need to adjust the lrc file format see: karaoke-dev#40

andy840119 · 2024-07-28T09:17:49Z

@1hitsong @Shadowghost @Reapor-Yurnero @Chaphasilor

Here's the new release follow the .lrc format in the wiki 🎉
https://www.nuget.org/packages/LrcParser

Only few changes and should be easy to adjust
https://github.com/karaoke-dev/LrcParser/blob/main/CHANGELOG.md

It's time to give it a try, report feedback is welcome ❤️

Shadowghost · 2024-07-28T13:13:28Z

@andy840119 thanks for the update, working fine to far on my local tests. I do have one question though: What was the reason for you to not re-use the TimeTags for multiple occurrences?

andy840119 · 2024-07-28T13:16:43Z

What was the reason for you to not re-use the TimeTags for multiple occurrences?

What does that means?
Using TimeTag class in the LrcLyric
Or Allow have multiple time-tags between lyric text?

Shadowghost · 2024-07-28T13:20:02Z

I think I misunderstood what TimeTags is for.
Why I'm mentioning it is this: Before it was possible to get LRC metadata tags like these:

[ti:Fortnight]
[ar:Taylor Swift/Post Malone]
[al:THE TORTURED POETS DEPARTMENT (Explicit)]
[offset:0]

by decoding the LRC (result would have all lines in the Lyrics object), ckecking the lines for TimeTags.Count == 0 and then parsing their text content. Now these lines are completely omitted and TimeTags.Count is always 0 if the LRC has timestamps. Only way to get them would be to parse the full LRC text.

We don't actually need them but it would be nice for completeness to have them available somewhere

andy840119 · 2024-07-28T13:29:10Z

[ti:Fortnight]
[ar:Taylor Swift/Post Malone]
[al:THE TORTURED POETS DEPARTMENT (Explicit)]
[offset:0]

That's called metadata, and i guess should make another parser for it (e.g. LrcMatadataParser in the LrcParser/Parser/Lrc/Lines/)

ckecking the lines for TimeTags.Count == 0 and then parsing their text content.

I'm not sure what it's means (sorry for my bad english understanding), but TimeTags in the Lyric.cs still needed if lyric contains the A2 extension formant

e.g.

[00:13.34] <00:14.32> Don't <00:14.73> you <00:15.14> want <00:15.57> somebody <00:16.09> to <00:16.46> love

Lyric's start time will be [00:13.34]
And Lyric's first time-tag will be [00:13.34] + <00:14.32> in 2024.728.1

[update]
Wait, i found a bug in this version.
<mm:ss.xx> seems should be absolute time, not relative time Orz.

[update]
https://www.nuget.org/packages/LrcParser New version released (2024.728.2)

Shadowghost · 2024-07-28T13:33:00Z

I'm not sure what it's means (sorry for my bad english understanding), but TimeTags in the Lyric.cs still needed if lyric contains the A2 extension formant

Yes, that's what I mean with I misunderstood their usage 😅

That's called metadata, and i guess should make another parser for it (e.g. LrcMatadataParser in the LrcParser/Parser/Lrc/Lines/)

Alternatively you could parse it in the existing parser and add a Metadata object to the parser result in addition to Lyrics - would preent us from reading the file two times if we want both.

Reapor-Yurnero · 2024-08-01T23:41:57Z

Personally I don't think it's very necessary to parse metadata in the lyrics. We can show them in the lyrics text but we don't need to parse it. Metadata should in the end be embedded in the music file itself rather than being double embedded in the lyrics.

andy840119 · 2024-08-02T02:42:51Z

Personally I don't think it's very necessary to parse metadata in the lyrics. We can show them in the lyrics text but we don't need to parse it.

I think the next step might let LrcParser able to prese metadata, for prevent treating metadata as lyric text.
As your mention, song info is able to get from the music file itself, so there's no need to show in the lyric again.

Reapor-Yurnero · 2024-08-02T03:48:35Z

In fact, i think it does have its merit to have some metadata shown in the lyrics such as composer, lyrics writer etc. which are typically not part of the music metadata itself.

We may just skip these none timestamp brackets and do not parse them at all.

Shadowghost · 2024-08-02T09:44:39Z

Personally I don't think it's very necessary to parse metadata in the lyrics. We can show them in the lyrics text but we don't need to parse it. Metadata should in the end be embedded in the music file itself rather than being double embedded in the lyrics.

As your mention, song info is able to get from the music file itself, so there's no need to show in the lyric again.

Usually lyrics are not shared with the associated media file due to copyrights. Adding the metadata to the music file is therefore not really a valid argument.
I'd even argue that lyrics without such metadata are kind of useless if different edits of a song exist.

We may just skip these none timestamp brackets and do not parse them at all.

That is already the case for synced lyrics.

andy840119 mentioned this issue Mar 8, 2023

Parse fails when files have multiple timestamps for single line of text #41

Merged

andy840119 added the bug Something isn't working label Mar 8, 2023

andy840119 closed this as completed Oct 17, 2023

andy840119 reopened this May 3, 2024

andy840119 mentioned this issue May 12, 2024

Replace all the .lrc file support to .kar karaoke-dev/karaoke#2231

Closed

andy840119 added a commit to andy840119/LrcParser that referenced this issue Jun 10, 2024

Copy the kar file format from the lrc folder.

d60b0af

Because we need to adjust the lrc file format see: karaoke-dev#40

Chaphasilor mentioned this issue Jun 15, 2024

Support word-by-word time synced lyrics jmshrv/finamp#780

Closed

congerh mentioned this issue Jun 26, 2024

[Issue]: Lyrics with timestamps in 3 decimal places can't be recognized as synced jellyfin/jellyfin#11518

Closed

1 task

andy840119 added a commit to andy840119/LrcParser that referenced this issue Jul 21, 2024

Copy the kar file format from the lrc folder.

2d2e1ea

Because we need to adjust the lrc file format see: karaoke-dev#40

This was referenced Jul 21, 2024

Copy the kar file format from the lrc folder. #64

Merged

Remove the ruby parsing logic in the lrc parser. #65

Merged

Refactor the line test case. #67

Merged

andy840119 mentioned this issue Jul 28, 2024

Rewrite the lrc parser logic. #69

Merged

andy840119 mentioned this issue Jul 28, 2024

"<mm:dd.ss>" in the lyric store the absolute time, not the relative time. #72

Merged

Parse fails when files have multiple timestamps for single line of text #40

Parse fails when files have multiple timestamps for single line of text #40

Comments

1hitsong commented Mar 7, 2023

andy840119 commented Mar 8, 2023 • edited Loading

1hitsong commented Mar 8, 2023

andy840119 commented Mar 8, 2023

andy840119 commented Mar 8, 2023

1hitsong commented Mar 8, 2023

andy840119 commented Oct 17, 2023

Shadowghost commented May 3, 2024 • edited Loading

andy840119 commented May 3, 2024 • edited Loading

Shadowghost commented May 3, 2024

andy840119 commented May 3, 2024 • edited Loading

Shadowghost commented May 9, 2024

Chaphasilor commented May 9, 2024

andy840119 commented May 9, 2024 • edited Loading

Shadowghost commented May 9, 2024

Shadowghost commented May 9, 2024 • edited Loading

andy840119 commented May 11, 2024

andy840119 commented May 11, 2024

Shadowghost commented May 11, 2024

andy840119 commented May 11, 2024

Reapor-Yurnero commented Jul 14, 2024

andy840119 commented Jul 15, 2024

Shadowghost commented Jul 15, 2024

andy840119 commented Jul 15, 2024

Reapor-Yurnero commented Jul 15, 2024 • edited Loading

Shadowghost commented Jul 15, 2024

andy840119 commented Jul 28, 2024

Shadowghost commented Jul 28, 2024

andy840119 commented Jul 28, 2024

Shadowghost commented Jul 28, 2024 • edited Loading

andy840119 commented Jul 28, 2024 • edited Loading

Shadowghost commented Jul 28, 2024

Reapor-Yurnero commented Aug 1, 2024

andy840119 commented Aug 2, 2024

Reapor-Yurnero commented Aug 2, 2024

Shadowghost commented Aug 2, 2024

andy840119 commented Mar 8, 2023 •

edited

Loading

Shadowghost commented May 3, 2024 •

edited

Loading

andy840119 commented May 3, 2024 •

edited

Loading

andy840119 commented May 3, 2024 •

edited

Loading

andy840119 commented May 9, 2024 •

edited

Loading

Shadowghost commented May 9, 2024 •

edited

Loading

Reapor-Yurnero commented Jul 15, 2024 •

edited

Loading

Shadowghost commented Jul 28, 2024 •

edited

Loading

andy840119 commented Jul 28, 2024 •

edited

Loading