Simple alignments breakdown near end of audio: please help. #295

davidbernat · 2022-12-24T12:33:10Z

I am a beginner user of aeneas (MacBook 2021, Ventura 13.0.1) with a large amount of experience in natural language processing, audio, algorithms, and software. I understand the basic principles of aeneas and forced alignment algorithms.

I recently noticed that my configuration 'runs out of room' and the alignment begins to produce errors of the same type.

Can someone familiar with the aeneas package help me debug this? I will provide clearer code as we discuss.

Here is the basic outline of my usage:

            # One sentence-length phrase per entry, plus the matching audio clips combined into one track.
            phrases = [m["text_during"] for m in continuous[i]]
            audio = MoviePyUtilities.concat_clips_as_list([AudioFileClip(a["filename"]) for a in grouped_audio[i]], composite=True)
            # Write the combined audio to a temporary WAV file for aeneas to read.
            tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=True)
            audio.write_audiofile(tmp.name, codec="pcm_s32le", fps=MoviePyUtilities.get_fps(audio))
            forced = ForcedAlignment.force_alignment(phrases, tmp.name)
            if forced is None:
                raise RuntimeError(f"Error occurred during ForcedAlignment for continuous index {i}")

Nothing particularly unique in the above: I have a collection of phrases, each about one sentence long, and I have the associated audio. I write the audio to a temporary file, and inside the force_alignment function I write the phrases to disk.

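            # Temporary files for the plain-text input and the JSON output.
            text_file = tempfile.NamedTemporaryFile(delete=True)
            json_file = tempfile.NamedTemporaryFile(delete=True)
            # Plain text input: one phrase per line.
            with open(text_file.name, "w") as f:
                f.write("\n".join(shortened))
            # aeneas CLI arguments: audio path, text path, configuration string, output path.
            args = ["aeneas", audio_filename, text_file.name,
                    "task_language=eng|os_task_file_format=json|is_text_type=plain", json_file.name]
            # Run the aeneas CLI task in-process.
            e = ExecuteTaskCLI()
            e.use_sys = False
            code = e.run(arguments=args, show_help=False)
            if code != 0:
                raise RuntimeError(f"aeneas exited with code {code}")
            # Load the resulting alignment.
            with open(json_file.name) as f:
                results = json.load(f)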
Here I execute the aeneas package using the configuration shown above. Typical results are published below. I have also tried varying the length of the phrases and the same problem persists.

You can see that the alignment for the first three phrases is roughly correct, and the fourth phrase is given essentially zero length. This is wrong. It almost appears as though the tempo of the alignment is wrong: in other words, the proportions of the first three phrases are correct, but each is 'too long,' and then aeneas simply runs out of audio.
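To make the symptom easier to compare across reports, here is a minimal sketch that scans a JSON sync map for collapsed fragments. It assumes the output has the usual aeneas JSON layout (a top-level `fragments` list whose entries carry `id`, `begin`, `end`, and `lines`). The function name `fragment_report`, the 0.2 s threshold, and the sample data are all illustrative and are not taken from the results discussed in this issue.

```python
def fragment_report(sync_map, min_duration=0.2):
    """Summarize fragment durations and flag fragments that look collapsed.

    Assumes an aeneas-style JSON sync map: {"fragments": [{"id", "begin", "end", "lines"}, ...]}.
    Returns (rows, suspicious_ids).
    """
    rows = []
    suspicious = []
    for frag in sync_map["fragments"]:
        begin = float(frag["begin"])  # times are stored as strings in the JSON
        end = float(frag["end"])
        text = " ".join(frag["lines"])
        duration = end - begin
        # Characters per second is only a crude speaking-rate proxy.
        rate = len(text) / duration if duration > 0 else float("inf")
        rows.append({"id": frag["id"], "begin": begin, "end": end,
                     "duration": duration, "chars_per_sec": rate})
        # Non-empty text squeezed into (almost) no audio is the pattern described above.
        if text.strip() and duration < min_duration:
            suspicious.append(frag["id"])
    return rows, suspicious


# Made-up sync map for illustration only (hypothetical values).
example = {
    "fragments": [
        {"id": "f000001", "begin": "0.000", "end": "4.000", "lines": ["First sentence of the passage."]},
        {"id": "f000002", "begin": "4.000", "end": "9.000", "lines": ["Second sentence, a little longer."]},
        {"id": "f000003", "begin": "9.000", "end": "12.000", "lines": ["Third sentence."]},
        {"id": "f000004", "begin": "12.000", "end": "12.000", "lines": ["Fourth sentence with no room left."]},
    ]
}

rows, suspicious = fragment_report(example)
for row in rows:
    print(f'{row["id"]}: {row["begin"]:.3f} -> {row["end"]:.3f} '
          f'({row["duration"]:.3f} s, {row["chars_per_sec"]:.1f} chars/s)')
print("suspicious:", suspicious)
```

In the code above, this could be called as `fragment_report(results)` right after `json.load`. If the early fragments show a noticeably lower characters-per-second rate than the speaker's real pace, that would be consistent with the "each phrase too long" description.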

This package is very important, and its algorithm and implementation are very streamlined, making it an excellent baseline for many more sophisticated audio applications.

Can we debug?

changyr66 · 2023-04-07T15:01:48Z

I encountered the same issue. Did you figure out the reasons and solutions?

Oleg-A-LLIto · 2023-07-09T16:38:58Z

Same problem here. It seems weird to me that the errors accumulate instead of each overlong part just chipping off the start of the next one. After all, the start and end times are the most important thing and what the tool should analyze, not the duration.

davidbernat · 2023-07-09T16:56:25Z

@Oleg-A-LLIto @changyr66 Can you post your code and data file examples?
The feature of aeneas is that the underlying technology is simple bigram matched filters.
It should be robust. Or at least straightforward to diagnose. Though I believe the binary is pre-compiled?

Oleg-A-LLIto · 2023-07-09T17:43:36Z

Sure, here's an example. Unfortunately, I had to change the json to txt and the mp3 to mp4 (GitHub likes it that way).

TextInitial.txt
TextMarked.txt
https://github.com/readbeyond/aeneas/assets/43452849/c7546f3d-3db2-4e28-9292-4a52ffd0f018
