Simple alignments break down near end of audio: please help. #295

Open
davidbernat opened this issue Dec 24, 2022 · 4 comments

Comments

@davidbernat

I am a beginner user of aeneas (MacBook 2021, macOS Ventura 13.0.1) with a lot of experience in natural language processing, audio, algorithms, and software. I understand the basic principles of aeneas and of forced-alignment algorithms.

I recently noticed that, near the end of the audio, my configuration "runs out of room" and the alignment starts producing the same kind of error every time.

Can someone familiar with the aeneas package help me debug this? I will provide clearer code as we discuss.

Here is the basic outline of my usage:

    import tempfile
    from moviepy.editor import AudioFileClip
    # MoviePyUtilities and ForcedAlignment are project-specific helpers (not shown here)

    phrases = [m["text_during"] for m in continuous[i]]  # one sentence-length phrase per entry
    audio = MoviePyUtilities.concat_clips_as_list([AudioFileClip(a["filename"]) for a in grouped_audio[i]], composite=True)
    tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=True)
    audio.write_audiofile(tmp.name, codec="pcm_s32le", fps=MoviePyUtilities.get_fps(audio))
    forced = ForcedAlignment.force_alignment(phrases, tmp.name)
    if forced is None:
        raise RuntimeError(f"Error occurred during ForcedAlignment for continuous index {i}")

Nothing particularly unusual in the above: I have a collection of phrases, each about one sentence long, and the associated audio. I write the audio to a temporary file, and inside the force_alignment function I write the phrases to disk.

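    import json
    import tempfile
    from aeneas.tools.execute_task import ExecuteTaskCLI

    text_file = tempfile.NamedTemporaryFile(delete=True)
    json_file = tempfile.NamedTemporaryFile(delete=True)
    # With is_text_type=plain, aeneas treats each line of the text file as one fragment
    with open(text_file.name, "w", encoding="utf-8") as f:
        f.write("\n".join(shortened))
    # argv-style list: program name, audio file, text file, task configuration string, output sync map
    args = ["aeneas", audio_filename, text_file.name,
            "task_language=eng|os_task_file_format=json|is_text_type=plain", json_file.name]
    e = ExecuteTaskCLI()
    e.use_sys = False  # return the exit code instead of calling sys.exit()
    code = e.run(arguments=args, show_help=False)
    if code != 0:
        raise RuntimeError(f"aeneas exited with code {code}")
    with open(json_file.name, encoding="utf-8") as f:
        results = json.load(f)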

Here I run aeneas with the configuration shown above. Typical results are shown below. I have also tried varying the length of the phrases, and the same problem persists.

[Screenshot 2022-12-24 at 7:25:24 AM: typical alignment results]

You can see that the alignment of the first three phrases is roughly correct, while the fourth phrase is assigned essentially zero length. This is wrong. It almost looks as though the tempo of the alignment is off: the relative proportions of the first three phrases are correct, but each one is "too long", and then aeneas simply runs out of audio.
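One way to check the "tempo" hypothesis numerically is to compute each fragment's speaking rate (characters per second) from the sync map and compare it with the median rate: a fragment squeezed to near-zero length shows up as an extreme outlier. This is a minimal sketch, assuming the results dict has aeneas's JSON sync map layout (a fragments list whose entries store begin/end as strings of seconds and the text in lines; only those three keys are read). The function names, the characters-per-second heuristic, and the 3x tolerance are hypothetical choices for illustration, and the sample data is made up.

```python
from statistics import median
from typing import Any, Dict, List, Tuple


def fragments_from_syncmap(results: Dict[str, Any]) -> List[Tuple[float, float, str]]:
    """Turn an aeneas JSON sync map (as returned by json.load) into
    (begin_seconds, end_seconds, text) tuples, in file order."""
    return [
        # aeneas writes times as strings such as "2.720"
        (float(f["begin"]), float(f["end"]), " ".join(f["lines"]))
        for f in results["fragments"]
    ]


def flag_rate_outliers(results: Dict[str, Any], tolerance: float = 3.0) -> List[int]:
    """Indices of fragments whose speaking rate (characters per second) differs
    from the median rate by more than `tolerance` times, in either direction.
    Zero-length fragments are always flagged."""
    rates = []
    for begin, end, text in fragments_from_syncmap(results):
        duration = end - begin
        rates.append(len(text) / duration if duration > 0 else float("inf"))
    finite = [r for r in rates if r != float("inf")]
    if not finite:
        return list(range(len(rates)))
    typical = median(finite)
    return [
        i for i, r in enumerate(rates)
        if r == float("inf") or r > typical * tolerance or r < typical / tolerance
    ]


# Made-up sync map in the aeneas JSON layout: three plausible fragments,
# then a fourth squeezed into 0.04 s (the symptom described above).
example = {"fragments": [
    {"begin": "0.000", "end": "3.100", "lines": ["The first phrase is about one sentence long."]},
    {"begin": "3.100", "end": "6.400", "lines": ["The second phrase is about the same length."]},
    {"begin": "6.400", "end": "9.950", "lines": ["The third phrase also has a similar length."]},
    {"begin": "9.950", "end": "9.990", "lines": ["The fourth phrase gets almost no time at all."]},
]}
print(flag_rate_outliers(example))  # [3]
```

If every fragment except the last is flagged as too slow, the drift is systematic; a single squeezed fragment at the end matches the report above.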

This package is very important; its algorithm and implementation are very streamlined and make an excellent baseline for many more sophisticated audio applications.

Can we debug?

@changyr66

I encountered the same issue. Did you figure out the reasons and solutions?

@Oleg-A-LLIto

Same problem here. It seems weird to me that the errors accumulate instead of each overlong part just chipping off the start of the next one. After all, the start and finish times are what matter most and what the tool should analyze, not the duration.
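Whether the error really accumulates can be checked by comparing the boundaries aeneas produces with hand-marked ones for the same recording: if the absolute offset grows from one boundary to the next, it is cumulative drift rather than independent local mistakes. This is a minimal sketch with hypothetical function names and made-up numbers. It compares only the internal boundaries (the starts of fragments 2, 3, 4, ...), on the assumption that the final fragment usually ends at the end of the audio and would mask the drift.

```python
from typing import List, Sequence


def boundary_offsets(predicted: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Signed offset (predicted - reference) in seconds for each boundary,
    rounded to milliseconds."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must list the same boundaries")
    return [round(p - r, 3) for p, r in zip(predicted, reference)]


def looks_cumulative(offsets: Sequence[float]) -> bool:
    """True if the absolute offset strictly increases from each boundary to the
    next, i.e. the error piles up instead of staying local."""
    return all(abs(b) > abs(a) for a, b in zip(offsets, offsets[1:]))


# Made-up numbers: internal boundaries reported by the aligner versus
# hand-marked reference times for the same recording.
aligned = [3.4, 7.1, 10.9]
hand_marked = [3.1, 6.4, 9.95]
offsets = boundary_offsets(aligned, hand_marked)
print(offsets)                    # [0.3, 0.7, 0.95]
print(looks_cumulative(offsets))  # True
```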

@davidbernat (author)

@Oleg-A-LLIto @changyr66 Can you post your code and data file examples?
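The appeal of aeneas is that the underlying technique is simple: per its documentation, it synthesizes the text with a TTS engine and aligns the MFCCs of the real and the synthesized audio with dynamic time warping (DTW).
It should be robust, or at least straightforward to diagnose. Though I believe its core routines are compiled C extensions?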
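For context, the aeneas documentation describes its method as synthesizing the text with a TTS engine, computing MFCCs of both the synthesized and the real audio, and aligning the two with Sakoe-Chiba band DTW. The toy below is plain, unbanded DTW on 1-D sequences; it is not aeneas's implementation, which works on MFCC vectors. It illustrates the property relevant to this issue: the warping path is monotonic and pinned to the first and last frames of both sequences, so if early fragments are mapped too late, the remaining fragments must be compressed into whatever audio is left. The function name and sample data are hypothetical.

```python
from typing import List, Sequence, Tuple


def dtw_path(x: Sequence[float], y: Sequence[float]) -> List[Tuple[int, int]]:
    """Classic O(len(x) * len(y)) DTW with absolute-difference cost.
    Returns the optimal warping path as (i, j) index pairs from (0, 0)
    to (len(x) - 1, len(y) - 1)."""
    n, m = len(x), len(y)
    inf = float("inf")
    # acc[i][j] = cheapest cumulative cost of aligning x[:i+1] with y[:j+1]
    acc = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            cost = abs(x[i] - y[j])
            if i == 0 and j == 0:
                acc[i][j] = cost
                continue
            best_prev = min(
                acc[i - 1][j] if i > 0 else inf,                 # step in x only
                acc[i][j - 1] if j > 0 else inf,                 # step in y only
                acc[i - 1][j - 1] if i > 0 and j > 0 else inf,   # step in both
            )
            acc[i][j] = cost + best_prev
    # Backtrack from the end: the path is forced to finish at the last frame of both inputs.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return path


# A "synthesized" feature track versus a "real" one spoken more slowly.
synth = [0, 1, 2, 3, 2, 1, 0]
real = [0, 1, 1, 2, 2, 3, 3, 2, 1, 0]
path = dtw_path(synth, real)
print(path[0], path[-1])  # (0, 0) (6, 9)
monotonic = all(0 <= i2 - i1 <= 1 and 0 <= j2 - j1 <= 1
                for (i1, j1), (i2, j2) in zip(path, path[1:]))
print(monotonic)  # True
```

Because every step moves forward in both sequences and the end point is fixed, any early over-allocation can only be paid back by squeezing later fragments, which matches the zero-length final phrase reported above.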

@Oleg-A-LLIto

Sure, here's an example. Unfortunately, I had to change the .json extension to .txt and the .mp3 to .mp4 (GitHub only accepts certain file types as attachments).

TextInitial.txt
TextMarked.txt
https://github.com/readbeyond/aeneas/assets/43452849/c7546f3d-3db2-4e28-9292-4a52ffd0f018
