Date normalization #5

Open
c-arvind opened this issue Sep 15, 2023 · 1 comment

Comments

@c-arvind

c-arvind commented Sep 15, 2023

Hi, I tried using your normalizer to help calculate WER for my personal use case.

I have a ground truth like so:
JUNE THIRD EIGHTEEN SEVENTY ONE OBOCOCK BROTHERS BANK AT CORYDON IOWA WAS ROBBED OF FORTY THOUSAND DOLLARS BY SEVEN MEN IN BROAD DAYLIGHT
and my predicted text (using whisper base.en) like so:
June 3, 1871, Hobacock Brothers Bank at Croix d'Inneil was robbed of $40,000 by seven men in broad daylight.

The issue is that if I normalize the ground truth, it converts the dollar amount ($40k) and '7 men' to numeric form, but the date is printed as 'June 3rd'. If I normalize the predicted text, the numbers stay in numeric form.

@kurianbenoy
Owner

Thanks for reporting this issue, @c-arvind. From what I understand, the issue is that:

Instead of June 3rd you would want the output as June third

Let me see if we can do anything to normalize dates.
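
As a rough illustration of the "June 3rd" to "June third" behavior described above, here is a minimal sketch of a post-processing step that could be run on normalized text before computing WER. It is not part of the normalizer and does not reflect how the normalizer works internally; the names `ordinal_word` and `spell_out_day_ordinals` are hypothetical, and it assumes only day-of-month ordinals (1 to 31) need spelling out.

```python
import re

# Hypothetical helper, not part of the normalizer discussed in this issue.
# Ordinal words for 1..19; index 0 is a placeholder so indices line up.
_ONES = [
    "", "first", "second", "third", "fourth", "fifth", "sixth", "seventh",
    "eighth", "ninth", "tenth", "eleventh", "twelfth", "thirteenth",
    "fourteenth", "fifteenth", "sixteenth", "seventeenth", "eighteenth",
    "nineteenth",
]
# (cardinal, ordinal) forms for the tens needed by day-of-month values.
_TENS = {20: ("twenty", "twentieth"), 30: ("thirty", "thirtieth")}

# One or two digits followed by an ordinal suffix, e.g. "3rd" or "21st".
_ORDINAL_RE = re.compile(r"\b([0-9]{1,2})(st|nd|rd|th)\b", re.IGNORECASE)


def ordinal_word(n):
    """Return the spelled-out ordinal for 1 <= n <= 31."""
    if 1 <= n <= 19:
        return _ONES[n]
    tens, ones = n - n % 10, n % 10
    cardinal, ordinal = _TENS[tens]
    return ordinal if ones == 0 else f"{cardinal} {_ONES[ones]}"


def spell_out_day_ordinals(text):
    """Replace ordinals such as '3rd' with words; leave values outside 1..31 as-is."""
    def repl(match):
        n = int(match.group(1))
        return ordinal_word(n) if 1 <= n <= 31 else match.group(0)
    return _ORDINAL_RE.sub(repl, text)


print(spell_out_day_ordinals("June 3rd 1871"))  # → June third 1871
```

The sketch does not check that the suffix matches the number (so "3th" is also rewritten) and it leaves years and other numbers untouched, so those would still need to be handled separately.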
