Datasets create empty samples #7
Comments
Hi Juan Carlos, can you please provide the version of musicaiz that you are using?
Sure! I installed it using:
Hi Carlos,
I created a dataset using Musicaiz. I used the provided code:
When reviewing the dataset, I noticed that there were some empty lines:
You can also check it out in Hugging Face.
I am not sure why this is happening. I could not install the repo locally to debug it right now, but maybe it has to do with the mmm tokenizer? Line 174:
tokens += "\n"
I am happy to create a PR if I find the problem, but I wanted to open the issue first 😃
Thanks again for the great library.
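To make the suspicion about the trailing `"\n"` concrete, here is a minimal sketch of how it could produce empty lines. It does not use musicaiz at all: `write_samples` and `find_empty_lines` are hypothetical helpers, the token strings are placeholders, and the assumption that the dataset writer adds its own newline after each sample is a guess, not something confirmed in the library.

```python
# Hypothetical illustration; none of these names come from musicaiz.

def write_samples(samples):
    """Write one sample per line by appending a newline to each sample."""
    return "".join(sample + "\n" for sample in samples)


def find_empty_lines(text):
    """Return the 1-based numbers of lines that are empty or whitespace-only."""
    return [
        i
        for i, line in enumerate(text.splitlines(), start=1)
        if not line.strip()
    ]


# Assumption: the tokenizer already ends each token string with "\n".
tokens_with_newline = ["TOKEN_A TOKEN_B\n", "TOKEN_C TOKEN_D\n"]
# Same samples with the trailing newline stripped before writing.
tokens_clean = [t.rstrip("\n") for t in tokens_with_newline]

doubled = write_samples(tokens_with_newline)
clean = write_samples(tokens_clean)

print(find_empty_lines(doubled))  # [2, 4]
print(find_empty_lines(clean))    # []
```

If this is what happens, every second line of the dataset would be empty, and running something like `find_empty_lines` over the generated file would be a quick way to check before digging into the tokenizer itself.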