Datasets create empty samples #7

Open
juancopi81 opened this issue Apr 5, 2023 · 2 comments

@juancopi81

Hi Carlos,

I created a dataset using Musicaiz. I used the provided code:

from musicaiz.tokenizers import MMMTokenizer, MMMTokenizerArguments
from musicaiz.datasets import JSBChorales

# Tokenize a dataset in musicaiz
output_path = "./BachChorales_4Bar_128"

args = MMMTokenizerArguments(
    prev_tokens="",
    windowing=True,
    time_unit="HUNDRED_TWENTY_EIGHT",
    num_programs=None,
    shuffle_tracks=True,
    track_density=True,
    window_size=4,
    hop_length=2,
    time_sig=False,
    velocity=False,
)
dataset = JSBChorales()
dataset.tokenize(
    dataset_path="/path/JSBChoralesDataset",
    output_path=output_path,
    output_file="token-sequences",
    args=args,
    tokenize_split="all"
)

When reviewing the dataset, I noticed that there were some empty lines:

image

You can also check it out on Hugging Face.

I am not sure why this is happening. I could not install the repo locally to debug it right now, but it may be related to the MMM tokenizer, line 174:

tokens += "\n"
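
In case it helps narrow this down, here is a minimal sketch for locating the blank lines in the tokenized output and writing a cleaned copy. It assumes the output is a plain-text file with one token sequence per line; the file name `token-sequences.txt` and the helper names `find_empty_lines` and `drop_empty_lines` are hypothetical and not part of musicaiz.

```python
from pathlib import Path


def find_empty_lines(path):
    """Return the 1-based line numbers of blank (whitespace-only) lines."""
    with open(path, encoding="utf-8") as f:
        return [i for i, line in enumerate(f, start=1) if not line.strip()]


def drop_empty_lines(src, dst):
    """Copy src to dst, skipping blank lines; return the number of lines kept."""
    kept = 0
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            if line.strip():
                # Normalize each kept sequence to end with exactly one newline.
                fout.write(line.rstrip("\n") + "\n")
                kept += 1
    return kept


if __name__ == "__main__":
    # Hypothetical file name: adjust to whatever dataset.tokenize() actually wrote.
    tokens_file = Path("./BachChorales_4Bar_128") / "token-sequences.txt"
    if tokens_file.exists():
        print(find_empty_lines(tokens_file))
```

This only works around the symptom. If the blank lines do come from the extra newline in the tokenizer, the real fix would be there.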

I am happy to open a PR if I find the problem, but I wanted to file the issue first 😃

Thanks again for the great library.

@carlosholivan
Owner

Hi Juan Carlos,

Can you please provide the version of musicaiz that you are using?

@juancopi81
Author

Sure! I installed it using:

!pip install musicaiz==0.1.2
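
To double-check which version is actually active in the runtime, something like the following sketch should work. It uses the standard-library `importlib.metadata`; the helper name `installed_version` is only illustrative.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed musicaiz version, or None if the package is missing.
print(installed_version("musicaiz"))
```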
