Questions about the data used when training Hubert #5568

Open
duduke37 opened this issue Nov 30, 2024 · 0 comments

I am following examples/hubert/README.md to train HuBERT, starting with the data preparation step. The README says that the format of {train,valid}.tsv is:

<root-dir>
<audio-path-1>
<audio-path-2>
...

In my understanding, the file would look like this:
[image]
However, while preparing the data and running training, I found that each line of the .tsv file seems to need a second value in addition to the path, which is read as "sz" in the following code:
    def load_audio(manifest_path, max_keep, min_keep):
        n_long, n_short = 0, 0
        names, inds, sizes = [], [], []
        with open(manifest_path) as f:
            root = f.readline().strip()
            for ind, line in enumerate(f):
                items = line.strip().split("\t")
                assert len(items) == 2, line
                sz = int(items[1])
                if min_keep is not None and sz < min_keep:
                    n_short += 1
                elif max_keep is not None and sz > max_keep:
                    n_long += 1
                else:
                    names.append(items[0])
                    inds.append(ind)
                    sizes.append(sz)
        tot = ind + 1
        logger.info(
            (
                f"max_keep={max_keep}, min_keep={min_keep}, "
                f"loaded {len(names)}, skipped {n_short} short and {n_long} long, "
                f"longest-loaded={max(sizes)}, shortest-loaded={min(sizes)}"
            )
        )
        return root, names, inds, tot, sizes
So I would like to know the exact format of this file. The first item on each line is the path; what is the second item, and how can it be obtained?
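
To make the question concrete, here is a minimal sketch of a manifest that load_audio above would accept. It assumes the second item is the number of audio samples in each file. That is only a guess based on how sz is compared against min_keep and max_keep; the README does not say so. The helper names (count_samples, write_manifest, make_silent_wav) are hypothetical and not part of fairseq, and the standard-library wave module used here reads only PCM WAV files, so other formats such as FLAC would need a different reader.

```python
import os
import tempfile
import wave


def count_samples(wav_path):
    """Return the number of sample frames in a PCM WAV file."""
    with wave.open(wav_path, "rb") as wav:
        return wav.getnframes()


def write_manifest(root_dir, manifest_path):
    """Write the root directory on the first line, then one
    tab-separated 'relative path, sample count' line per WAV file.

    Assumption: the second column is the sample count of the file.
    """
    rows = []
    for dirpath, _, filenames in os.walk(root_dir):
        for name in filenames:
            if name.lower().endswith(".wav"):
                full_path = os.path.join(dirpath, name)
                rel_path = os.path.relpath(full_path, root_dir)
                rows.append((rel_path, count_samples(full_path)))
    rows.sort()  # deterministic order regardless of directory listing order
    with open(manifest_path, "w") as out:
        out.write(os.path.abspath(root_dir) + "\n")
        for rel_path, num_samples in rows:
            out.write(f"{rel_path}\t{num_samples}\n")
    return rows


def make_silent_wav(path, num_samples, sample_rate=16000):
    """Create a mono 16-bit WAV file of silence (only used for the demo)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * num_samples)


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    make_silent_wav(os.path.join(demo_dir, "utt1.wav"), 16000)
    manifest = os.path.join(demo_dir, "train.tsv")
    write_manifest(demo_dir, manifest)
    with open(manifest) as f:
        print(f.read())
```

Every line after the first then splits into exactly two tab-separated items, so the assertion len(items) == 2 in load_audio would pass and int(items[1]) would parse.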
