Questions about the data used when training Hubert #5568

duduke37 · 2024-11-30T14:06:55Z

I started training by following examples/hubert/README.md, beginning with data preparation. The README says the format of {train,valid}.tsv is:

<root-dir>
<audio-path-1>
<audio-path-2>
...

In my understanding, that means each line after the root directory contains only an audio path.

But while processing the data and training, I found that each line of the .tsv file seems to contain a value in addition to the path, such as "sz" in the following code:
def load_audio(manifest_path, max_keep, min_keep):
    n_long, n_short = 0, 0
    names, inds, sizes = [], [], []
    with open(manifest_path) as f:
        root = f.readline().strip()
        for ind, line in enumerate(f):
            items = line.strip().split("\t")
            assert len(items) == 2, line
            sz = int(items[1])
            if min_keep is not None and sz < min_keep:
                n_short += 1
            elif max_keep is not None and sz > max_keep:
                n_long += 1
            else:
                names.append(items[0])
                inds.append(ind)
                sizes.append(sz)
    tot = ind + 1
    logger.info(
        (
            f"max_keep={max_keep}, min_keep={min_keep}, "
            f"loaded {len(names)}, skipped {n_short} short and {n_long} long, "
            f"longest-loaded={max(sizes)}, shortest-loaded={min(sizes)}"
        )
    )
    return root, names, inds, tot, sizes
So I would like to know the exact format of this data. The first item is the path; what is the second item, and how can it be obtained?
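
Judging only from the loader above, each line after the root directory must hold exactly two tab-separated fields, and the second is parsed as an integer and compared against min_keep and max_keep. That suggests it is the length of the audio file, most plausibly the number of samples (frames). The following is a minimal sketch under that assumption, not the project's own tooling. The function name write_manifest is hypothetical, and the standard-library wave module only reads PCM WAV files, so other formats such as FLAC would need a different reader.

```python
import os
import wave


def write_manifest(root_dir, tsv_path):
    """Write a manifest: root dir on line 1, then '<relative-path>\t<num-frames>' per file.

    Assumption: the second column expected by load_audio is the number of
    audio frames (samples per channel) in each file.
    """
    root_dir = os.path.abspath(root_dir)
    rows = []
    for dirpath, _, filenames in os.walk(root_dir):
        for name in filenames:
            # Only PCM WAV files are handled by the stdlib wave module.
            if not name.lower().endswith(".wav"):
                continue
            full_path = os.path.join(dirpath, name)
            with wave.open(full_path, "rb") as w:
                n_frames = w.getnframes()
            rows.append((os.path.relpath(full_path, root_dir), n_frames))
    rows.sort()  # deterministic order across runs
    with open(tsv_path, "w") as f:
        f.write(root_dir + "\n")
        for rel_path, n_frames in rows:
            f.write(f"{rel_path}\t{n_frames}\n")
    return rows
```

A file produced this way satisfies the assertion len(items) == 2 in load_audio, and int(items[1]) yields the frame count that is then filtered by min_keep and max_keep.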


duduke37 added needs triage question labels Nov 30, 2024
