pair classification inconsistencies #582
It's not an error, but a legacy of how the initial pair classification datasets were formatted. For example, TwitterSemEval2015:
>>> from datasets import load_dataset
>>> d = load_dataset('mteb/twittersemeval2015-pairclassification')
Downloading data: 100%|██████████████████████████████████████████████████████████████████████████| 313k/313k [00:00<00:00, 1.35MB/s]
Generating test split: 100%|███████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 7.39 examples/s]
>>> d
DatasetDict({
    test: Dataset({
        features: ['sent1', 'sent2', 'labels'],
        num_rows: 1
    })
})
There is a single row, and each of its fields contains a list of sentences (or labels). I agree this isn't a very good format, and the naming is inconsistent with other tasks, so it might make sense to change it.
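To make the layout concrete, here is a minimal sketch in plain Python of how such a single-row record could be unpacked into one triple per sentence pair. The sample sentences and the helper name `iter_pairs` are hypothetical and not part of mteb; only the column names `sent1`, `sent2`, and `labels` come from the dataset features shown above.

```python
# Hypothetical stand-in for d["test"][0]: one row whose fields are parallel lists.
row = {
    "sent1": ["A cat sits on the mat.", "It is raining."],
    "sent2": ["A cat is on the mat.", "The sun is out."],
    "labels": [1, 0],
}


def iter_pairs(row):
    """Yield (sentence_a, sentence_b, label) triples from the parallel lists."""
    sent1, sent2, labels = row["sent1"], row["sent2"], row["labels"]
    # The three lists must line up, otherwise pairs and labels would be misaligned.
    if not (len(sent1) == len(sent2) == len(labels)):
        raise ValueError("sent1, sent2 and labels must have the same length")
    yield from zip(sent1, sent2, labels)


pairs = list(iter_pairs(row))
print(len(pairs))  # number of sentence pairs, even though the split has one row
```

The point of the sketch is that the number of examples lives in the length of the lists, not in `num_rows`.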
I am working on the #581 dataset and found a couple of potential issues in 'AbsTaskPairClassification' with
data_split["sent1"], data_split["sent2"], data_split["labels"], **kwargs
)
It also expects 'sent1', 'sent2', and 'labels' instead of 'sentence1', 'sentence2', and 'label' (the standard followed in the STS and Bitext Mining tasks).
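One way to bridge the two naming schemes without touching the stored data would be a small key-mapping step. The sketch below is an assumption for illustration, not mteb code: `LEGACY_TO_STANDARD` and `standardize_keys` are hypothetical names, and the sample record is made up.

```python
# Hypothetical mapping from the legacy pair classification names
# to the names used by the STS-style tasks.
LEGACY_TO_STANDARD = {
    "sent1": "sentence1",
    "sent2": "sentence2",
    "labels": "label",
}


def standardize_keys(record):
    """Return a copy of record with legacy keys renamed; other keys are kept."""
    return {LEGACY_TO_STANDARD.get(key, key): value for key, value in record.items()}


# Made-up record in the legacy layout.
legacy = {"sent1": ["Hello there."], "sent2": ["Hi there."], "labels": [1]}
standard = standardize_keys(legacy)
print(sorted(standard))
```

Because unknown keys pass through unchanged, applying the function to an already standardized record leaves it as it is.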