There seems to be an issue with a few of the adversaries. For example, a claimed adversary from mr_bert.txt is:
orig sent (0): to portray modern women the way director davis has done is just unthinkable
adv sent (1): to portray modern women the way director davis has done is just imaginable
"unthinkable" and "imaginable" are antonyms that erroneously have high cosine similarity, which incorrectly suggests they are synonyms. I suggest such examples should not be counted when evaluating the attack success rate, since human evaluation would clearly label the adversarial sentence as positive (1) rather than negative.
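To make the failure mode concrete, here is a minimal sketch of the cosine-similarity check that typically gates word substitutions in these attacks. The vectors below are toy values invented for illustration (the attack presumably uses pretrained word embeddings and some similarity threshold, both assumed here), but they show how two antonyms can still clear a synonym threshold:

```python
import numpy as np

def cosine_similarity(u, v):
    # Cosine similarity: dot product normalized by the vector magnitudes.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical embedding vectors for illustration only -- not the
# actual embeddings used by the attack.
unthinkable = np.array([0.9, 0.1, 0.3])
imaginable = np.array([0.8, 0.2, 0.35])

sim = cosine_similarity(unthinkable, imaginable)
# Antonyms sharing most of their context can score above a typical
# synonym threshold (e.g. 0.7), so "imaginable" would be accepted
# as a replacement even though it flips the sentiment.
print(sim)
```

Because cosine similarity over distributional embeddings measures contextual relatedness rather than sameness of meaning, antonyms (which occur in near-identical contexts) routinely score high, which is exactly the problem in the example above.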
The reported ~13% after-attack accuracy counts such examples as successes, which they actually are not. I think a human-evaluation filter should ultimately govern the after-attack accuracy. Please correct me if I am wrong. Thanks.