Harassment-Corpus

Publishing a Quality Context-aware Annotated Corpus and Lexicon for Harassment Research.

Identifying profane or offensive words is a standard first step in investigating a cyberbullying incident. For this reason, we first built a lexicon of profane words and divided it into six contexts: 1) Sexual, 2) Appearance-related, 3) Intellectual, 4) Political, 5) Racial, and 6) Combined. We used the first five categories of the lexicon as seed terms for collecting tweets from Twitter. Requiring each tweet to contain at least one offensive word, we collected 10,000 tweets per contextual type, for a total of 50,000 tweets.

The presence of offensive words in a tweet does not guarantee that the tweet is harassing, because people may use such words in a friendly manner or in quotations. We therefore rely on human-judged annotations to distinguish harassing tweets from non-harassing tweets.

We acknowledge support from the National Science Foundation (NSF) award CNS 1513721: Context-Aware Harassment Detection on Social Media.

Wiki page of this project: http://wiki.knoesis.org/index.php/Context-Aware_Harassment_Detection_on_Social_Media

To obtain our annotated tweets in the five contexts, please contact the authors by email:
Mohammadreza Rezvan: [email protected]
Saeedeh Shekarpour: [email protected]
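The seed-term collection step described above can be sketched as a keyword filter. This is a minimal Python sketch, not the authors' pipeline. It makes the following assumptions:

* Tweets have already been fetched as plain strings, so the sketch does not call the Twitter API.
* The lexicon entries are hypothetical placeholders (`sx_term1`, `po_term1`, ...) rather than the real terms from the lexicon file.
* Each tweet is filed under a single category.

With the default quota of 10,000 per seed category, five full buckets give the 50,000 tweets mentioned above.

```python
import re

# Hypothetical placeholder terms (assumption): the real terms are in
# "Harassment Lexicon.csv", whose column layout is not described here.
LEXICON: dict[str, set[str]] = {
    "sexual": {"sx_term1", "sx_term2"},
    "appearance": {"ap_term1"},
    "intellectual": {"in_term1", "in_term2"},
    "political": {"po_term1"},
    "racial": {"ra_term1"},
    "combined": {"co_term1"},  # sixth category: not used as a seed
}

# Only the first five categories serve as seed terms for collection.
SEED_CATEGORIES = ["sexual", "appearance", "intellectual", "political", "racial"]

# Simple lowercase word tokenizer so only whole tokens count as matches.
TOKEN_RE = re.compile(r"[a-z0-9_']+")


def matching_categories(text: str) -> list[str]:
    """Return every seed category that has at least one term present in the text."""
    tokens = set(TOKEN_RE.findall(text.lower()))
    return [c for c in SEED_CATEGORIES if tokens & LEXICON[c]]


def collect_by_category(tweets, quota: int = 10_000) -> dict[str, list[str]]:
    """Bucket candidate tweets by seed category, keeping at most `quota` per bucket.

    A lexicon match only makes a tweet a candidate; whether it is harassing
    is decided afterwards by human annotators, not by this filter.
    """
    buckets: dict[str, list[str]] = {c: [] for c in SEED_CATEGORIES}
    for tweet in tweets:
        for cat in matching_categories(tweet):
            if len(buckets[cat]) < quota:
                buckets[cat].append(tweet)
                break  # assumption: a tweet is filed under one category only
        if all(len(b) >= quota for b in buckets.values()):
            break  # every seed category has reached its quota
    return buckets


if __name__ == "__main__":
    stream = [
        "you are such a sx_term1",
        "politics today: po_term1 everywhere",
        "a perfectly friendly tweet",
        "quoting someone: in_term2",
    ]
    for category, kept in collect_by_category(stream, quota=1).items():
        print(category, kept)
```

The sketch uses whole-token matching so that a term embedded inside a longer word does not count as a hit. The README does not say whether the original collection matched terms this way.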

Repository: Mrezvan94/Harassment-Corpus
Files: Harassment Lexicon.csv, README.md