# May want to check the file "Slides_Detection of Sensational Language Features in News Headlines" as well.
This research pioneers the application of machine learning and natural language processing (NLP) to identify sensational language in news headlines, using a self-created dataset, MIRUKU. It highlights existing discussions of sensational language and uses SHAP analysis to identify the features that matter most for detection.
Sensational Language: Expressions that quickly evoke emotional arousal and interest through vivid, exaggerated, dramatic wording or shocking content, appealing to curiosity, emotions, or biases.
News headlines act as relevance optimizers to capture readers’ attention and influence decisions to engage with content. Modern news strategies focus on increasing Click-Through Rates (CTR), with sensational headlines being a key tactic.
Language plays a crucial role in communication, promoting emotional engagement. Sensationalism triggers emotional and psychological responses, helping to retain attention.
Sensational language targets emotional and physiological responses, which aligns with the evolved human tendency to attend to threats and survival-related information.
Human evolution has shaped an inherent tendency to respond to sensational stimuli for survival and reproduction. This explains why sensational news effectively draws attention.
Most research on sensationalism focuses on corpus analysis rather than NLP methods.
NLP combined with machine learning offers an efficient way to identify sensational language and analyze its contributing elements.
News headlines often blend multiple categories, so identifying sensationalism requires a stylistic and analytical approach. The study aims to analyze linguistic features in sensational news.
Research questions:
- Can a single linguistic feature effectively and stably identify sensational language?
- Which algorithm performs best in detecting sensational language?
- What features are most effective in identifying sensational language?
The study explores multiple linguistic features, including:
- Stop word ratios
- TF-IDF (with and without stop words)
- Syntactic n-grams
- Subjectivity and sentiment analysis
- Readability scores
- Elongated words
- Punctuation marks (e.g., exclamation marks, ellipses)
- Named Entity Recognition (NER)
- Part-of-Speech (POS) tags
- Clustering algorithms
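
A few of the surface features listed above (stop word ratio, elongated words, punctuation marks) are simple enough to sketch with the standard library. This is a minimal illustration, not the study's implementation: the stop word list, the tokenizer, the "three or more repeated characters" rule for elongation, and the function name `surface_features` are all assumptions made here.

```python
import re

# Tiny illustrative stop word list (an assumption; the study's list is not given in these notes).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "in", "on",
              "and", "you", "this", "that", "it", "will"}

# Assumed rule: a word is "elongated" if one character repeats 3+ times in a row.
ELONGATED = re.compile(r"(\w)\1{2,}")


def surface_features(headline: str) -> dict:
    """Compute a few surface features for a single headline."""
    tokens = re.findall(r"[A-Za-z']+", headline.lower())
    n = len(tokens)
    stop_ratio = sum(t in STOP_WORDS for t in tokens) / n if n else 0.0
    return {
        "stop_word_ratio": stop_ratio,
        "elongated_words": sum(bool(ELONGATED.search(t)) for t in tokens),
        "exclamation_marks": headline.count("!"),
        # Three or more dots, or the single ellipsis character.
        "ellipses": len(re.findall(r"\.{3,}|…", headline)),
    }


features = surface_features("You will not believe this... it is sooooo shocking!!!")
print(features)
```

Each headline becomes one small feature dictionary. Such dictionaries could then be combined with the heavier features (TF-IDF, POS tags, NER) before training a classifier.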
Limitations:
- Reproducibility: The choice of stochastic methods, such as Cuckoo Search for feature extraction and Random Search for training, may affect reproducibility.
- Threshold Adjustment: Adjusted thresholds were not tested in follow-up experiments.
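
The reproducibility concern comes from randomness in the search procedures. One common mitigation is to fix the random seed so that repeated runs draw the same configurations. The sketch below shows this for a toy random search; it is not the study's procedure, and the search space, the scoring function `toy_score`, and the parameter names are hypothetical stand-ins.

```python
import random


def random_search(score, space, n_trials, seed):
    """Sample n_trials configurations from space; return the best (config, score)."""
    rng = random.Random(seed)  # local generator, so the seed fully determines the draws
    best = None
    for _ in range(n_trials):
        config = {name: rng.choice(values) for name, values in space.items()}
        s = score(config)
        if best is None or s > best[1]:
            best = (config, s)
    return best


# Hypothetical search space; the real hyperparameters are not listed in these notes.
space = {"C": [0.01, 0.1, 1, 10], "max_features": [500, 1000, 5000]}


def toy_score(config):
    """Toy stand-in for a validation score (higher is better)."""
    return -abs(config["C"] - 1) - abs(config["max_features"] - 1000) / 1000


run_a = random_search(toy_score, space, n_trials=5, seed=42)
run_b = random_search(toy_score, space, n_trials=5, seed=42)
print(run_a == run_b)
```

With the same seed the two runs select the same configuration. With different seeds or no seed, the selected configuration, and therefore the reported results, may differ between runs.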