Code and data for the paper "Beyond Agreement: Diagnosing the Rationale Alignment of Automated Essay Scoring Methods based on Linguistically-informed Counterfactuals".
This repo contains:
- the prompts used to instruct LLMs, in directory `prompts/`, including the prompts used to generate the counterfactual samples, the prompts used to score the essays, and the prompts used to generate feedback. Specifically:
  - `system`: system messages
  - `cf_gen`: user messages used to generate counterfactual samples
  - `scoring`: user messages used to score the essays (both the original and the counterfactual samples)
  - `feedback`: user messages used to generate feedback
- the test set essays and corresponding counterfactual essays for both the TOEFL11 and ELLIPSE datasets, in directory `data/`. Specifically, counterfactual samples are stored in sub-directory `data/${DATASET_NAME}/cfact`.
- few-shot examples for both the TOEFL11 and ELLIPSE datasets, in directory `data/`. Both files are called `medoids_dict.json`.
- Python and shell scripts to control the whole experimental process:
  - for details of counterfactual generation, please refer to sub-directory `cf_gen_exp/`;
  - for details of scoring, please refer to sub-directory `scoring_exp/`;
  - for details of feedback generation, please refer to sub-directory `feedback_exp/`.
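
As a rough illustration of the data layout described above, the following sketch builds the paths for one dataset and loads its few-shot file. It is not part of the repo's scripts. The helper names `dataset_paths` and `load_few_shot` are hypothetical, and it assumes that each `medoids_dict.json` sits directly inside its dataset's directory under `data/` and that the dataset directory names match the value you pass in; check the actual directory names before relying on it.

```python
import json
from pathlib import Path


def dataset_paths(repo_root, dataset_name):
    """Build the paths for one dataset (hypothetical helper, not from the repo)."""
    data_dir = Path(repo_root) / "data" / dataset_name
    return {
        # Counterfactual samples: data/${DATASET_NAME}/cfact, as stated in this README.
        "cfact_dir": data_dir / "cfact",
        # ASSUMPTION: one medoids_dict.json per dataset directory.
        "few_shot_file": data_dir / "medoids_dict.json",
    }


def load_few_shot(path):
    """Load a few-shot JSON file; its internal structure is not assumed here."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, `dataset_paths(".", "TOEFL11")["cfact_dir"]` would point at `data/TOEFL11/cfact`, provided the dataset directory is actually named `TOEFL11`.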