This is the source code for the paper "ComFact: A Benchmark for Linking Contextual Commonsense Knowledge".
Start by creating a Python 3.6 virtual environment and installing the dependencies in requirements.txt.
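For example, on Linux/macOS (assuming a python3.6 binary is on your PATH):
python3.6 -m venv venv
source venv/bin/activate
pip install -r requirements.txt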
Our ComFact dataset can be downloaded from this link; please place data/ under this root directory.
Pretrained GloVe embeddings can be downloaded from this link; please place glove/ under the data/ directory and unzip glove.6B.zip inside it.
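A shell sketch of the placement and unzip step, assuming glove.6B.zip itself was downloaded into the current directory (adjust the paths to match your download):
mkdir -p data/glove
mv glove.6B.zip data/glove/
unzip data/glove/glove.6B.zip -d data/glove/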
Data portions:
- Persona-Atomic data portion: persona/
- Mutual-Atomic data portion: mutual/
- Roc-Atomic data portion: roc/
- Movie-Atomic data portion: movie/
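After the downloads above, the layout under the root directory should look roughly like this (a sketch; the four glove.6B.*d.txt files appear once the archive is unzipped):
data/
  persona/
  mutual/
  roc/
  movie/
  glove/
    glove.6B.zip
    glove.6B.50d.txt / 100d / 200d / 300d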
Data preprocessing:
python data_preprocessing_main.py
Prepare directory:
mkdir pred
mkdir runs
Training:
bash train_baseline.sh
Parameters:
- language model ${lm}: "deberta-large" | "deberta-base" | "roberta-large" | "roberta-base" | "bert-large" | "bert-base" | "distilbert-base" | "lstm"
- data portion ${portion}: "persona" | "mutual" | "roc" | "movie" | "all" (training on the union of all four data portions)
- context window ${window}: "nlg" (half window without future context) | "nlu" (full context window)
- linking task ${task}: "fact_full" (direct setting) | "head" (head entity linking, sub-task in pipeline setting) | "fact_cut" (fact linking of relevant head entities, sub-task in pipeline setting)
- evaluation set ${eval_set}: "val" (validation set) | "test" (testing set)
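A hypothetical invocation (whether train_baseline.sh reads these as environment variables, positional arguments, or variables set inside the script depends on the script itself, so check its header first):
lm=roberta-large portion=persona window=nlu task=fact_full eval_set=val bash train_baseline.sh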
Evaluating the direct setting or individual sub-tasks of the pipeline setting:
bash run_baseline.sh
Parameters: refer to Training.
Fine-grained analysis of fact linking results (after evaluating with run_baseline.sh):
python evaluate_linking.py --model ${lm} --window ${window} --portion ${portion} --linking ${task}
Parameters: refer to Training; ${task} must be "fact_full" | "fact_cut".
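For example, to analyze a roberta-large run on the persona portion in the direct setting:
python evaluate_linking.py --model roberta-large --window nlg --portion persona --linking fact_full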
Evaluating full pipeline setting:
bash run_baseline_pipeline.sh
Parameters: refer to Training.
Evaluating head entity linkers in fact linking:
bash run_baseline_head_linker.sh
Parameters: refer to Training.
Cross evaluation:
bash cross_evaluation.sh
Parameters:
- source data portion providing training set ${source_portion}: "persona" | "mutual" | "roc" | "movie" | "all"
- target data portion providing validation or testing set ${target_portion}: "persona" | "mutual" | "roc" | "movie" | "all"
Other parameters: refer to Training.
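A hypothetical invocation, hedged as in Training (check cross_evaluation.sh for how it actually reads the parameters):
source_portion=roc target_portion=persona lm=roberta-large window=nlg task=fact_full eval_set=test bash cross_evaluation.sh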
Plot a heatmap of the cross evaluation results (lm: roberta-large, window: nlg, task: fact_full):
python plot_cross_evaluation.py
Set up the NLG evaluation toolkit:
pip install git+https://github.com/Maluuba/nlg-eval.git@master
nlg-eval --setup
Download the CEM data from this link and place data/ under the CEM/ directory.
- Original preprocessed CEM data: ED/dataset_preproc.p
- Our preprocessed CEM data with ComFact-refined knowledge (also included): ED/dataset_preproc_link.p
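To verify the placement (a quick check, assuming the downloaded data/ lands at CEM/data/ with the ED/ folder inside):
python -c "import os; print(os.path.exists('CEM/data/ED/dataset_preproc.p'), os.path.exists('CEM/data/ED/dataset_preproc_link.p'))"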
Prepare directory:
mkdir CEM/saved
mkdir CEM/vectors
Copy glove.6B.zip from data/glove/ to the CEM/vectors/ directory.
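As a shell command:
cp data/glove/glove.6B.zip CEM/vectors/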
Training fact linker for CEM knowledge refinement:
python preprocessing_rel_tail_link_x.py
bash train_baseline_rel_tail_link_x.sh
Extracting CEM data and preprocessing for knowledge refinement:
python cem_data_extract.py
python preprocess_cem_link.py
The extracted data will be placed at data/cem/rel_tail/nlg/test/${split}_data.json, where ${split} is "train" | "val" | "test".
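Each split can be inspected with a quick one-liner that prints the container type and size (the record schema is whatever the two scripts above emit):
python -c "import json; d = json.load(open('data/cem/rel_tail/nlg/test/val_data.json')); print(type(d), len(d))"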
Knowledge refinement with the fact linker, i.e., labeling the relevance of the knowledge in the extracted CEM data:
bash run_baseline_cem_link_x.sh
python label_cem.py
Write the refined knowledge back into the CEM data format:
python cem_data_back.py
Switch to the CEM folder:
cd CEM
Training CEM dialogue model:
python main.py --model cem --dataset ${dataset} --save_path ${save} --model_path ${save} --cuda
Parameters:
- data source ${dataset}: dataset_preproc.p (original CEM dataset) | dataset_preproc_link.p (CEM dataset with ComFact-refined knowledge)
- ${save}: your directory for saving the model and results.
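For example, training on the ComFact-refined data (the saved/cem_link directory name here is only an illustration for ${save}):
python main.py --model cem --dataset dataset_preproc_link.p --save_path saved/cem_link --model_path saved/cem_link --cuda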
Testing CEM dialogue model:
python main.py --test --model cem --dataset ${dataset} --save_path ${save} --model_path ${save} --cuda
NLG Evaluation:
Move the obtained results.txt from your result-saving directory (${save}) to the results/ directory, rename it to ${name}.txt, and then run:
python src/scripts/evaluate.py --results ${name}
Parameters:
- ${name}: name of the results file, e.g., CEM_link
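For example, with ${name} set to CEM_link:
mv ${save}/results.txt results/CEM_link.txt
python src/scripts/evaluate.py --results CEM_link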
We include dialogue generation results under the results/ directory: CEM_ori.txt (from the original CEM) and CEM_link.txt (from CEM trained with ComFact-refined knowledge).