lex-simple

Repository for the paper "Unsupervised Simplification of Legal Texts": https://arxiv.org/pdf/2209.00557

Dataset

We have gathered a new dataset for legal text simplification. To build it, we selected 1000 random legal sentences from the Caselaw Access Project of Harvard Law School. Then, in collaboration with the faculty and students of Bilkent Law School, we produced 3 different simplified reference files for these 1000 sentences. We hope that this dataset can serve as a benchmark for future legal text simplification studies.
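As an illustration of the dataset's structure, the sketch below pairs each original sentence with its three simplified references and computes a rough length-compression ratio. It assumes the files are named original_sentences.txt and reference_file_1.txt to reference_file_3.txt at the repository root, with one line-aligned sentence per line. load_dataset and compression_ratio are hypothetical helpers, not part of the repository, and whitespace tokenization is a deliberately crude stand-in for a real tokenizer.

```python
from pathlib import Path
from statistics import mean


def read_lines(path):
    """Read a file with one sentence per line, dropping trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def load_dataset(root="."):
    """Pair each original sentence with its three simplified references.

    Assumed layout (hypothetical for this sketch): original_sentences.txt and
    reference_file_1.txt .. reference_file_3.txt, line-aligned, under root.
    """
    root = Path(root)
    originals = read_lines(root / "original_sentences.txt")
    references = [read_lines(root / f"reference_file_{i}.txt") for i in (1, 2, 3)]
    for i, ref in enumerate(references, start=1):
        if len(ref) != len(originals):
            raise ValueError(
                f"reference_file_{i}.txt has {len(ref)} lines, expected {len(originals)}"
            )
    return [
        (orig, tuple(ref[j] for ref in references))
        for j, orig in enumerate(originals)
    ]


def compression_ratio(pairs):
    """Mean (reference tokens / original tokens) over all sentence-reference pairs.

    Tokens are whitespace-separated words; empty originals are skipped.
    """
    ratios = [
        len(ref.split()) / len(orig.split())
        for orig, refs in pairs
        if orig.split()
        for ref in refs
    ]
    return mean(ratios)
```

For example, len(load_dataset(".")) should equal 1000 if the files match the description above, and compression_ratio(load_dataset(".")) gives a quick sense of how the reference lengths compare with the originals.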

Code

To run the algorithm proposed in the paper, run the following commands (Python 3.6 or above is required):

conda create -n uslt python=3.10
conda activate uslt
git clone https://github.com/koc-lab/lex-simple.git
cd lex-simple
pip install -r requirements.txt
python -m spacy download en_core_web_sm
python -m spacy download en
cd scripts
python run_uslt.py
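Before running run_uslt.py, a quick sanity check of the environment can save a failed run. The sketch below is a hypothetical helper, not shipped with the repository: it checks the Python version requirement stated above and whether the en_core_web_sm package installed by python -m spacy download is importable. It does not check the older en shortcut link.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 6), spacy_models=("en_core_web_sm",)):
    """Report whether the interpreter and spaCy model packages look ready.

    spaCy models installed with `python -m spacy download <name>` are regular
    Python packages, so find_spec can detect them without loading spaCy.
    """
    report = {"python_ok": sys.version_info[:2] >= tuple(min_python)}
    for model in spacy_models:
        report[model] = importlib.util.find_spec(model) is not None
    return report


if __name__ == "__main__":
    for check, ok in check_environment().items():
        print(f"{check}: {'ok' if ok else 'MISSING'}")
```

Detecting the model with find_spec avoids importing spaCy itself, so the check stays fast and works even when spaCy is only partially installed.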

Running run_uslt.py generates a .txt file containing the lexical simplifications. To apply structural simplification on top of the lexical simplification, follow the steps in https://github.com/Lambda-3/DiscourseSimplification/tree/master. In particular, run:

cd .. #make sure you are in the main directory
git clone https://github.com/koc-lab/SentenceSplitting.git
cd DiscourseSimplification
mvn clean install -DskipTests

Next, create the directory edu/stanford/nlp/models/pos-tagger/english-left3words under DiscourseSimplification, and move the Stanford NLP tagger files from this Google Drive folder into it (https://drive.google.com/drive/folders/1GQerFiPgzFnS2lawIfAz8C_NsLbdQUJG?usp=share_link). Then, create an empty file named input.txt in the DiscourseSimplification directory and paste into it the lexically simplified document generated by run_uslt.py. Then, run:

mvn clean compile exec:java
cd ..
python decode_sentence_splitting.py

This generates the final simplified .txt file.
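The manual preparation of the DiscourseSimplification step (creating the tagger directory, adding the tagger files, and filling input.txt) can be scripted. This is a minimal sketch under a few assumptions: prepare_discourse_input is a hypothetical helper, the tagger files have already been downloaded from the Drive folder, input.txt is read from the DiscourseSimplification root, and the files are copied rather than moved. Run it before mvn clean compile exec:java.

```python
import shutil
from pathlib import Path

# Relative location given in the README for the Stanford POS tagger models.
TAGGER_SUBDIR = Path("edu/stanford/nlp/models/pos-tagger/english-left3words")


def prepare_discourse_input(ds_root, lexical_output, tagger_files=()):
    """Create the tagger directory, copy tagger models, and write input.txt.

    ds_root:        path to the DiscourseSimplification checkout
    lexical_output: .txt file produced by run_uslt.py (its name is not fixed here)
    tagger_files:   tagger model files downloaded from the Drive folder
    Assumption: input.txt belongs in the DiscourseSimplification root.
    """
    ds_root = Path(ds_root)
    tagger_dir = ds_root / TAGGER_SUBDIR
    tagger_dir.mkdir(parents=True, exist_ok=True)
    for f in tagger_files:
        # Copy instead of move so the downloaded originals stay in place.
        shutil.copy2(f, tagger_dir / Path(f).name)
    input_path = ds_root / "input.txt"
    # Same effect as creating an empty input.txt and pasting the document in.
    shutil.copyfile(lexical_output, input_path)
    return input_path
```

Copying the whole lexical output file, rather than pasting by hand, also avoids accidentally dropping or merging lines, which would misalign the sentences during evaluation.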

Evaluation

Evaluation requires EASSE; follow the installation guide at https://github.com/feralvam/easse.

Once the output text files are ready, run:

python eval.py
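EASSE computes corpus-level SARI from three aligned inputs: the original sentences, the system outputs, and a list of reference sets. The sketch below loads text files into that shape. It is an assumption about how the inputs could be prepared, not a description of what eval.py does; load_easse_inputs is a hypothetical helper, and it assumes one sentence per line with every file having the same number of lines.

```python
from pathlib import Path


def read_sentences(path):
    """One sentence per line; line endings stripped, blank lines kept for alignment."""
    return Path(path).read_text(encoding="utf-8").splitlines()


def load_easse_inputs(orig_path, sys_path, ref_paths):
    """Return (orig_sents, sys_sents, refs_sents) shaped for EASSE's corpus_sari.

    refs_sents holds one list per reference file, each aligned line-by-line
    with sys_sents.
    """
    ref_paths = list(ref_paths)  # allow any iterable, but iterate it twice
    orig_sents = read_sentences(orig_path)
    sys_sents = read_sentences(sys_path)
    refs_sents = [read_sentences(p) for p in ref_paths]

    checks = [("system output", sys_sents)]
    checks += [(str(p), r) for p, r in zip(ref_paths, refs_sents)]
    for name, sents in checks:
        if len(sents) != len(orig_sents):
            raise ValueError(f"{name}: {len(sents)} lines, expected {len(orig_sents)}")
    return orig_sents, sys_sents, refs_sents
```

With EASSE installed, the three returned lists can be passed to corpus_sari from easse.sari as corpus_sari(orig_sents=orig_sents, sys_sents=sys_sents, refs_sents=refs_sents), for example with original_sentences.txt and the three reference_file_*.txt files. Checking line counts up front matters because a single missing line would shift every following sentence out of alignment.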

Folders and files

data
files
raw_data
scripts
.gitignore
LICENSE
README.md
decode_sentence_splitting.py
eval.py
original_sentences.txt
reference_file_1.txt
reference_file_2.txt
reference_file_3.txt
requirements.txt
supreme_org_val.txt
