This project releases the automatic analysis tool and the resources described in the article:
Xiaofei Lu and Renfen Hu (2021). Sense-aware lexical sophistication indices and their relationship to second language writing quality. Behavior Research Methods.
1. Install Python packages
Python 3.5+
NLTK
bert-serving-server and bert-serving-client (imported as bert_serving)
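The dependencies can be installed with pip; `bert-serving-server` runs the embedding service and `bert-serving-client` provides the `bert_serving` client API (these are the PyPI package names for bert-as-service):

```shell
# NLTK for sentence tokenization, bert-as-service for contextual embeddings
pip install nltk bert-serving-server bert-serving-client

# fetch the NLTK tokenizer models used by sent_tokenize
python -m nltk.downloader punkt
```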
2. Download the pre-trained language model
In this study, we used the uncased BERT-Base
model to generate deep contextualized word embeddings. More options can be found at https://github.com/google-research/bert.
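For example, the uncased BERT-Base checkpoint can be fetched and unpacked as follows (the download link is the one listed in the google-research/bert README at the time of writing; check the repository for the current link):

```shell
# download and unpack the uncased BERT-Base (12-layer, 768-hidden) checkpoint
wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip
unzip uncased_L-12_H-768_A-12.zip -d bert_base
```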
Since BERT is a deep learning model, we suggest running the tool on a GPU-equipped machine.
3. Download the sense embeddings
The sense embeddings constructed in this study (about 107M) can be downloaded from Google Drive or BNU Cloud Storage. Please place the file in the dict folder before running the code.
Step 1. Start the BERT service.
bert-serving-start \
    -pooling_strategy NONE \
    -max_seq_len 128 \
    -pooling_layer -1 \
    -device_map 0 \
    -model_dir bert_base \
    -show_tokens_to_client \
    -priority_batch_size 32

# -device_map: the GPU device ID
# -model_dir: the directory of the pre-trained BERT model
# -priority_batch_size: set according to GPU memory; in this study an Nvidia 1080 Ti (11 GB) was used
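With -pooling_strategy NONE, the server returns one vector per token rather than a single pooled sentence vector, so the client receives an array of shape (batch_size, max_seq_len, 768) for BERT-Base. A minimal sketch of how such output is indexed, using a dummy array in place of a live server (the BertClient call is shown in a comment):

```python
import numpy as np

# Shape returned by bert-serving with -pooling_strategy NONE and
# -max_seq_len 128 for BERT-Base (hidden size 768).
# With a running server the array would come from:
#     from bert_serving.client import BertClient
#     vecs, tokens = BertClient().encode(sentences, show_tokens=True)
batch_size, max_seq_len, hidden = 2, 128, 768
vecs = np.zeros((batch_size, max_seq_len, hidden))

# Vector for the 3rd token of the 1st sentence (position 0 is [CLS]).
token_vec = vecs[0, 2]
print(token_vec.shape)  # (768,)
```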
Step 2. Tag the senses for polysemous words.
python tag_text_server.py
In this step, we first perform sentence tokenization for each essay (see the samples folder). We then label the sense of each polysemous word, sentence by sentence, using sense information from the online version of the Oxford dictionary. The sense tagging results are written to the output folder.
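The core of sense tagging can be sketched as a nearest-neighbor lookup: the contextual embedding of a word is compared by cosine similarity against the pre-built embeddings of its candidate senses, and the closest sense wins. The snippet below is a simplified illustration with random vectors; the actual tagging logic lives in tag_text_server.py, and the function and sense labels here are hypothetical:

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def tag_sense(context_vec, sense_embeddings):
    """Return the sense label whose embedding is most similar to the
    contextual word vector (hypothetical helper, not the project API)."""
    return max(sense_embeddings,
               key=lambda s: cosine(context_vec, sense_embeddings[s]))

rng = np.random.default_rng(0)
senses = {"bank.n.01": rng.normal(size=768),
          "bank.n.02": rng.normal(size=768)}
# A contextual vector lying close to the second sense embedding.
word_vec = senses["bank.n.02"] + 0.01 * rng.normal(size=768)
print(tag_sense(word_vec, senses))  # bank.n.02
```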
Step 3. Terminate the BERT service.
bert-serving-terminate -port 5555
Step 4. Compute the sense-aware lexical sophistication indices.
python sense_aware_indices.py
The results are written to indices.csv.
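To give a flavor of what a sense-aware sophistication index measures, here is a hypothetical example: the proportion of tagged tokens that carry a non-primary (i.e., not the dictionary's most frequent) sense. The input format and the index itself are illustrative only; the actual indices are defined in sense_aware_indices.py and the article:

```python
def nonprimary_sense_ratio(tagged_tokens):
    """tagged_tokens: list of (word, sense_rank) pairs, where rank 1 is the
    primary sense in the dictionary (illustrative format, not the project's).
    Returns the share of tokens tagged with a non-primary sense."""
    ranks = [rank for _, rank in tagged_tokens]
    return sum(1 for rank in ranks if rank > 1) / len(ranks)

sample = [("bank", 2), ("run", 1), ("interest", 3), ("money", 1)]
print(nonprimary_sense_ratio(sample))  # 0.5
```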