This code is built on top of the authors' implementation of SmBOP.
NOTE: This is an initial version of the code. We will release a cleaner version soon.
- Install requirements.txt: pip install -r requirements.txt
- Follow the original SmBOP installation instructions given below (also available here)
- Download the Spider dataset
- bash scripts/download_spider.sh
- Preprocess the Spider dataset
- python3 process_spider_data.py
- Generate 3 random D_new/D_test splits for the five dev schemas (see also the loop sketch below)
python3 generate_case_test_splits.py 0
python3 generate_case_test_splits.py 1
python3 generate_case_test_splits.py 2
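The same three splits can also be generated with a single loop, equivalent to the three commands above:

```bash
# Run generate_case_test_splits.py for split indices 0, 1, and 2
for i in 0 1 2; do
    python3 generate_case_test_splits.py "$i"
done
```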
- Training (SmBOP):
bash train_infer_scripts/train_smbop.sh
- Inference (SmBOP):
bash train_infer_scripts/infer_smbop.sh
- Training (StructCBR):
bash train_infer_scripts/train_structcbr.sh
- Inference (StructCBR):
bash train_infer_scripts/infer_structcbr.sh
- smbop/models/tx_cbr_improved_entire_frontier.py: Implements the StructCBR logic over SmBOP
- smbop/models/smbop.py: Original SmBOP implementation
- Install PyTorch 1.8.1 that fits your CUDA version
- Install the rest of the required packages:
pip install -r requirements.txt
- Run this command to install the NLTK punkt tokenizer and stopwords:
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"
- Download the Spider dataset with the following command:
bash scripts/download_spider.sh
Use the following command to train:
python exec.py
Loading the dataset for the first time might take a while (a few hours), since the model first loads values from the tables and calculates similarity features with the relevant question. The result is then cached for subsequent runs. Use the disable_db_content argument to reduce the pre-processing time, in exchange for not performing IR on some (incredibly large) tables.
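If exec.py exposes this option as a command-line flag (an assumption; check exec.py's argument parser for the exact name and form), the invocation might look like:

```bash
# Assumption: disable_db_content is a boolean command-line flag of exec.py;
# verify the exact spelling in exec.py's argument parser before running.
python exec.py --disable_db_content
```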
To create predictions run the following command:
python eval.py --archive_path {model_path} --output preds.sql
To run the evaluation with the official Spider script:
python smbop/eval_final/evaluation.py --gold dataset/dev_gold.sql --pred preds.sql --etype all --db dataset/database --table dataset/tables.json
You can download a pretrained model from here. It achieves the following results with the official script:
|                      | easy  | medium | hard  | extra | all   |
|----------------------|-------|--------|-------|-------|-------|
| count                | 248   | 446    | 174   | 166   | 1034  |
| execution accuracy   | 0.883 | 0.791  | 0.684 | 0.530 | 0.753 |
| exact match accuracy | 0.883 | 0.791  | 0.655 | 0.512 | 0.746 |
You can run SmBOP in a Google Colab notebook here.
You can also run the demo with Docker:
docker build -t smbop .
docker run -it --gpus=all smbop:latest
This will open an inference terminal similar to the Google Colab demo; for example, you can run:
>>>inference("Which films cost more than 50 dollars or less than 10?","cinema")
SELECT film.title FROM schedule JOIN film ON schedule.film_id = film.film_id WHERE schedule.price > 50 OR schedule.price<10