This code is built on top of the authors' implementation of SmBOP.
NOTE: This is an initial version of the code. We will release a cleaner version soon.
- Install requirements.txt: pip install -r requirements.txt
- Follow the original SmBOP installation instructions given below (also available here)
- Download the Spider dataset
- bash scripts/download_spider.sh
- Preprocess the Spider dataset
- python3 process_spider_data.py
- Generate 3 random D_new/D_test splits for the five dev schemas (see also the loop sketch below)
python3 generate_case_test_splits.py 0
python3 generate_case_test_splits.py 1
python3 generate_case_test_splits.py 2
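The same three splits can also be generated with a single loop, equivalent to the three commands above:

```bash
# Run generate_case_test_splits.py for split indices 0, 1, and 2
for i in 0 1 2; do
    python3 generate_case_test_splits.py "$i"
done
```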
- Training (SmBOP):
bash train_infer_scripts/train_smbop.sh
- Inference (SmBOP):
bash train_infer_scripts/infer_smbop.sh
- Training (StructCBR):
bash train_infer_scripts/train_structcbr.sh
- Inference (StructCBR):
bash train_infer_scripts/infer_structcbr.sh
- smbop/models/tx_cbr_improved_entire_frontier.py: Implements the StructCBR logic over SmBOP
- smbop/models/smbop.py: Original SmBOP implementation
- Install PyTorch 1.8.1 that fits your CUDA version
- Install the rest of the required packages:
pip install -r requirements.txt
- Run this command to install the NLTK punkt tokenizer and stopwords:
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"
- Download the Spider dataset with the following command:
bash scripts/download_spider.sh
Use the following command to train:
python exec.py
Loading the dataset for the first time might take a while (a few hours), since the model first loads values from the tables and calculates similarity features with the relevant question. The result is then cached for subsequent runs. Use the disable_db_content argument to reduce the pre-processing time, in exchange for not performing IR on some (incredibly large) tables.
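If exec.py exposes this option as a command-line flag (an assumption; check exec.py's argument parser for the exact name and form), the invocation might look like:

```bash
# Assumption: disable_db_content is a boolean command-line flag of exec.py;
# verify the exact spelling in exec.py's argument parser before running.
python exec.py --disable_db_content
```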
To create predictions run the following command:
python eval.py --archive_path {model_path} --output preds.sql
To run the evaluation with the official Spider script:
python smbop/eval_final/evaluation.py --gold dataset/dev_gold.sql --pred preds.sql --etype all --db dataset/database --table dataset/tables.json
You can download a pretrained model from here. It achieves the following results with the official script:
|                      | easy  | medium | hard  | extra | all   |
|----------------------|-------|--------|-------|-------|-------|
| count                | 248   | 446    | 174   | 166   | 1034  |
| execution accuracy   | 0.883 | 0.791  | 0.684 | 0.530 | 0.753 |
| exact match accuracy | 0.883 | 0.791  | 0.655 | 0.512 | 0.746 |
You can run SmBOP in a Google Colab notebook here.
You can also run the demo with Docker:
docker build -t smbop .
docker run -it --gpus=all smbop:latest
This will open an inference terminal similar to the Google Colab demo; for example, you can run:
>>>inference("Which films cost more than 50 dollars or less than 10?","cinema")
SELECT film.title FROM schedule JOIN film ON schedule.film_id = film.film_id WHERE schedule.price > 50 OR schedule.price<10