Official implementation of "Translating synthetic natural language to database queries with a polyglot deep learning framework", published in Nature Scientific Reports (link)
Polyglotter supports the mapping of natural language searches to database queries. Importantly, it does not require the creation of manually annotated data for training and can therefore be applied easily to multiple domains. The framework is polyglot in the sense that it supports multiple database engines accessed with a variety of query languages, including SQL and Cypher. Furthermore, Polyglotter supports multi-class queries. Good performance is achieved on both toy and real databases, as well as on a human-annotated WikiSQL query set. Thus, Polyglotter may help database maintainers make their resources more accessible.
In this section, we describe the necessary steps to run Polyglotter on a MySQL database.
In order to generate a dataset to train the translation model, the schema of the target database needs to be generated first. We provide an example script here. The only required change is to specify your database credentials at the end of the script, where the MySQLGetDBSchema object is created, such as:
MySQLSchema = MySQLGetDBSchema('MySQL Example', {"host":"localhost", "user": "root", "passwd": "test", "database": "testdb", "port": 3306})
Then, call the getDBSchema() method on the object, which handles all the logic needed to generate the DB schema and save it to a pickle file:
MySQLSchema.getDBSchema()
At this point, a MySQLdbSchema.obj file should have been generated for you, which should be moved to the Data/Schemas directory.
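For reference, the snippet below sketches this step end to end. The import path and the final move are illustrative assumptions; adapt them to where MySQLGetDBSchema is defined in your copy of the repository and to your actual credentials.

```python
# Sketch of the schema-extraction step; the import path below is an
# assumption -- adjust it to where MySQLGetDBSchema lives in the repository.
import shutil
from GetDBSchema import MySQLGetDBSchema  # hypothetical module path

# Connect with your own database credentials and extract the schema.
MySQLSchema = MySQLGetDBSchema('MySQL Example',
                               {"host": "localhost", "user": "root",
                                "passwd": "test", "database": "testdb",
                                "port": 3306})
MySQLSchema.getDBSchema()  # writes MySQLdbSchema.obj (a pickle) to the working directory

# Move the generated schema object to the location the later scripts expect.
shutil.move("MySQLdbSchema.obj", "Data/Schemas/MySQLdbSchema.obj")
```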
Next, a dataset of question–SQL query pairs has to be generated for your target database. Following the use case above, we provide an example script to generate the dataset, preprocess it and compute the FastText embeddings. The script is located here. The output of the script should be moved to the Data/TrainingData/MySQL directory.
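The dataset is generated synthetically from the schema, so no manual annotation is needed. As a rough illustration only (the schema layout, the question templates and the output file names below are assumptions; the example script is the authoritative implementation), pairs can be produced along these lines:

```python
# Toy illustration of synthetic question/SQL pair generation from a schema.
# The pickled schema is assumed here to map table names to column lists;
# the real script uses richer templates and preprocessing.
import pickle
import random

with open("Data/Schemas/MySQLdbSchema.obj", "rb") as f:
    schema = pickle.load(f)

pairs = []
for table, columns in schema.items():
    for column in columns:
        question = f"show the {column} of all {table}"
        query = f"SELECT {column} FROM {table}"
        pairs.append((question, query))

random.shuffle(pairs)

# OpenNMT expects parallel source/target text files, one example per line.
with open("src-train.txt", "w") as src, open("tgt-train.txt", "w") as tgt:
    for question, query in pairs:
        src.write(question + "\n")
        tgt.write(query + "\n")

# One possible way to compute FastText embeddings over the questions
# (requires the fasttext package; the real script may differ).
import fasttext
embeddings = fasttext.train_unsupervised("src-train.txt", model="skipgram")
embeddings.save_model("question_embeddings.bin")
```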
Finally, you can now train a model for the text-to-SQL translation using OpenNMT. We provide an example script at https://github.com/AdrianBZG/Polyglotter/blob/master/NLP/TrainModels.py, which you can adapt to your needs.
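If you would rather drive OpenNMT-py directly instead of adapting TrainModels.py, a minimal run looks roughly like the sketch below. The file names and hyperparameters are placeholders and are not guaranteed to match the values used in the paper.

```python
# Minimal OpenNMT-py training sketch (assumes OpenNMT-py >= 2.x is installed).
# Paths, vocabulary locations and step counts are placeholder assumptions.
import subprocess

config = """
save_data: run/vocab
src_vocab: run/vocab.src
tgt_vocab: run/vocab.tgt
data:
  corpus_1:
    path_src: src-train.txt
    path_tgt: tgt-train.txt
  valid:
    path_src: src-val.txt
    path_tgt: tgt-val.txt
save_model: Models/mysql_text2sql
train_steps: 10000
valid_steps: 1000
"""

with open("text2sql.yaml", "w") as f:
    f.write(config)

# Build the vocabularies, then train; both command-line tools ship with OpenNMT-py.
subprocess.run(["onmt_build_vocab", "-config", "text2sql.yaml", "-n_sample", "-1"], check=True)
subprocess.run(["onmt_train", "-config", "text2sql.yaml"], check=True)
```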
Once the training concludes, you will find the model object in the Models directory. This object can now be used for serving the model.
Once you have a trained model, you can use the Jupyter Notebook available here to quickly run some questions and obtain the corresponding SQL queries over your target database.
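If you want to script the same thing outside the notebook, one option is to call onmt_translate on a file of questions, as sketched below. The checkpoint name and the simple whitespace tokenisation are assumptions; mirror whatever preprocessing the dataset-generation script applied.

```python
# Translate a handful of questions with a trained OpenNMT-py checkpoint.
# The checkpoint file name is an assumption -- use the one written to Models/.
import subprocess

questions = [
    "show the name of all customers",
    "show the price of all products",
]

with open("questions.txt", "w") as f:
    f.write("\n".join(questions) + "\n")

subprocess.run([
    "onmt_translate",
    "-model", "Models/mysql_text2sql_step_10000.pt",
    "-src", "questions.txt",
    "-output", "predictions.sql",
], check=True)

with open("predictions.sql") as f:
    for question, query in zip(questions, f):
        print(question, "->", query.strip())
```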
We also provide a Dockerfile to spin up a web service, so that you can serve your model over an API and obtain its predictions.
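Once the container is running, a client can be as simple as the snippet below. The port, route and payload shape are hypothetical, so check the Dockerfile and the serving code for the actual interface.

```python
# Hypothetical client for the dockerised translation service; the endpoint,
# port and JSON fields are assumptions -- confirm them against the serving code.
import requests

response = requests.post(
    "http://localhost:5000/translate",
    json={"question": "show the name of all customers"},
    timeout=10,
)
response.raise_for_status()
print(response.json())
```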