We use pipenv to manage dependencies and virtualenvs:
```shell
$ pipenv install
$ pipenv shell
(mac-graph-sjOzWQ6Y) $
```
You can watch the model predict labels for the held-back data:
```shell
$ python -m macgraph.predict --name my_dataset --model-version 0ds9f0s
```
```
predicted_label: shabby
actual_label: derilict
src: How <space> clean <space> is <space> 3 ? <unk> <eos> <eos>
-------
predicted_label: small
actual_label: medium-sized
src: How <space> big <space> is <space> 4 ? <unk> <eos> <eos>
-------
predicted_label: medium-sized
actual_label: tiny
src: How <space> big <space> is <space> 7 ? <unk> <eos> <eos>
-------
predicted_label: True
actual_label: True
src: Does <space> 1 <space> have <space> rail <space> connections ? <unk>
-------
predicted_label: True
actual_label: False
src: Does <space> 0 <space> have <space> rail <space> connections ? <unk>
-------
predicted_label: victorian
actual_label: victorian
src: What <space> architectural <space> style <space> is <space> 1 ? <unk>
```
TODO: Get it predicting from your typed input
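In the meantime, the `src` format above is easy to produce by hand: words are interleaved with `<space>` tokens. A minimal sketch (purely illustrative, not macgraph's own tokenizer):

```python
# Turn a typed question into the interleaved-token src format shown above.
# This is an illustrative guess at the format, not macgraph's tokenizer.
def to_src(question):
    words = question.replace("?", " ?").split()
    return " <space> ".join(words[:-1]) + " " + words[-1]

print(to_src("How clean is 3?"))
# How <space> clean <space> is <space> 3 ?
```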
To train the model, you first need training data.
If you want to skip this step, you can download pre-built data from our public dataset. Note that this repo is a work in progress, so the data format is still in flux.
The underlying data (a Graph-Question-Answer YAML from CLEVR-graph) must be pre-processed for training and evaluation. The YAML is transformed into TensorFlow records, and split into train-evaluate-predict tranches.
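For intuition, the split-and-write step looks roughly like this. It's a minimal sketch with hypothetical function and feature names, not macgraph's actual build code:

```python
# Illustrative: shuffle (src, label) pairs and write them out as
# train / eval / predict TFRecord tranches.
import random
import tensorflow as tf

def write_tranches(records, out_dir, split=(0.8, 0.1, 0.1)):
    random.shuffle(records)
    n = len(records)
    cut1, cut2 = int(n * split[0]), int(n * (split[0] + split[1]))
    tranches = {
        "train": records[:cut1],
        "eval": records[cut1:cut2],
        "predict": records[cut2:],
    }
    for name, rows in tranches.items():
        with tf.io.TFRecordWriter(f"{out_dir}/{name}.tfrecords") as writer:
            for src, label in rows:
                example = tf.train.Example(features=tf.train.Features(feature={
                    "src": tf.train.Feature(bytes_list=tf.train.BytesList(value=[src.encode()])),
                    "label": tf.train.Feature(bytes_list=tf.train.BytesList(value=[label.encode()])),
                }))
                writer.write(example.SerializeToString())
```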
First, generate a gqa.yaml with the command:
```shell
clevr-graph$ python -m gqa.generate --count 50000 --int-names
```
Then copy the generated file into mac-graph's input directory:
```shell
cp data/gqa-some-id.yaml ../mac-graph/input_data/raw/my_dataset.yaml
```
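To sanity-check the copy, you can peek at the file with a few lines of Python. This assumes only that the file is valid YAML; its internal schema is defined by CLEVR-graph:

```python
# Quick sanity check that the copied file parses as YAML.
import yaml  # pip install pyyaml

with open("input_data/raw/my_dataset.yaml") as f:
    doc = yaml.safe_load(f)

print(type(doc))  # inspect the top-level structure CLEVR-graph produced
```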
Then build the data (that is, pre-process it into a vocab table and TFRecords):
```shell
mac-graph$ python -m macgraph.input.build --name my_dataset
```
Useful arguments:
- `--limit N` reads only N records from the YAML and outputs a total of N TFRecords (split across the three tranches)
- `--type-string-prefix StationProperty` filters to just the questions whose type string begins with "StationProperty"
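The prefix filter is equivalent to a simple startswith check over each question's type string. A conceptual sketch, where the field name is an assumption:

```python
# What --type-string-prefix does, conceptually.
def filter_by_prefix(questions, prefix="StationProperty"):
    return [q for q in questions if q["type_string"].startswith(prefix)]
```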
Let's train a model. (Note: this requires the training data from the previous section.)
As general advice, use at least 40,000 training records (e.g. built from 50,000 GQA triples):
```shell
python -m macgraph.train --name my_dataset
```
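Under the hood, training reads the TFRecords built earlier. An input pipeline over them looks roughly like this; the feature names are assumptions, not macgraph's actual schema:

```python
# Sketch of an input function over the TFRecords built above.
import tensorflow as tf

def input_fn(path, batch_size=32):
    spec = {
        "src": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.string),
    }
    ds = tf.data.TFRecordDataset(path)
    ds = ds.map(lambda rec: tf.io.parse_single_example(rec, spec))
    return ds.shuffle(1000).batch(batch_size).repeat()
```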
Running too slow? Fan spinning too much? Use FloydHub via its one-click button, or run manually with `floyd run`.
You can easily run the model's unit tests:
```shell
python -m macgraph.test
```
The model-construction functions also contain many assertions to help validate correctness.
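For example, an attention-style read might guard its tensor shapes like this. This is hypothetical code in the same spirit, not lifted from macgraph:

```python
import tensorflow as tf

def attend(query, memory, width):
    # Fail fast at graph-construction time if tensor shapes drift.
    assert query.shape[-1] == width, f"query width {query.shape[-1]} != {width}"
    memory.shape.assert_has_rank(3)  # expect [batch, slots, width]
    scores = tf.matmul(memory, tf.expand_dims(query, -1))  # [batch, slots, 1]
    weights = tf.nn.softmax(scores, axis=1)                # attention over slots
    return tf.reduce_sum(memory * weights, axis=1)         # [batch, width]
```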
To run the system locally as a single process:
```shell
pipenv install
pipenv shell
python -m experiment.k8 --master-works
```
To test:
```shell
pipenv shell
./script/test.sh
```
- Install Helm
- Initialise Helm on your cluster as per their docs
Give Helm permissions on GKE:
```shell
kubectl create clusterrolebinding kube-system-cluster-admin --clusterrole cluster-admin --user system:serviceaccount:kube-system:default
```
Install a queue:
```shell
helm install --name one --set rabbitmq.username=admin,rabbitmq.password=secretpassword,rabbitmq.erlangCookie=secretcookie stable/rabbitmq
```
Check the console output for the AMQP URL and password of your new queue, then update secret.yaml with that URL.
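Workers then connect to the queue using that URL, for example with the pika AMQP client. The hostname and credentials below are assumptions based on the helm release above, not values the repo guarantees:

```python
# Minimal AMQP connection sketch using pika (pip install pika).
# The URL is hypothetical: host "one-rabbitmq" follows from the helm
# release name "one" above, and credentials from the --set flags.
import pika

params = pika.URLParameters("amqp://admin:secretpassword@one-rabbitmq:5672/")
connection = pika.BlockingConnection(params)
channel = connection.channel()
channel.queue_declare(queue="tasks", durable=True)  # queue name is illustrative
```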
Deploy the config:
```shell
kubectl create -f kubernetes/mac-graph.yaml
```
Set up permissions:
```shell
kubectl create serviceaccount default --namespace default
kubectl create clusterrolebinding default-cluster-rule --clusterrole=cluster-admin --serviceaccount=default:default
```
To see the dashboard:
```shell
gcloud config config-helper --format=json | jq --raw-output '.credential.access_token'
kubectl proxy
```