BWBImportBot Pipeline Epic #97

mekarpeles · 2021-05-04T19:06:58Z

Some ideas we discussed:

Use sqlite instead of books.jsonl, appending each month's results in batches
Also replace import.log with sqlite
Possibly explore GNU parallel
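
If import.log moved into sqlite, each month's results could be queried instead of grepped. Below is a minimal sketch that assumes a hypothetical import_log table. The table, column, and function names are illustrative only and do not come from the bot.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per import attempt, replacing a line in import.log.
# `batch` identifies the monthly data dump, e.g. '2021-05'.
SCHEMA = """
CREATE TABLE IF NOT EXISTS import_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    batch TEXT NOT NULL,
    isbn TEXT NOT NULL,
    status TEXT NOT NULL,
    message TEXT,
    logged_at TEXT NOT NULL
)
"""


def open_log(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def log_result(conn, batch, isbn, status, message=None):
    """Append one import result; the `with` block commits the transaction."""
    with conn:
        conn.execute(
            "INSERT INTO import_log (batch, isbn, status, message, logged_at) "
            "VALUES (?, ?, ?, ?, ?)",
            (batch, isbn, status, message, datetime.now(timezone.utc).isoformat()),
        )


def summary(conn, batch):
    """Count results per status for one monthly batch."""
    rows = conn.execute(
        "SELECT status, COUNT(*) FROM import_log WHERE batch = ? "
        "GROUP BY status ORDER BY status",
        (batch,),
    )
    return dict(rows.fetchall())
```

Appending a new month is just another `log_result` call with a different `batch` value. Older months stay in the same file and can be summarized independently.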

BharatKalluri · 2021-05-05T22:04:24Z

Bot @ https://gist.github.com/BharatKalluri/1b3c7fd88a780a9cdd99063715a5baa1

The script has two modes.

./bwb-import-bot.py setup_db ./bwb.csv : Parses and cleans all the data, then inserts it into a database called bwb-import-state.db. Every entry starts with a status of TO_BE_IMPORTED and null in a column called comment. (In theory this should take no more than a few minutes, even for files over 1 GB, but it still needs testing.) This step runs every time OL receives new data, which will most likely be once a month.
./bwb-import-bot.py process : Reads a batch (currently 10000) of records whose status is TO_BE_IMPORTED from the DB and tries to import them into OL. If the request succeeds, that row's status changes to SUCCESS; otherwise it changes to ERROR and the error is stored in the comment column. This process keeps running in the background and stops once no rows in the DB have a status of TO_BE_IMPORTED.
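
The two modes could look roughly like the sketch below. It is not the gist's code. The table name import_item is hypothetical, and so is the "cleaning" step (strip whitespace, drop empty fields). The row is stored as JSON because the real BWB columns aren't described here. import_fn stands in for whatever call actually submits a record to OL.

```python
import csv
import json
import sqlite3
from typing import Callable, Iterable

BATCH_SIZE = 10000  # batch size mentioned above


def setup_db(lines: Iterable[str], db_path: str = "bwb-import-state.db") -> sqlite3.Connection:
    """setup_db mode: store every CSV row with status TO_BE_IMPORTED and a NULL comment."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS import_item (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               data TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'TO_BE_IMPORTED',
               comment TEXT
           )"""
    )
    # Placeholder cleaning: strip whitespace and drop empty (or overflow) fields.
    rows = (
        (json.dumps({k: v.strip() for k, v in row.items() if isinstance(v, str) and v.strip()}),)
        for row in csv.DictReader(lines)
    )
    with conn:  # one transaction for the whole file keeps the bulk insert fast
        conn.executemany("INSERT INTO import_item (data) VALUES (?)", rows)
    return conn


def process(
    conn: sqlite3.Connection,
    import_fn: Callable[[dict], None],
    batch_size: int = BATCH_SIZE,
) -> None:
    """process mode: import TO_BE_IMPORTED rows batch by batch until none remain.

    import_fn is a hypothetical stand-in for the OL import call; it should raise on failure.
    """
    while True:
        batch = conn.execute(
            "SELECT id, data FROM import_item WHERE status = 'TO_BE_IMPORTED' "
            "ORDER BY id LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not batch:
            return  # no TO_BE_IMPORTED rows left, so the loop stops
        for row_id, data in batch:
            try:
                import_fn(json.loads(data))
                status, comment = "SUCCESS", None
            except Exception as exc:  # record the failure instead of aborting the run
                status, comment = "ERROR", repr(exc)
            with conn:  # commit per row so progress survives a restart
                conn.execute(
                    "UPDATE import_item SET status = ?, comment = ? WHERE id = ?",
                    (status, comment, row_id),
                )
```

For a real run, something like `conn = setup_db(open("./bwb.csv", newline=""))` followed by `process(conn, submit_to_ol)` would work, where `submit_to_ol` is a hypothetical function. Because status lives in the DB, a crashed `process` run can simply be restarted. It picks up whatever is still TO_BE_IMPORTED.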

Some more thoughts: technically we could use all the cores on the system (using pandarallel) to parallelize the process step and make many import calls in parallel. @mekarpeles raised two very good reasons not to do this for now:

The import process is very intensive, and queuing many imports may affect the server's stability
OL rate-limits to one request per second, so we cannot make multiple calls in parallel

We might explore this optimization later on, but for now everything will be synchronous.
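
Keeping everything synchronous also makes the one-request-per-second limit easy to respect. The sketch below sleeps just long enough between calls. The RateLimiter class is hypothetical and not part of the gist. The clock and sleep functions are injectable only so the timing logic can be checked without waiting.

```python
import time


class RateLimiter:
    """Block so that successive calls start at most once per `interval` seconds."""

    def __init__(self, interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self):
        now = self.clock()
        if self._next_allowed is not None and now < self._next_allowed:
            # Too early: sleep until the previous call's interval has elapsed.
            self.sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
```

In the process loop, each import would be preceded by `limiter.wait()`. If an import call itself takes longer than a second, no extra sleeping happens.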
