Sample: COVID19 publication data parsing

Chris Mattmann edited this page Apr 17, 2023 · 7 revisions

Using the CORD-19 data provided by the Office of Science and Technology Policy and Kaggle, you can run a very cool demo of MEMEX GeoParser.

Installation

  1. docker pull nasajplmemex/geo-parser
  2. docker-compose up -d

Pre-requisites

  1. pip install jupyterlab && pip install notebook
  2. pip install pandas pysolr requests tqdm
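
Before opening the notebook, it can be handy to confirm that the packages above are importable from the Python environment Jupyter will use. The following is a minimal sketch, not part of GeoParser; the function name `missing_packages` is hypothetical, and it assumes each package's import name matches its pip name.

```python
import importlib.util

# Packages named in the pre-requisites above.
# Assumption: the import name of each package is the same as its pip name.
REQUIRED = ["jupyterlab", "notebook", "pandas", "pysolr", "requests", "tqdm"]


def missing_packages(names):
    """Return the names that cannot be found in the current Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All pre-requisites are importable.")
```

If anything is reported missing, re-run the matching pip install command in the same environment.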

Get Data

Assuming that you have checked out GeoParser to $GEOPARSER_HOME, then:

  1. cd $GEOPARSER_HOME/examples/covid19

  2. ./download-metadata.sh

  3. ./create-core.sh (make sure that you can see http://localhost:8983/solr/ first; if not, wait a few seconds for Solr to start up)

  4. ./add-fields.sh
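
The "wait a few seconds for Solr to start up" check in step 3 can also be scripted instead of done by eye in a browser. This is a rough sketch using only the Python standard library; the helper names `solr_is_up` and `wait_for` are hypothetical, and the retry count and delay are arbitrary assumptions rather than values taken from GeoParser.

```python
import time
import urllib.error
import urllib.request

SOLR_URL = "http://localhost:8983/solr/"  # URL mentioned in step 3 above


def solr_is_up(url, timeout=5):
    """Return True if an HTTP GET on url answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: not ready yet.
        return False


def wait_for(check, attempts=30, delay=2.0, sleep=time.sleep):
    """Call check() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Usage (needs the Solr container to be starting or running):
#   if wait_for(lambda: solr_is_up(SOLR_URL)):
#       print("Solr is reachable; safe to run ./create-core.sh")
```

Keeping the polling loop separate from the HTTP check makes it easy to reuse `wait_for` for other services started by docker-compose.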

Open up Jupyter

  1. cd $GEOPARSER_HOME/examples/covid19 && jupyter notebook
  2. Run Ingest COVID data.ipynb (will take ~30-40 minutes)
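
Since the ingest takes a while, you may want to confirm that documents actually reached Solr before moving on. The sketch below asks Solr's standard select handler for a match-all query with zero rows and reads `numFound` from the JSON response. It assumes the core is named covid19 (as the index path used later on this page suggests); the function names are hypothetical and nothing here comes from the notebook itself.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the core created by ./create-core.sh is reachable at this path.
CORE_URL = "http://localhost:8983/solr/covid19/"


def count_query_url(core_url):
    """Build a Solr select URL that matches all documents but returns no rows."""
    params = urllib.parse.urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return core_url.rstrip("/") + "/select?" + params


def parse_num_found(body):
    """Extract response.numFound from a Solr JSON select response."""
    return json.loads(body)["response"]["numFound"]


def count_documents(core_url, timeout=10):
    """Return how many documents the core currently holds (needs a running Solr)."""
    with urllib.request.urlopen(count_query_url(core_url), timeout=timeout) as response:
        return parse_num_found(response.read().decode("utf-8"))


# Usage (with Solr running):
#   print(count_documents(CORE_URL), "documents indexed")
```

A count of zero after the notebook finishes would suggest the ingest did not write to this core.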

Use GeoParser

  1. Click on the Configure Index tab.
  2. Set Domain Name to covid19_index.
  3. Set Index Path to http://localhost:8983/solr/covid19/
  4. Click add index to store the index of the domain in the database.
  5. Click on the Database Icon tab.
  6. Click on the GeoParse button, and then wait (takes ~10 minutes).
  7. Click on the View button.