A GitHub Action workflow for automating the collection of crime and fire logs posted by the University of Southern California's Department of Public Safety.
- Runner: github.com/biglocalnews/usc-crime-reports-scraper/actions/workflows/etl.yaml
- PDF collection: documentcloud.org/app?q=%2Bproject%3Ausc-department-of-public--210827%20
- Source: dps.usc.edu/alerts/log/
Fork the repository and clone the repository. Move into the code directory. Install the dependencies.
pipenv install --dev
Install the pre-commit hooks.
pipenv run pre-commit install
Create a .env
file and set the required DocumentCloud variables. You'll need to provide credentials with access to our project. Contact Ben Welsh if you need to be given access to the project.
DOCUMENTCLOUD_USER=
DOCUMENTCLOUD_PASSWORD=
DOCUMENTCLOUD_PROJECT_ID=usc-department-of-public--210827
Run the scrape command.
pipenv run python -m src.scrape