karlicoss/scrapyroo


Cause I got sick of clicking through Deliveroo restaurants. Full text search is a basic human right!

Slides + speaker notes from November Rust London meetup.

Setup and usage

Scraper

Install dependencies: pip3 install --user -r requirements.txt

Add something like:

./scrape --area 'london/canning-town' --postcode 'E164SA' --json /path/to/menus.jsonl

to your crontab to run once a day, e.g. in the morning. Menus presumably don't change often, so that's enough.
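Since the scraper runs unattended from cron, it can be handy to sanity-check its output before indexing. A minimal sketch, assuming the file passed via --json is in JSON Lines format (one JSON object per line, as the .jsonl extension suggests); `count_records` is a hypothetical helper, not part of this repo:

```python
import json
from pathlib import Path


def count_records(path):
    """Count JSON Lines records in `path`, failing loudly on a malformed line.

    Assumption: one JSON value per line; blank lines are tolerated.
    Nothing is assumed about the fields inside each record.
    """
    count = 0
    with Path(path).open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as e:
                # Report the offending line so a truncated scrape is easy to spot
                raise ValueError(f"{path}:{lineno}: invalid JSON ({e.msg})") from e
            count += 1
    return count
```

Calling `count_records("/path/to/menus.jsonl")` returns the number of records, or raises `ValueError` naming the first broken line.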

Indexer

Install tantivy-py. A custom branch is needed for extra functionality that's not merged yet:

cargo install  \
--git https://github.com/karlicoss/tantivy-py \
--branch delete-all-documents

Then build the index from the scraped menus:

./index --index /path/to/tantivy-index /path/to/menus.jsonl

Backend

Install tantivy-cli. A custom branch is needed at the moment to expose highlights via the API:

cargo install  \
--git https://github.com/karlicoss/tantivy-cli \
--branch serve-snippets

Then start the server. Tantivy runs on port 3000 by default:

./serve --index /path/to/tantivy-index
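Once the server is up, you can query it without the frontend. A minimal sketch, assuming the serve-snippets branch keeps the upstream tantivy-cli endpoint (`/api/` with `q` and `nhits` query parameters) on localhost:3000; `search_url` and `search` are hypothetical helper names, and the shape of the JSON response is not assumed here:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def search_url(query, nhits=10, host="localhost", port=3000):
    """Build the search URL (assumed endpoint: /api/?q=...&nhits=...)."""
    params = urlencode({"q": query, "nhits": nhits})
    return f"http://{host}:{port}/api/?{params}"


def search(query, nhits=10):
    """Send the query to a locally running server and return the parsed JSON."""
    with urlopen(search_url(query, nhits)) as resp:
        return json.load(resp)
```

For example, `search("chicken katsu", nhits=5)` requests `http://localhost:3000/api/?q=chicken+katsu&nhits=5`. Adjust `host` and `port` in `search_url` if you serve the index elsewhere.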

Frontend

NOTE: if you're running the page locally, you'll need to bypass CORS.

You can do that by e.g. running Chromium with web security disabled:

chromium-browser --disable-web-security --user-data-dir=/tmp/whatever frontend/index.html