A tool that searches the web for small business contact information. The repository contains two servers: the web-crawling service and the server for the web interface. Both run on CoffeeScript/Node.js. There is currently no indexer; the crawler sends results as they are found.
# First install all the necessary packages, along with Iced CoffeeScript.
$ cd web-crawlers/
$ sudo npm i -g iced-coffee-script
$ npm i
# Run in foreground
$ iced server.iced
# Run in the background, still attached to the terminal
$ iced server.iced &
# Run as an unattended daemon
$ nohup iced server.iced &
# First install all the necessary packages, along with Iced CoffeeScript.
$ cd web-app/
# If coffee-script isn't already installed...
$ npm i -g iced-coffee-script
$ npm i
# Run in foreground
$ iced srv/server.main.iced
# Run in the background, still attached to the terminal
$ iced srv/server.main.iced &
# Run as an unattended daemon
$ nohup iced srv/server.main.iced &
# Open browser to localhost:5000
The web app lets the user run customized searches with the crawler bots and provides an output console for viewing each bot's status.
You can also run a crawler directly from the command line and print the contact information to the terminal or to a .csv file.
# Here is an example with the http://allbiz.com crawler.
$ cd web-crawlers/
$ iced all-biz/allbiz-crawler.iced
It is possible to add a new website to the crawler with minimal code. Create a new file in the web-crawlers/ directory and require the web-crawling library.
crawler = require 'iced-crawler'

runner = # Controls the flow of the crawler.
  run: true # Switch off to stop crawling.

crawler.bfs
  root: 'http://twitter.com'
  path: '/'
  visit: ({ root, path }) -> console.log 'visiting', path
  running: runner # A structure to control the flow.
  linkP: (link) -> link? and link.includes 'tweet'
  done: -> console.log 'done'

# crawler.bfs { root, path, visit, running, done, linkP }
# root - the domain to crawl.
# path - the starting point in the domain to crawl.
# visit - a function to run on each visited link.
# running - a structure that controls whether the crawl continues.
# done - a callback to run after search is exhausted.
# linkP - link predicate. Run on each link and return true to follow.
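The parameter contract above can be illustrated with a small breadth-first sketch in plain Node.js. This is not the iced-crawler implementation: the in-memory link graph and the `bfs` function below are hypothetical stand-ins that only mirror the root/path/visit/running/done/linkP shape described above.

```javascript
// Hypothetical sketch of the bfs contract. The real crawler fetches
// pages over HTTP; a static link graph stands in for the web here so
// the control flow is easy to follow.
const graph = {
  '/': ['/tweet/1', '/about'],
  '/tweet/1': ['/tweet/2'],
  '/tweet/2': [],
  '/about': [],
};

function bfs({ root, path, visit, running, done, linkP }) {
  const queue = [path];
  const seen = new Set(queue);
  while (queue.length > 0 && running.run) { // `running` controls the flow
    const current = queue.shift();
    visit({ root, path: current });
    for (const link of graph[current] || []) {
      // Follow only links the predicate accepts, each at most once.
      if (linkP(link) && !seen.has(link)) {
        seen.add(link);
        queue.push(link);
      }
    }
  }
  done();
}

const visited = [];
const runner = { run: true };
bfs({
  root: 'http://twitter.com',
  path: '/',
  visit: ({ path }) => visited.push(path),
  running: runner,
  linkP: (link) => link != null && link.includes('tweet'),
  done: () => console.log('visited:', visited.join(' ')),
});
// prints "visited: / /tweet/1 /tweet/2" — /about is skipped by linkP
```

Flipping `runner.run` to `false` from elsewhere (for example, a stop button in the GUI) halts the loop before the queue is exhausted.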
Edit the following files to have the crawler show up as an option in the GUI.
- route.lead-search.iced
  In class RouteLeadSearch, add the crawler's name to state.sources. The name must match the one used in ebizfi-crawlerserver.
- route.lead-search.iced
  Add an entry to the structure in the function setCrawlers. Add a validation function if necessary; match the packet to the server program's input.
- view.lead-search-controls.iced
  Add the name under the appropriate controls, or create a new statement which includes this crawler.
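As a rough illustration of the setCrawlers step, an entry with a validation function might look like the sketch below. Every field name here is hypothetical; check the existing entries in route.lead-search.iced for the actual shape the server program expects.

```javascript
// Hypothetical shape of a crawler entry added in setCrawlers.
// The name must match the one used in ebizfi-crawlerserver.
const allbizEntry = {
  name: 'allbiz',
  // Validation: reject packets the server program cannot handle.
  validate: (packet) =>
    typeof packet.query === 'string' && packet.query.length > 0,
};

console.log(allbizEntry.validate({ query: 'plumbers in Austin' })); // prints true
console.log(allbizEntry.validate({ query: '' })); // prints false
```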