ravern/gollum

Gollum

Robots.txt parser with caching. Modelled after Kryten. Docs can be found here.

Usage

Call Gollum.crawlable?/3 to check whether a URL may be crawled by the specified user agent. In the examples below, the user agent is passed first and the URL second.

iex> Gollum.crawlable?("hello", "https://google.com/")
:crawlable
iex> Gollum.crawlable?("hello", "https://google.com/m/")
:uncrawlable

Gollum is an OTP app (for the cache), so remember to list it under the extra_applications key in your mix.exs to ensure it is started.
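
As a rough illustration of the step above, a mix.exs might look like the following sketch. The module name, the app name `:my_crawler`, the project version, and the dependency version requirement are all placeholders assumed for this example, not values taken from Gollum's documentation.

```elixir
# mix.exs (illustrative sketch; names and versions are placeholders)
defmodule MyCrawler.MixProject do
  use Mix.Project

  def project do
    [
      app: :my_crawler,   # hypothetical application name
      version: "0.1.0",   # hypothetical project version
      deps: deps()
    ]
  end

  def application do
    [
      # Listing :gollum here ensures its cache process is started
      # before your own application code runs.
      extra_applications: [:logger, :gollum]
    ]
  end

  defp deps do
    [
      # Placeholder requirement; pin this to the release you actually use.
      {:gollum, ">= 0.0.0"}
    ]
  end
end
```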

Gollum can be configured in your config.exs file. All of the options are optional; the following shows their default values.

config :gollum,
  name: Gollum.Cache, # Name of the cache GenServer
  refresh_secs: 86_400, # Seconds before the robots.txt is refetched
  lazy_refresh: false, # Whether to set up a timer that auto-refetches, or to refetch only when requested
  user_agent: "Gollum" # User agent sent with the GET request for the robots.txt

Author

Ravern Koh - <[email protected]>