HTTP request library, focused on web scraping.
$ npm install @tuplo/fletcher
# or with yarn
$ yarn add @tuplo/fletcher
Fetch an HTML page and parse it using cheerio.
import fetch from '@tuplo/fletcher';
const $page = await fetch.html('https://foo.com/page.html');
const heading = $page.find('body > h1');
Fetch a JSON file and parse it.
const { foo } = await fetch.json('https://foo.com/page.html');
Find a script on a page and evaluate it.
const { foo } = await fetch.script('https://foo.com/page.html', {
  scriptPath: 'script:nth-of-type(3)',
});
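
To make "find a script and evaluate it" more concrete, here is a minimal sketch of the evaluation step using Node's built-in `vm` module. The helper name `evaluateScript` and the sample script text are hypothetical, and this is not necessarily how fletcher implements it; it only illustrates how globals defined by a script can be collected from a sandbox object.

```typescript
import { createContext, runInContext } from 'node:vm';

// Hypothetical helper (not fletcher's implementation): runs script source
// in a separate context and returns the object holding its globals.
function evaluateScript(
  source: string,
  sandbox: Record<string, unknown> = {},
): Record<string, unknown> {
  // The sandbox object becomes the global object of the new context.
  const context = createContext({ ...sandbox });
  runInContext(source, context);
  return context;
}

// Script text as it might appear inside a <script> element (made up here).
// Top-level `var` declarations become properties of the context object;
// `let` and `const` declarations would not.
const scriptText = 'var foo = { bar: 1 }; var total = base + 2;';

const globals = evaluateScript(scriptText, { base: 40 });
const { foo, total } = globals;
console.log(foo, total);
```

Node's documentation states that `vm` is not a security mechanism, so evaluating scripts from pages you do not trust carries real risk.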
Find JSON-LD metadata on a page.
const [jsonld] = await fetch.jsonld('https://foo.com/page.html');
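
As a rough illustration of what collecting JSON-LD involves, the sketch below pulls every `application/ld+json` script block out of an HTML string and parses it. The helper `extractJsonLd` and the sample markup are hypothetical, and a regular expression stands in for a proper HTML parser; it is not fletcher's code.

```typescript
// Hypothetical helper: returns the parsed content of every JSON-LD block.
function extractJsonLd(html: string): unknown[] {
  const pattern =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const blocks: unknown[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    try {
      blocks.push(JSON.parse(match[1]));
    } catch (_error) {
      // Skip blocks whose content is not valid JSON.
    }
  }
  return blocks;
}

// Made-up page with one JSON-LD block and one unrelated script.
const html = `
<html><head>
<script type="application/ld+json">{"@type":"Product","name":"Widget"}</script>
<script>var unrelated = true;</script>
</head></html>`;

const found = extractJsonLd(html);
const [jsonld] = found;
console.log(jsonld);
```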
Work with the raw Response.
const res = await fetch.response('https://foo.com');
console.log(res.headers);
console.log(await res.text());
Work with Puppeteer for headless browser automation.
const client = fetch.create({
  browserWSEndpoint: 'ws://localhost:3000',
});
const $page = await client.browser.html('https://foo.com');
const { foo } = await client.browser.json('https://foo.com', /ajax-list/);
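
The second argument to `client.browser.json` looks like a pattern for choosing which of the page's requests to capture. Assuming it is tested against each request URL (an assumption, since this document does not spell it out), the selection step could be sketched as follows. The URL list and the `pickRequest` helper are hypothetical.

```typescript
// Hypothetical: URLs a page requested while loading.
const requestedUrls: string[] = [
  'https://foo.com/',
  'https://foo.com/static/app.js',
  'https://foo.com/api/ajax-list?page=1',
];

// Returns the first URL matching the pattern, or undefined if none matches.
function pickRequest(urls: string[], pattern: RegExp): string | undefined {
  return urls.find((url) => pattern.test(url));
}

const matched = pickRequest(requestedUrls, /ajax-list/);
console.log(matched);
```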

Option | Description | Default |
---|---|---|
browserWSEndpoint | Puppeteer web socket address | |
cache | Caches requests in memory | false |
delay | Introduce a delay before the request (ms) | 1_000 |
formData | Object with key/value pairs to send as form data | |
encoding | The encoding used by the source page, will be converted to UTF8 | |
headers | A simple multi-map of names to values | |
jsonData | Object with key/value pairs to send as json data | |
log | Should log all request URLs to stderr | false |
onAfterRequest | Callback to be called right after request is resolved | |
proxy | Proxy configuration (host, port, username, password) | |
retry | Retries failed responses | async-retry |
scriptFindFn | A function to find a script element on the page, execute and return it | |
scriptPath | A CSS selector to pick a script element on the page, execute and return it | |
scriptSandbox | An object to use as base on an execution of a piece of code found on the page | |
urlSearchParams | A key-value object listing what parameters to add to the query string of url | |
userAgent | Set a custom user agent | |
validateStatus | A function to decide if the response status is an error and should be thrown | |

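
Two of the options above describe behavior that is easy to show in isolation. The sketch below illustrates what `urlSearchParams` and `validateStatus` might amount to, using only the standard `URL` API. The helper `withSearchParams` is hypothetical, and the `validateStatus` convention (return `true` for an acceptable status) is an assumption rather than something this document confirms.

```typescript
// Hypothetical helper: adds key/value pairs to a URL's query string,
// roughly what the `urlSearchParams` description suggests.
function withSearchParams(
  url: string,
  params: Record<string, string | number | boolean>,
): string {
  const target = new URL(url);
  for (const [key, value] of Object.entries(params)) {
    target.searchParams.set(key, String(value));
  }
  return target.toString();
}

// Assumed convention: true means the status is acceptable,
// false means the response should be treated as an error.
const validateStatus = (status: number): boolean =>
  status >= 200 && status < 400;

const url = withSearchParams('https://foo.com/list', { page: 2, sort: 'asc' });
console.log(url, validateStatus(404));
```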
Generic utility to return an HTTP Response.
Requests an HTTP resource, parses it using Cheerio, and returns the parsed document.
import fletcher from '@tuplo/fletcher'; // default export, so the local name is arbitrary
const $page = await fletcher.html('https://foo.com/page.html');
const heading = $page.find('body > h1');
Requests an HTTP resource, finds a script on it, executes it, and returns its global context.
const { foo } = await fletcher.script('https://foo.com/page.html', {
  scriptPath: 'script:nth-of-type(3)',
});
Requests an HTTP resource, returning it as a string.
Requests an HTTP resource, returning the cookies sent with the response.
Requests an HTTP resource, returning it as a JSON object.
Requests an HTTP resource, retrieving all the JSON-LD blocks found in the document.
Requests an HTTP resource, returning the full HTTP Response object.
Requests an HTTP resource using Puppeteer/Chrome, parses it using Cheerio, and returns the parsed document.
Requests an HTTP resource using Puppeteer/Chrome, intercepts a request made by that page, and returns it as a JSON object.
Creates a new instance of fletcher with a custom config.
const instance = fletcher.create({ headers: { foo: 'bar' } });
await instance.json('http://foo.com');
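
One way to picture an instance with a custom config is as a set of defaults that each call can override. The sketch below shows that layering in isolation. The `Options` type and `createResolver` factory are hypothetical, and whether fletcher merges `headers` key by key like this is an assumption, not something stated here.

```typescript
// Hypothetical subset of options, for illustration only.
type Options = { headers?: Record<string, string>; delay?: number };

// Hypothetical factory: returns a function that computes the effective
// options for a call by layering per-call options over instance defaults.
function createResolver(defaults: Options) {
  return (overrides: Options = {}): Options => ({
    ...defaults,
    ...overrides,
    // Merge headers key by key so defaults survive unless overridden.
    headers: { ...defaults.headers, ...overrides.headers },
  });
}

const resolve = createResolver({ headers: { foo: 'bar' }, delay: 500 });
const effective = resolve({ headers: { accept: 'application/json' } });
console.log(effective);
```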