Snapshots a web page to get it as a static, self-contained HTML document.

WebMemex/freeze-dry

Freeze-dry: web page conservation

Freeze-dry captures a web page as it is currently shown in the browser. It takes the DOM, grabs its subresources such as images and stylesheets, and compiles them all into a single string of HTML.

The resulting HTML document is a static, self-contained snapshot of the page. It could be used, for example, for archival, offline viewing, or static republishing; it could be saved on a USB stick or attached to an email, and be opened on any device.

Technically, freeze-dry is a JavaScript function that is run on a web page. It is mainly intended for use by browser extensions and headless browsers. Much of its behaviour can be customised if desired.

How does it compare to…

Freeze-drying a web page is comparable to making a screenshot, or ‘printing’ to a PDF file. But the snapshot adapts to the viewer’s screen size, allows text to be selected, can be read by a screen reader, and so on; just as it would on the original web page.

It is thus more comparable to web browsers’ “Save As…” feature, except that it puts page resources inside the file (not in a folder next to it), and it captures the current view, after scripts have executed (and it removes the scripts).

Freeze-dry is most similar to what browser extensions like SingleFile or WebScrapbook do. It is used in (and spun off from) the WebMemex browser extension.

The main difference from all of the above is that freeze-dry is a JavaScript/TypeScript module and is highly customisable, so it can be used in other software for various snapshotting (or other) purposes.

For example, the researchers at Ink & Switch found freeze-dry their favorite solution to make web page clippings for their Capstone creativity tool:

“The solution we settled on for Capstone is freeze-dry. Its use was just a few lines of code.

Freeze Dry takes the page’s DOM as it looks in the moment, with all the context of the user’s browser including authentication cookies and modifications made to the page dynamically via Javascript. It disables anything that will make the page change (scripts, network access). It captures every external asset required to faithfully render that and inlines it into the HTML.

We felt that this is a philosophically-strong approach to the problem. Freeze-dry can save to a serialized .HTML file for viewing in any browser; for Capstone, we stored the clipped page as one giant string in the app’s datastore.”

How does it work?

As a first approximation, freezeDry can be thought of as a simple function that captures the DOM and returns it as a string, like this:

async function simpleFreezeDry() { return document.documentElement.outerHTML; }

However, freezeDry does a lot more: inline frame contents and subresources (as data: URLs), remove scripts and interactivity, expand relative links, timestamp the snapshot, etc.
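Two of these steps can be illustrated in isolation. The following is a minimal sketch, not freeze-dry's actual implementation: the helper names `makeAbsolute` and `textToDataUrl` are hypothetical, and the real library handles many more cases (binary resources, nested frames, CSS `url()` references, and so on).

```javascript
// Hypothetical helper: expand a possibly-relative link against the page's URL,
// so that it still points to the same place when the snapshot is opened elsewhere.
function makeAbsolute(href, baseUrl) {
  return new URL(href, baseUrl).href;
}

// Hypothetical helper: turn a text subresource (e.g. a stylesheet) into a data: URL,
// so that its content travels inside the HTML instead of being fetched separately.
function textToDataUrl(text, mimeType) {
  return `data:${mimeType};charset=utf-8,${encodeURIComponent(text)}`;
}

// Example values (illustrative only):
const absolute = makeAbsolute('../img/a.png', 'https://example.com/blog/post/');
const inlined = textToDataUrl('a{color:red}', 'text/css');
```

Here `absolute` is `https://example.com/blog/img/a.png`, and `inlined` is `data:text/css;charset=utf-8,a%7Bcolor%3Ared%7D`.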

For a detailed explanation, see How freeze-dry works.

Install

Old-fashioned JS

For a good old JavaScript global variable, download the latest .umd.js script and include it among your scripts, e.g.:

<script src="./freeze-dry.umd.js"></script>

The freeze-dry function is then freezeDry.freezeDry() (adjust example code accordingly).

ES module

For using it as a module in the browser, download the latest .es.js module and import it in your code, e.g.:

import freezeDry from './freeze-dry.es.js'

NPM package

For use via npm/yarn/… (to bundle it with webpack/rollup/vite/…), install the package, e.g.:

npm install freeze-dry

Then, in your code, either import or require it:

import freezeDry from 'freeze-dry'

const { freezeDry } = require('freeze-dry')

Usage

const html = await freezeDry(document, options)

In a few seconds, freezeDry should return your snapshot as a string (potentially a very long one).

The options parameter is optional. In fact, document is optional too (it defaults to window.document). For usage details, see its documentation.
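Because a snapshot can take a while and can grow large, it may be useful to measure both. The wrapper below is a small sketch under stated assumptions: `takeSnapshot`, `sizeInBytes` and `elapsedMs` are hypothetical names, and the `freezeDry` function is passed in as an argument so that the sketch does not depend on which install method you chose.

```javascript
// Hypothetical wrapper: take a snapshot and report its size and duration.
// `freezeDry` is whatever function you obtained via one of the install methods.
async function takeSnapshot(freezeDry, doc, options = {}) {
  const startedAt = Date.now();
  const html = await freezeDry(doc, options);
  return {
    html,
    // Size in bytes when encoded as UTF-8 (html.length would count UTF-16 code units).
    sizeInBytes: new TextEncoder().encode(html).length,
    elapsedMs: Date.now() - startedAt,
  };
}
```

In a page where freeze-dry is loaded, this could be called as `const { html, sizeInBytes } = await takeSnapshot(freezeDry, document)`.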

Customising freeze-dry’s behaviour

The options argument to the freezeDry() function lets you tweak its behaviour. For example, instead of inlining subresources as data: URLs, you could store the subresources separately, perhaps to create an MHTML file or to store each resource on IPFS. See the FreezeDryConfig documentation for all options.
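As a rough sketch of the "store subresources separately" idea: the code below assumes that FreezeDryConfig accepts a `newUrlForResource` callback that receives a resource object and returns the URL to put in the snapshot, and that this object exposes `url` and `blob` properties. Verify both assumptions against the FreezeDryConfig and Resource documentation before relying on them. `createResourceStore` and the `resources/` prefix are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: collect subresources in a Map instead of inlining them.
function createResourceStore(prefix = 'resources/') {
  const stored = new Map();
  let counter = 0;

  // Assumed callback shape: called once per subresource, returns its new URL.
  async function newUrlForResource(resource) {
    const name = `${prefix}${counter++}`;
    // `resource.url` and `resource.blob` are assumed property names.
    stored.set(name, { originalUrl: resource.url, blob: resource.blob });
    return name;
  }

  return { stored, newUrlForResource };
}

// Intended usage in a page (not executed here):
//   const { stored, newUrlForResource } = createResourceStore();
//   const html = await freezeDry(document, { newUrlForResource });
//   // `html` now refers to "resources/0", "resources/1", …; save `stored` alongside it.
```

The snapshot is then no longer a single self-contained file, so whatever consumes it needs to serve or package the stored resources under the same names.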

If freezeDry’s options don’t suffice for your needs, you can even build your own custom freezeDry-ish function by directly using freeze-dry’s internals. To get started, have a look at the API documentation, especially the Resource class, and peek at the implementation of FreezeDryer.