web-archiving

Here are 108 public repositories matching this topic...

programminghistorian / ph-submissions

The repository and website hosting the peer review process for new Programming Historian lessons

python api open-source mapping multi-lingual web-scraping digital-humanities data-management pedagogy web-archiving network-analysis linked-open-data programming-historian dh open-educational-resources r-studio digital-history distant-reading

Updated Jun 4, 2024
Jupyter Notebook

webrecorder / browsertrix

Browsertrix is the hosted, high-fidelity, browser-based crawling service from Webrecorder designed to make web archiving easier and more accessible for all!

kubernetes cloud archiving warc web-archiving webrecorder web-archive wacz

Updated Jun 4, 2024
TypeScript

webrecorder / browsertrix-crawler

Run a high-fidelity browser-based crawler in a single Docker container

crawler web-crawler crawling warc web-archiving webrecorder wacz

Updated Jun 4, 2024
TypeScript

ArchiveBox / ArchiveBox

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

Updated Jun 4, 2024
Python

webrecorder / archiveweb.page

A high-fidelity web archiving extension for Chrome and Chromium-based browsers

extension archiving chromium web-archiving webrecorder wacz

Updated Jun 4, 2024
JavaScript

ArchivingToolsForWBM / AdvancedInternetArchiving

Makes saving pages in bulk to the Wayback Machine much easier

web-archiving webarchiving

Updated Jun 3, 2024
HTML

gildas-lormeau / single-file-cli

CLI tool for saving a faithful copy of a complete web page in a single HTML file (based on SingleFile)

nodejs cli web-scraper web-scraping web-archiving single-file deno

Updated Jun 3, 2024
JavaScript

yuzhoumo / piazzabox

Piazza course archiver and viewer

python piazza web-archiving alpinejs

Updated Jun 2, 2024
Python

yuzhoumo / edbox

Ed course archiver and viewer

python jinja2 web-archiving edstem alpinejs

Updated Jun 2, 2024
Python

harvard-lil / perma

Indelible links

libraries web-archiving

Updated Jun 4, 2024
JavaScript

webrecorder / replayweb.page

Serverless replay of web archives directly in the browser

service-worker warc web-archiving wayback-machine web-archive replay-web-page web-replay wacz

Updated May 31, 2024
TypeScript

nla / pandas4

Web archive workflow system

web-archiving

Updated May 28, 2024
Java

nla / heritrix3

Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.

web-archiving

Updated May 29, 2024
Java

webrecorder / warcio

Streaming WARC/ARC library for fast web archive IO

python warc web-archiving web-archives pywb

Updated May 27, 2024
Python

oduwsdl / ipwb

InterPlanetary Wayback: A distributed and persistent archive replay system using IPFS

python docker service-worker ipfs memento warc web-archiving wayback memento-rfc

Updated May 24, 2024
Python

A suite of tools for mirroring and hoarding the web pages you visit for later offline viewing: your own personal Wayback Machine that can also archive HTTP POST requests and responses, as well as most other HTTP-level data. It follows the "archive everything now, figure out what to do with it later" philosophy.

backups internet self-hosted archive web-archiving wayback-machine internet-archiving