This repository has been archived by the owner on Apr 29, 2024. It is now read-only.

Proof of concept for a system that counteracts link rot by creating and making permanently citable web resources accessible


marcelriedel/civers-prototype

 
 


CiVers Prototype

A system designed to take snapshots of websites and generate DOIs, aimed at providing permanently citable web resources, which are otherwise prone to link-rot.

See here for a description of the system.

Prerequisites

  • Docker
  • docker-compose

On Mac and Windows, installing Docker Desktop is sufficient, since it includes both.

The setup has been tested on Ubuntu Linux, Mac (Intel) and Windows. The user interfaces have been tested with Chromium, Chrome and Firefox.

Getting started

$ docker-compose up

This starts multiple services, three of which have addresses one can visit in the browser:

Please open each of them in its own tab. The two primary use cases, which describe the intended behaviour of the system, are documented here.

Note that the generated artifacts (screenshots and HTML files of archived sites) are stored in the archive folder in the root directory of this project.

For an architectural overview consult the technical documentation.

Notes and Troubleshooting

Websockets

For the two sites at http://localhost:8020/ and http://localhost:8021, keep only one tab open per site. Only the most recently opened tab keeps a websocket connection, which is used for automatic updates when resources change. For the 8021 service, this restriction does not apply to subpages such as http://localhost:8021/<somePath>; there, any number of tabs can be open.
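The following Python sketch is a toy model of the behaviour described above. It is not taken from the project's code. The class and method names are hypothetical, and the assumption that the server keeps a single connection slot per site is only inferred from the observed behaviour.

```python
class SingleSlotRegistry:
    """Toy model: a site remembers only the most recently connected tab."""

    def __init__(self):
        self._current_tab = None  # hypothetical single connection slot

    def connect(self, tab_id):
        # A new connection replaces the previous one, so older tabs
        # no longer receive automatic updates.
        self._current_tab = tab_id

    def receiver_of_updates(self):
        # Only the tab stored in the slot is notified when resources change.
        return self._current_tab


registry = SingleSlotRegistry()
registry.connect("tab-1")
registry.connect("tab-2")  # opening a second tab displaces the first
```

In this model, "tab-1" stops receiving updates as soon as "tab-2" connects, which matches the advice to keep only one tab open per site.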

Red bar on the bottom of the screen

If a red message bar appears at the bottom of the screen in either Citator or DOI Registrar, reporting shadow-cljs - Stale Output! or shadow-cljs - Reconnecting ..., wait a few seconds and refresh the page. Do the same if Widget Host does not show the widget yet. Make sure everything works before you proceed.

Clean up

To start the test system from scratch again, one simply removes some files and folders.

Under Linux and Mac use the following script:

$ ./clean.sh

Under Windows, shut down docker-compose (if it is running) and delete all files under archive, except .keep. Then delete the directories citator-data and doi-registrar-data.

Note that you may need special permissions to delete the files created from within the Docker containers.
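For Windows users who prefer a script, the manual steps can be sketched in Python as follows. This is only an assumption about what a cleanup needs to do, based on the description above. It is not a port of clean.sh, and the function name `clean` is hypothetical.

```python
import shutil
from pathlib import Path


def clean(project_root: Path) -> None:
    """Remove generated state, following the manual steps described above."""
    archive = project_root / "archive"
    if archive.is_dir():
        for entry in archive.iterdir():
            if entry.name == ".keep":
                continue  # keep the placeholder file, as the instructions require
            if entry.is_dir():
                # Assumption: subfolders of archive are generated output too.
                shutil.rmtree(entry)
            else:
                entry.unlink()
    # Data directories named in the instructions; skipped silently if absent.
    for name in ("citator-data", "doi-registrar-data"):
        shutil.rmtree(project_root / name, ignore_errors=True)
```

Shut down docker-compose first, then call `clean(Path("."))` from the root directory of the project. Files created from within the Docker containers may still require elevated permissions to delete.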

Development notes

The Citator UI (port 8021) and the DOI Registrar UI (port 8020) provide hot code reload via shadow-cljs.

Hot code reload is also provided for the backend code. The reload happens on each HTTP request against one of the routes configured in defroutes.

Python development

The webscraping code is written in Python. The code can be developed outside the docker container, in the local environment.

Apart from python3 and pip3, you will need to install some packages:

$ pip3 install selenium==3.8.0
$ pip3 install beautifulsoup4
$ pip3 install requests

You also need a local installation of a Chrome webdriver. On Ubuntu, the following will do:

$ sudo apt-get install chromium-chromedriver

To scrape a website run this script from the root directory of the project:

civers-prototype$ python3 scraper/scrape.py '<some-url>' '<target-name>'

The target name will be used to name the generated artifacts in the archive folder.
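To archive several sites in one go, the command above can be wrapped in a small Python helper. This is a sketch, not part of the project. The function names and the example URL are hypothetical, and it assumes that the interpreter running the helper has the packages listed above installed.

```python
import subprocess
import sys

SCRAPER = "scraper/scrape.py"  # script path as used in the command above


def build_command(url, target_name):
    """Return the argument list for one scraper invocation."""
    # sys.executable is used in place of a hard-coded python3 (assumption:
    # the helper runs under the same Python 3 that has selenium installed).
    return [sys.executable, SCRAPER, url, target_name]


def scrape_all(targets):
    """Scrape every URL in targets, a mapping of target name to URL."""
    for target_name, url in targets.items():
        # check=True stops at the first failing scrape instead of continuing.
        subprocess.run(build_command(url, target_name), check=True)
```

Run from the root directory of the project, `scrape_all({"example": "https://example.org"})` would invoke the scraper once per entry and name the artifacts after each key.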

There is also a (mini-)test-suite. Run it with

civers-prototype$ python3 scraper/test.py

It prints nothing if everything works; otherwise it shows an AssertionError.

Working with the REPL

  • For a given service, uncomment one entrypoint in docker-compose.yml and comment out the other
  • docker-compose up
  • Connect to the given REPL port from your editor

Then run:

clj:user:> (start)
{:started ["#'resources/resources" "#'server/http-server"]}

