Replay arbitrary WARCs through subdomain/subpath CID inclusion #790

Open
ShadowJonathan opened this issue Jan 31, 2023 · 4 comments
Labels
i/o, ipwb indexer, ipwb replay, Landing→Admin Movement (Moving the ipwb replay functionality on the main landing page over to host/ipwbadmin), Privacy/Security/Encryption

Comments

@ShadowJonathan

While looking at this project, I found that dynamically linking to an "archive" of a website via URLs is not really possible once IPWB is set up on a subdomain or a website, because IPWB sets itself up to serve only a single archive.

However, I want to use IPWB in combination with some scraping/archiving tools, and simply be able to point IPWB to a CID (IPFS hash) of the WARC file, and have it figure it out.

I'd like to be able to do something like the following:

  • https://ipwb.example.net/QmXXX
  • https://baxxx.ipwb.example.net/

Then, it'd figure out the page it's fetching via the CID (timeouts are expected if the file isn't readily accessible), and serve that to the end user.
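
To make the two URL shapes above concrete, here is a minimal sketch of how a CID could be pulled out of either the first path segment or the leftmost subdomain label. This is not ipwb's actual routing: `BASE_HOST`, `looks_like_cid`, and `cid_from_url` are hypothetical names, and the prefix check is deliberately loose (it does not decode or validate the CID). It assumes that only lowercase CIDv1 strings are accepted as subdomains, since hostnames are case-insensitive and a base58 `Qm…` CIDv0 would not survive lowercasing.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical base host, taken from the example URLs in this issue.
BASE_HOST = "ipwb.example.net"


def looks_like_cid(value: str, subdomain: bool = False) -> bool:
    """Loose shape check only; a real implementation would decode the CID."""
    if not value.isalnum():
        return False
    if subdomain:
        # Hostnames are case-insensitive, so only lowercase CIDv1 ("b...") fits.
        return value.startswith("b") and value == value.lower()
    return value.startswith("Qm") or value.startswith("b")


def cid_from_url(url: str) -> Optional[str]:
    """Return the CID named by a subdomain or subpath URL, or None."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    if host.endswith("." + BASE_HOST):
        label = host[: -len(BASE_HOST) - 1]
        if "." not in label and looks_like_cid(label, subdomain=True):
            return label
        return None
    if host == BASE_HOST:
        first_segment = parts.path.lstrip("/").split("/", 1)[0]
        if looks_like_cid(first_segment):
            return first_segment
    return None
```

With the placeholder values from this issue, `cid_from_url("https://ipwb.example.net/QmXXX")` yields `"QmXXX"` and `cid_from_url("https://baxxx.ipwb.example.net/")` yields `"baxxx"`.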

My primary use case for this, as stated above, is simply to be able to link from a "hash list" to my ipwb site, either to show my particular archive of a particular site or to show any random archive anyone else found on the web. This would be an "IPWB archive reader" mode that automatically fetches, unpacks, and displays those files to anyone who requests them.

@ShadowJonathan changed the title from "Replay arbitrary IPFS hashes through subdomain/subpath inclusion" to "Replay arbitrary WARCs through subdomain/subpath CID inclusion" on Jan 31, 2023
@machawk1
Member

machawk1 commented Mar 6, 2023

@ShadowJonathan Interesting use case. A concern might be allowing just anyone to hit the https://ipwb.example.net/QmXXX endpoint to add data for your system. Some authentication procedure for your own instance would mitigate this.

The functionality is somewhat hidden behind the /ipwbadmin endpoint but we do support adding WARCs at runtime from ipwb's web interface. Being able to specify a CID in lieu of a local WARC file, especially if sent with the auth headers to ensure that the requestor is allowed to add data to the system like this, would satisfy your use case.

Any comments here, @ibnesayeed?

Some tasks:

  • Allow a CID of a WARC file to be specified in the webadmin interface to index WARC files on IPFS at runtime.
  • Set up an authentication process and an API to allow a WARC payload, WARC path, or CID to be used for adding new data at runtime.
  • Allow an option for the API to be used without authenticating, as described in @ShadowJonathan's use case below.
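
As a rough illustration of the tasks above, the following sketch models an "add data" request that carries exactly one of a WARC payload, a WARC path, or a CID, with authentication that can be switched off. Nothing here is an existing ipwb API: `AddRequest`, `source_kind`, and `is_authorized` are hypothetical names, and a single shared token is assumed only to keep the example small.

```python
import hmac
from dataclasses import dataclass
from typing import Optional


@dataclass
class AddRequest:
    """Hypothetical request shape: exactly one source should be set."""
    warc_payload: Optional[bytes] = None
    warc_path: Optional[str] = None
    cid: Optional[str] = None
    auth_token: Optional[str] = None


def source_kind(req: AddRequest) -> str:
    """Return which of the three sources was supplied."""
    provided = [
        name for name in ("warc_payload", "warc_path", "cid")
        if getattr(req, name) is not None
    ]
    if len(provided) != 1:
        raise ValueError("supply exactly one of warc_payload, warc_path, cid")
    return provided[0]


def is_authorized(req: AddRequest, expected_token: Optional[str]) -> bool:
    """expected_token=None models an instance configured to run without auth."""
    if expected_token is None:
        return True
    if req.auth_token is None:
        return False
    # Constant-time comparison to avoid leaking token contents via timing.
    return hmac.compare_digest(req.auth_token.encode(), expected_token.encode())
```

Passing `expected_token=None` corresponds to the unauthenticated option in the third task; any other value requires the caller to present a matching token.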

@machawk1 added the ipwb indexer, ipwb replay, Landing→Admin Movement (Moving the ipwb replay functionality on the main landing page over to host/ipwbadmin), Privacy/Security/Encryption, and i/o labels on Mar 6, 2023
@ShadowJonathan
Author

For my specific use case, I'd want to waive the requirement for authentication, as I'm explicitly planning for such an instance to also be an "open terminal".

If anything, I'd then only want a whitelist, or a coupling with the option on the IPFS node to not fetch new data (I vaguely remember that being an option somewhere). Authentication can then be enforced elsewhere, such as on the IPFS node itself, for adding data via the private API.
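
The whitelist idea could be as small as the sketch below. The file format (one CID per line, `#` for comments) and the names `load_allowlist` and `may_replay` are assumptions made for illustration, not anything ipwb provides today.

```python
from typing import Iterable, Optional, Set


def load_allowlist(lines: Iterable[str]) -> Set[str]:
    """Parse one CID per line, skipping blank lines and '#' comments."""
    cids = set()
    for line in lines:
        entry = line.strip()
        if entry and not entry.startswith("#"):
            cids.add(entry)
    return cids


def may_replay(cid: str, allowlist: Optional[Set[str]]) -> bool:
    """allowlist=None models a fully open instance with no whitelist."""
    return allowlist is None or cid in allowlist
```

An operator could read the list from a text file with `load_allowlist(open(path))` and call `may_replay` before asking the IPFS node for anything.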

@machawk1
Member

machawk1 commented Mar 6, 2023

@ShadowJonathan I understand, and running without auth should be an option as this gets implemented. I could foresee the scenario also being useful on an ipwb hosted on a LAN or an otherwise private instance (e.g., your laptop).

Can you clarify in your use case how you envision specifying a CID and not fetching new data? Would the assumption be that the data is already available in your local IPFS node?

@ShadowJonathan
Author

Yes, and that other requests would just fail with 404 or a timeout, depending on how the local IPFS node is being hailed by IPWB.
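
The failure behavior described here could look roughly like this on the replay side. The fetcher is injected so that no particular IPFS client API is implied; `NotAvailableLocally` and `replay_status` are hypothetical names, and returning 504 for a timeout is an assumption (the comment above only says the request would time out).

```python
from typing import Callable, Tuple


class NotAvailableLocally(Exception):
    """Hypothetical signal that the local IPFS node does not hold the CID."""


def replay_status(cid: str, fetch: Callable[[str], bytes]) -> Tuple[int, bytes]:
    """Map the outcome of fetching a WARC by CID to an HTTP status and body."""
    try:
        return 200, fetch(cid)
    except NotAvailableLocally:
        return 404, b"CID not available on the local IPFS node"
    except TimeoutError:
        # Assumed mapping: surface the node's timeout as 504 Gateway Timeout.
        return 504, b"Timed out while waiting for the IPFS node"
```

Whether a missing CID shows up as an immediate 404 or as a timeout would depend on how `fetch` talks to the node, for example whether it is allowed to look for the data on the network.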
