Feature Request: use cache control headers to determine if content has changed since last snapshot #1337

Open
3 of 9 tasks
Juliaria08 opened this issue Jan 28, 2024 · 0 comments


@Juliaria08

Type

  • General question or discussion
  • Propose a brand new feature
  • Request modification of existing behavior or design

What is the problem that your feature request solves

I'd like to request that ArchiveBox send an If-Modified-Since header when it has already fetched a website before and the website sent a Last-Modified header, or send an If-None-Match header containing the stored value of the ETag response header. That way, feeds like Rachel Kroll's can be fetched without having to wait a full day.
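As a minimal sketch of the bookkeeping this implies (assuming validators are stored per URL alongside each snapshot; `extract_validators` and `conditional_headers` are hypothetical helper names, not existing ArchiveBox functions), the validators from one response could be saved and later turned into request headers for the next fetch:

```python
from typing import Dict, Mapping


def extract_validators(response_headers: Mapping[str, str]) -> Dict[str, str]:
    """Pick out the cache validators worth storing alongside a snapshot."""
    # Header names are case-insensitive, so normalise them before lookup.
    lowered = {name.lower(): value for name, value in response_headers.items()}
    stored = {}
    if "etag" in lowered:
        stored["etag"] = lowered["etag"]
    if "last-modified" in lowered:
        stored["last_modified"] = lowered["last-modified"]
    return stored


def conditional_headers(stored: Mapping[str, str]) -> Dict[str, str]:
    """Turn previously stored validators into request headers."""
    headers = {}
    if "etag" in stored:
        # Echo the ETag back verbatim, including quotes and any W/ prefix.
        headers["If-None-Match"] = stored["etag"]
    if "last_modified" in stored:
        # Reuse the server's own date string instead of generating one locally.
        headers["If-Modified-Since"] = stored["last_modified"]
    return headers
```

Reusing the server's exact Last-Modified string avoids clock-skew problems between the archiving host and the remote host.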

This would also make repeated fetching of sites easier on both our host and the remote host.

Describe the ideal specific solution you'd want, and whether it fits into any broader scope of changes

Sorry, I think I should've read the entire thing first. When ArchiveBox has already fetched a website, it should send If-Modified-Since using the value the server sent in the Last-Modified header, or send If-None-Match using the value of the ETag header, if one was found.

I don't know whether sending both is allowed, but if both are available, I guess it would be acceptable to prefer If-Modified-Since.
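For reference, RFC 9110 does allow a client to send both headers, but a server must ignore If-Modified-Since whenever If-None-Match is present, so the ETag check effectively takes precedence. The sketch below models that server-side evaluation for a GET request; `should_return_304` is a hypothetical name, and it assumes the resource's current ETag and modification time are already known.

```python
import re
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional

# An entity-tag is an optional W/ prefix plus a double-quoted opaque string.
_ETAG = re.compile(r'(?:W/)?"[^"]*"')


def _opaque(tag: str) -> str:
    # Weak comparison ignores the W/ prefix and compares the quoted part.
    return tag[2:] if tag.startswith("W/") else tag


def should_return_304(request_headers: Mapping[str, str],
                      current_etag: Optional[str],
                      last_modified: Optional[datetime]) -> bool:
    """Evaluate If-None-Match / If-Modified-Since for a GET request.

    last_modified must be timezone-aware and truncated to whole seconds,
    because HTTP dates carry no sub-second precision.
    """
    headers = {name.lower(): value for name, value in request_headers.items()}
    if "if-none-match" in headers:
        # If-None-Match takes precedence: If-Modified-Since is ignored entirely.
        value = headers["if-none-match"].strip()
        if value == "*":
            return True  # "*" matches any current representation
        if current_etag is None:
            return False
        sent = {_opaque(tag) for tag in _ETAG.findall(value)}
        return _opaque(current_etag) in sent
    if "if-modified-since" in headers and last_modified is not None:
        try:
            since = parsedate_to_datetime(headers["if-modified-since"])
        except (TypeError, ValueError):
            return False  # an unparseable date means the header is ignored
        # Not modified if the resource has not changed after the sent date.
        return last_modified <= since
    return False
```

Sending both is therefore compatible with the spec: a request carrying a stale ETag but a matching date still gets a full response from a compliant server.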

What hacks or alternative solutions have you tried to solve the problem?

I've considered placing an HTTP proxy that stores those tags between ArchiveBox and the sites it fetches, but that doesn't look pretty.

How badly do you want this new feature?

  • It's an urgent deal-breaker, I can't live without it
  • It's important to add it in the near-mid term future
  • It would be nice to have eventually

I don't really mind too much, but I'd appreciate having it, since ArchiveBox can put strain on servers, and we might get blocked from archiving things if we archive too deeply.
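To show how a 304 Not Modified response could let ArchiveBox skip a redundant snapshot (and spare the remote server a full download), here is a minimal client-side sketch using only Python's standard library. `fetch_if_changed` is a hypothetical helper that assumes the conditional headers were built from validators stored with the previous snapshot. Note that `urllib` surfaces a 304 as an `HTTPError`, so the sketch catches it explicitly.

```python
import urllib.error
import urllib.request
from typing import Dict, Mapping, Optional, Tuple


def fetch_if_changed(url: str,
                     request_headers: Optional[Mapping[str, str]] = None,
                     timeout: float = 30.0) -> Tuple[bool, bytes, Dict[str, str]]:
    """Fetch url conditionally.

    Returns (changed, body, response_headers). changed is False when the
    server answers 304 Not Modified, meaning no new snapshot is needed.
    """
    request = urllib.request.Request(url, headers=dict(request_headers or {}))
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return True, response.read(), dict(response.headers)
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # A 304 carries no body; the previous snapshot is still current.
            headers = dict(err.headers)
            err.close()
            return False, b"", headers
        raise
```

When `changed` is True, a caller would store any new ETag or Last-Modified values from the returned headers for the next run.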


  • I'm willing to contribute dev time / money to fix this issue
  • I like ArchiveBox so far / would recommend it to a friend
  • I've had a lot of difficulty getting ArchiveBox set up

I'm a fairly "new" systems admin, and I haven't set ArchiveBox up in a public environment; it only runs on my laptop. I could easily deploy it, since I have already set up some other Django-based apps on a server, but I don't have much time to spare.

@pirate changed the title from "Feature Request: Conditional requests" to "Feature Request: use cavhing headers to determine if content has changed since last snapshot" on Jan 29, 2024
@pirate changed the title from "Feature Request: use cavhing headers to determine if content has changed since last snapshot" to "Feature Request: use cache control headers to determine if content has changed since last snapshot" on Jan 29, 2024