[Feature Request]: Add Cache Clearing Functionality #1371

Open
gislawill opened this issue Nov 25, 2024 · 2 comments
Labels: enhancement (New feature or request), triage (to be triaged for next action)

Comments

@gislawill
Collaborator

Provide a clear and concise description of what you want to happen.

Implement a mechanism to clear cached files based on their age. The solution should support both manual and automated triggers to handle scenarios like bug fix rollouts and routine cache maintenance.

Proposed Solutions

  1. Option 1: Script
    • A Python or Bash script loops through the cache and clears files older than a configurable age (default: 30 days).
    • Manual trigger: Run the script manually in the container.
    • Automated trigger: Use an ECS Scheduled Task for periodic execution.
  2. Option 2: Endpoint
    • A FastAPI endpoint clears files older than a configurable age (default: 30 days).
    • Manual trigger: Use an HTTP request (e.g., curl).
    • Automated trigger: Use a GitHub Action to call the endpoint daily.
    • Add basic token-based authentication for security.
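Both options share the same core step: walk the cache and delete files older than a configurable age. A minimal sketch of that step follows, assuming the cache is a plain directory tree on disk and that file modification time is an acceptable proxy for age. The function name `clear_old_files` and the `now` parameter are hypothetical and exist only for illustration; they are not taken from the project.

```python
import time
from pathlib import Path

DEFAULT_MAX_AGE_DAYS = 30  # default age suggested in the proposal
SECONDS_PER_DAY = 86400


def clear_old_files(cache_dir, max_age_days=DEFAULT_MAX_AGE_DAYS, now=None):
    """Delete regular files under cache_dir older than max_age_days.

    Age is measured from each file's modification time (an assumption).
    Returns the list of paths that were removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    removed = []
    # Materialize the listing first so deletions do not disturb iteration.
    for path in list(Path(cache_dir).rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

The script option could call this function from a small command-line entry point, and the endpoint option could call the same function from its request handler, so the choice of trigger would not change the clearing logic itself.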

Preferred Approach

The endpoint approach is preferred for its flexibility and ease of use. Automated triggers via GitHub Actions make it simple to maintain.

Is there anything else you can add about the proposal? You might want to link to related issues here, if you haven't already.

This was raised during the caching work implemented in #1332.

@laurentS
Collaborator

A couple of thoughts on this point:

  • if you use an endpoint, I'd make it a POST request for 2 reasons:
    • it has a side effect,
    • it would (mostly) prevent crawlers from accidentally clearing the cache when someone shares the url on slack or in an email.
  • there is another option, which is to clear the cache on each request (use the same logic, but run it whenever someone either adds to or retrieves from cache). I'm not sure of the overhead this would add, but it removes the need for an endpoint/script/monitoring...
  • no matter which option you go with, I'd also add a check for disk usage and cull oldest files even if younger than age limit, to prevent filling up the disk in case of high usage.
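The disk-usage safeguard in the last point could be sketched as a size budget: if the cache exceeds a limit, delete the oldest files first until it fits, regardless of the age limit. This is only an illustration under stated assumptions: the name `cull_to_size` and the `max_total_bytes` parameter are hypothetical, and the budget is applied to the cache's own total size rather than to free space on the volume (the standard library's `shutil.disk_usage` could be used for the latter).

```python
from pathlib import Path


def cull_to_size(cache_dir, max_total_bytes):
    """Delete the oldest files (by mtime) until the cache fits the budget.

    Returns the list of paths that were removed, oldest first.
    """
    entries = []
    for path in Path(cache_dir).rglob("*"):
        if path.is_file():
            stat = path.stat()
            entries.append((stat.st_mtime, stat.st_size, path))
    entries.sort(key=lambda entry: entry[0])  # oldest first

    total = sum(size for _, size, _ in entries)
    removed = []
    for _, size, path in entries:
        if total <= max_total_bytes:
            break
        path.unlink()
        total -= size
        removed.append(path)
    return removed
```

Running this after the age-based pass would keep the two rules independent: age decides what is stale, and the size budget only applies when the remaining files still take too much space.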

@gislawill
Collaborator Author

Thanks @laurentS for these thoughts. I'll respond to each point below:

  • POST request: I agree, that's a good approach
  • clear the cache on each request: I'd like this option if we were to spin up a background process for the cache clearing. I don't think it should happen in the same process as the response handling. We could set up a task queue (like Celery) to handle these types of tasks. In my opinion, a task queue seems a little excessive for this specific need, but it would be a great solution once we have a couple more side effects to manage.
  • add a check for disk usage: I agree, that's a good idea


3 participants