Provide a clear and concise description of what you want to happen.
Implement a mechanism to clear cached files based on their age. The solution should support both manual and automated triggers to handle scenarios like bug fix rollouts and routine cache maintenance.
Proposed Solutions
Option 1: Script
• A Python or Bash script loops through the cache and clears files older than a configurable age (default: 30 days).
• Manual trigger: Run the script manually in the container.
• Automated trigger: Use an ECS Scheduled Task for periodic execution.
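As a rough illustration of the script variant (the `CACHE_DIR` environment variable, the flat on-disk layout, and the module name `cache_cleanup.py` are all assumptions for this sketch, not the actual cache implementation):

```python
import os
import time
from pathlib import Path

# Assumed cache location; replace with the real cache directory.
CACHE_DIR = Path(os.environ.get("CACHE_DIR", "/tmp/cache"))
DEFAULT_MAX_AGE_DAYS = 30


def clear_old_files(max_age_days: int = DEFAULT_MAX_AGE_DAYS) -> int:
    """Delete cached files whose modification time is older than max_age_days."""
    cutoff = time.time() - max_age_days * 24 * 60 * 60
    removed = 0
    for path in CACHE_DIR.glob("**/*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed += 1
    return removed


if __name__ == "__main__":
    print(f"Removed {clear_old_files()} stale cache files")
```

The ECS Scheduled Task would then just override the container command to run this module on a cron schedule.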
Option 2: Endpoint
• A FastAPI endpoint clears files older than a configurable age (default: 30 days).
• Manual trigger: Use an HTTP request (e.g., curl).
• Automated trigger: Use a GitHub Action to call the endpoint daily.
• Add basic token-based authentication for security.
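A minimal sketch of what the endpoint could look like, assuming the `clear_old_files` helper from the script sketch above lives in a (hypothetical) `cache_cleanup` module and that the shared-secret token comes from an environment variable; the route path, query parameter, and bearer-token scheme are placeholders, not a final design:

```python
import os

from fastapi import FastAPI, Header, HTTPException

from cache_cleanup import clear_old_files  # hypothetical module holding the helper sketched above

app = FastAPI()

# Shared-secret token; assumed to be provided via the environment.
API_TOKEN = os.environ.get("CACHE_CLEAR_TOKEN", "")


@app.post("/cache/clear")
def clear_cache(max_age_days: int = 30, authorization: str = Header(default="")):
    """Clear cached files older than max_age_days; protected by a bearer token."""
    if not API_TOKEN or authorization != f"Bearer {API_TOKEN}":
        raise HTTPException(status_code=401, detail="Invalid token")
    removed = clear_old_files(max_age_days)
    return {"removed": removed, "max_age_days": max_age_days}
```

The manual trigger is then a curl call with the Authorization header set, and the scheduled GitHub Action makes the same request on a daily cron.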
Preferred Approach
The endpoint approach is preferred for its flexibility and ease of use. Automated triggers via GitHub Actions make it simple to maintain.
Is there anything else you can add about the proposal? You might want to link to related issues here, if you haven't already.
This was raised during the caching work implemented in #1332.
if you use an endpoint, I'd make it a POST request for 2 reasons:
1. it has a side effect,
2. it would (mostly) prevent crawlers from accidentally clearing the cache when someone shares the URL on Slack or in an email.
there is another option, which is to clear the cache on each request (use the same logic, but run it whenever someone either adds to or retrieves from cache). I'm not sure of the overhead this would add, but it removes the need for an endpoint/script/monitoring...
no matter which option you go with, I'd also add a check for disk usage and cull the oldest files even if they're younger than the age limit, to prevent filling up the disk in case of high usage.
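A rough sketch of what that disk-usage check could look like (the 90% threshold and deleting oldest-first until usage drops below it are assumptions for illustration):

```python
import shutil
from pathlib import Path


def enforce_disk_limit(cache_dir: Path, max_used_fraction: float = 0.9) -> None:
    """If the disk holding the cache is too full, delete cached files oldest-first."""
    usage = shutil.disk_usage(cache_dir)
    if usage.used / usage.total < max_used_fraction:
        return
    # Oldest modification time first, regardless of the age limit.
    files = sorted(
        (p for p in cache_dir.glob("**/*") if p.is_file()),
        key=lambda p: p.stat().st_mtime,
    )
    for path in files:
        path.unlink()
        usage = shutil.disk_usage(cache_dir)
        if usage.used / usage.total < max_used_fraction:
            break
```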
Thanks @laurentS for these thoughts. I'll respond to each point below
POST request: I agree, that's a good approach
clear the cache on each request: I'd like this option if we were to spin up a background process for the cache clearing. I don't think it should happen in the same process as the response handling. We could set up a task queue (like celery) to handle these types of tasks. In my opinion, a task queue seems a little excessive for this specific need, but it would be a great solution once we have a couple more side effects to manage (a rough sketch is included below).
add a check for disk usage: I agree, that's a good idea
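For reference, a task-queue version of the cleanup could be as small as the sketch below, even if it's probably overkill for this one job (Celery with a Redis broker, the daily schedule, and the `cache_cleanup` module holding `clear_old_files` are all assumptions here):

```python
from celery import Celery

from cache_cleanup import clear_old_files  # hypothetical module holding the helper sketched above

# Broker URL is an assumption; any Celery-supported broker would work.
app = Celery("cache_tasks", broker="redis://localhost:6379/0")

# Run the cleanup once a day via celery beat.
app.conf.beat_schedule = {
    "clear-old-cache-files": {
        "task": "cache_tasks.clear_stale_cache",
        "schedule": 60 * 60 * 24,  # seconds
    },
}


@app.task(name="cache_tasks.clear_stale_cache")
def clear_stale_cache(max_age_days: int = 30) -> int:
    """Delete cached files older than max_age_days in a worker process."""
    return clear_old_files(max_age_days)
```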