Support for horizontally scaled installs #1849
-
I think enabling scaling for Kubernetes shouldn't be done using the current setup, where one pod runs all parts of the application. The 'main' Django app mostly serves the UI and doesn't generate much load. Documents are handled by tasks executed through Celery, and we should be able to scale the Celery workers to process more tasks. Ideally, Celery would run as the only process in its own pod, which can then be scaled as needed. Since Celery schedules the queued tasks itself, this should make scaling easier, as the app consuming documents only runs once.
-
I had some time to play around with a distributed setup for paperless-ngx and got it working quite well. I've published my Kubernetes configuration here: https://github.com/peschmae/paperless-distributed
Sometimes, when a new document was added through the consumer, the Celery workers reported errors that they couldn't find the file, but adding the same file a second time often resolved this. I'm not entirely sure why that happened, as I mount a shared PVC into the
I'm using the official container images for this setup. Since all the processes seem to communicate through Redis, this has worked fine as is.
-
Except you have, because that config file contains the commands run by each pod. For reference, I've done a similar proof of concept without supervisord, which seems to work too.
-
This discussion has been automatically closed due to lack of community support. Please see our contributing guidelines for more details.
-
This discussion has been automatically locked since there has not been any recent activity after it was closed. Please open a new discussion for related concerns. See our contributing guidelines for more details.
-
Context
As described in this bug report, there are issues when horizontally scaling paperless-ngx, and the solution seems very much like an ugly fix. The solution for mail tasks is even uglier and involves putting fake DNS entries into all but one install.
As this is not really a bug, I am putting forward the things I feel are required in this feature request.
Changes requested
This should prevent parallel consumers from picking up the same files independently.
A full path/filename check, potentially including size or timestamp, should suffice to catch this.
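As a rough illustration of such a check (a sketch only, not paperless-ngx code): a file can be identified by its absolute path, size and modification time, and a consumer only proceeds if it is the first to claim that identity. The names `fingerprint` and `ClaimRegistry` are hypothetical, and the in-memory set merely stands in for whatever store all consumers would share.

```python
import os
from typing import Set, Tuple

# (absolute path, size in bytes, modification time in nanoseconds)
Fingerprint = Tuple[str, int, int]


def fingerprint(path: str) -> Fingerprint:
    """Identify a file by its absolute path, size and mtime."""
    st = os.stat(path)
    return (os.path.abspath(path), st.st_size, st.st_mtime_ns)


class ClaimRegistry:
    """Hypothetical stand-in for a store shared by all consumers."""

    def __init__(self) -> None:
        self._claimed: Set[Fingerprint] = set()

    def try_claim(self, path: str) -> bool:
        """Return True only for the first claim on this exact file state."""
        fp = fingerprint(path)
        if fp in self._claimed:
            return False
        self._claimed.add(fp)
        return True
```

A consumer would call `try_claim(path)` before queuing a file and skip it when the result is `False`. With a genuinely shared store, the check and the insert would need to happen as one atomic operation, otherwise two consumers could still both pass the check.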
Again, this should ensure that at most one task is running at any time, preventing the same mail document from being picked up multiple times.
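One common way to get "at most one at a time" across several installs is a named lock with an expiry, so that a crashed holder cannot block everyone forever. The sketch below is an in-memory illustration under that assumption; `LeaseLock` and the lock name `mail-fetch` are made up for the example and are not part of paperless-ngx. In a real deployment the same idea is often built on a shared service, for example Redis `SET` with the `NX` and `EX` options.

```python
import time
from typing import Callable, Dict, Tuple


class LeaseLock:
    """In-memory stand-in for a shared lock with expiry (hypothetical)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # lock name -> (owner, time at which the lease expires)
        self._held: Dict[str, Tuple[str, float]] = {}

    def acquire(self, name: str, owner: str, ttl: float) -> bool:
        """Take the lock if it is free or its previous lease has expired."""
        now = self._clock()
        current = self._held.get(name)
        if current is not None and current[1] > now:
            return False  # someone else still holds a valid lease
        self._held[name] = (owner, now + ttl)
        return True

    def release(self, name: str, owner: str) -> None:
        """Release the lock, but only if this owner still holds it."""
        current = self._held.get(name)
        if current is not None and current[0] == owner:
            del self._held[name]
```

Each install would try `acquire("mail-fetch", owner, ttl)` before running the mail task and simply skip the run if it returns `False`. The `ttl` should be longer than the task normally takes, since the lease is not renewed in this sketch.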
I did have instances in which a container was killed due to an OOM situation on the node, and the item was then listed as "in progress" indefinitely and had to be cleaned out manually.
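A possible way to avoid the manual cleanup, sketched under the assumption that workers periodically record a heartbeat for the task they are processing: anything still marked as in progress without a recent heartbeat is treated as abandoned. `TaskRecord`, its fields and `reap_stale` are hypothetical names for illustration and do not reflect how paperless-ngx stores task state.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskRecord:
    task_id: str
    status: str            # e.g. "in_progress", "done", "failed"
    last_heartbeat: float  # seconds, as last reported by the worker


def reap_stale(tasks: List[TaskRecord], now: float, timeout: float) -> List[str]:
    """Mark in-progress tasks without a recent heartbeat as failed.

    Returns the ids of the tasks that were changed.
    """
    reaped = []
    for task in tasks:
        if task.status == "in_progress" and now - task.last_heartbeat > timeout:
            task.status = "failed"
            reaped.append(task.task_id)
    return reaped
```

Run periodically, something like this would move the item from a killed container out of the "in progress" state once the timeout has passed, so it could be retried or at least shown as failed.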
These are the three things I became aware of and needed to find workarounds for. There may be more that I am not yet aware of.