Replies: 14 comments 32 replies
-
It shouldn't take more than a few minutes on a fairly normally powered machine, and certainly nowhere near 10 days. What is the last output from the training? What's the machine architecture?
-
I've found that if I disable automatic matching for all correspondents, the training task finishes successfully. It seemingly has no issues with training tags or document types. I then re-enabled automatic matching for exactly 1 correspondent, and it still finishes normally. Next I'm going to gradually re-enable more correspondents for automatic matching and see if it's a quantity thing or if I just have a cursed correspondent or what.
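That gradual re-enable loop can be turned into a binary search, so it only takes about log2(n) training runs instead of n. A minimal sketch, where `train_with()` is a hypothetical stand-in for "run the training task with only this subset of correspondents set to automatic matching and report whether it finished":

```python
# Binary-search for a "cursed" correspondent. train_with(subset) is a
# hypothetical callable returning True when training succeeds with only
# that subset enabled for automatic matching.
def find_culprit(correspondents, train_with):
    """Return one correspondent whose presence makes training fail."""
    assert not train_with(correspondents), "the full set trains fine"
    lo, hi = 0, len(correspondents)
    # Invariant: correspondents[lo:hi] contains at least one culprit.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # If the first half alone already fails, the culprit is in it;
        # otherwise it must be in the second half.
        if not train_with(correspondents[lo:mid]):
            hi = mid
        else:
            lo = mid
    return correspondents[lo]
```

With, say, 17 correspondents this narrows a single bad one down in about 5 runs instead of 17, though each run still costs one training attempt.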
-
I'm having this same issue on a FreeBSD 13.2 system. I do get two additional lines of output, included below. The system has 64 GB of RAM with an Intel(R) Atom(TM) CPU C3758 @ 2.20GHz, running on NVMe drives. I let the manual process run for 6 days before killing it; the entire time it was using 99% of one core. At the time I only had 5 correspondents, 6 document types, 40 documents loaded, and only an inbox tag. I've turned off NLTK (PAPERLESS_ENABLE_NLTK=0), enabled debug, and run the manual task with -v3, all with the same result. Is there any way to get more output to see what this is chewing away at?
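One way to get more visibility into a hung run, as a sketch: `faulthandler` is in the Python standard library (this is not something paperless wires up itself), and registering it near the top of the management command lets you ask the stuck process for a stack dump from another shell.

```python
import faulthandler
import signal

# Sketch (Unix only): register this early in the training command.
# Afterwards, "kill -USR1 <pid>" from another shell makes the process
# write every thread's current Python stack to stderr without
# stopping it.
faulthandler.register(signal.SIGUSR1, all_threads=True)

# faulthandler can also dump synchronously, e.g. into a log file:
with open("/tmp/classifier-stack.log", "w") as fh:
    faulthandler.dump_traceback(fh, all_threads=True)
```

If the hang is inside native code (e.g. a BLAS call), the dump will only show the last Python frame that entered C, but that is still enough to tell whether it is stuck inside scikit-learn's fit.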
-
I have a similar issue, running on a Hetzner ARM VM (I initially left it nearly a week), but I've also tried on a Ryzen 5800X (nothing after 4 hours). For me it's certain document types; I suspect it might have something to do with the documents themselves, e.g. their length? Contracts is one document type that causes the process to freeze. Adding: I don't think it's a memory issue, because the Ryzen machine has 32 GiB of RAM, but it's running NixOS (as is the Hetzner VM), which might have some issues with its package.
-
Just jumping in here with a "me too". I just switched from an older Intel system to an AMD 5700G system and noticed that my reboot times went through the roof, and apparently that's because I'm also stuck. My numbers and log look like this:
I started the classifier manually, and one thing I typically do when something is stuck is to
The high CPU usage seems to be a spin lock in a very tight loop. I'll have a look at what's going on in Python land in the paperless code once I've had some coffee.
-
Ok, so this is interesting. It looks like a threading issue from a syscall perspective, but I'm now debugging scikit's
Of course this steps down into C land. I'll try reading up on this; as someone mentioned, NixOS might somehow be compiling with sub-optimal options.
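The spinning seen under strace is consistent with OpenBLAS's worker threads, which busy-wait by default. A cheap experiment (an assumption to test, not a confirmed fix): cap the BLAS/OpenMP thread pools to a single thread before numpy and scikit-learn are imported and see whether training still hangs.

```python
import os

# These are the standard OpenBLAS/OpenMP environment knobs, not
# paperless settings; they must be set before numpy is imported.
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["OMP_NUM_THREADS"] = "1"

import numpy as np  # import only after the environment is set

# A small matrix multiplication exercises the same BLAS code path
# that classifier training hits.
out = np.ones((256, 256)) @ np.ones((256, 256))
print(out[0, 0])  # 256.0
```

The same cap can also be applied at runtime with `threadpool_limits` from threadpoolctl, a package scikit-learn already depends on.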
-
Are there any representative generic scikit-learn benchmarks we could run on our respective systems to see whether this is a general issue or a paperless one?
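There doesn't seem to be an official paperless benchmark, but a rough stand-in is just timing a scikit-learn fit on synthetic data of roughly the shape of a small install. The sizes below are invented for illustration, and MLPClassifier is used only because it is a common scikit-learn classifier that leans heavily on BLAS, not as a claim about paperless's exact pipeline:

```python
import time

import numpy as np
from sklearn.neural_network import MLPClassifier

# Synthetic "documents": ~1000 samples, ~1500 features, 20 classes.
rng = np.random.default_rng(0)
X = rng.random((1000, 1500))
y = rng.integers(0, 20, size=1000)

start = time.perf_counter()
clf = MLPClassifier(max_iter=30, random_state=0)
clf.fit(X, y)
elapsed = time.perf_counter() - start

# On a healthy setup this finishes in seconds; a hang or a runtime of
# hours would point at the BLAS/threading layer rather than paperless.
print(f"fit took {elapsed:.1f}s")
```

Comparing the elapsed time across the affected NixOS/FreeBSD machines and an unaffected one would show whether the problem reproduces outside paperless.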
-
Another "me too" here. My system (also NixOS) is hanging at the tags classifier, and bizarrely enough, similar to @sn3ak, it only happens when I have more than 13 tags with matching turned on.
-
Just adding a backlink to the related nixpkgs issue: NixOS/nixpkgs#240591 (I guess GitHub doesn't automatically cross-reference discussions mentioned from issues? Huh.)
-
I'm having the same issue on a less powerful system (an SBC, a Rock64).
-
Another data point I'd like to investigate is which database we're using. On NixOS, SQLite is the default, and likely nobody switches it out because it ought to be enough for paperless's purposes; at least that's an easy assumption to make. So: which DB are you using, and do you experience this issue?
-
We seem to have found an effective workaround on NixOS: replacing OpenBLAS with MKL, which can be done at runtime by setting
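Whether a runtime swap like this actually took effect can be checked from inside the process with threadpoolctl (a dependency of scikit-learn). This is a generic inspection sketch, not a NixOS-specific recipe:

```python
import numpy  # ensure a BLAS library is actually loaded first

from threadpoolctl import threadpool_info

# Each entry describes one native thread pool loaded in this process,
# including which BLAS implementation it is ("openblas" vs "mkl") and
# the shared object it was loaded from.
for pool in threadpool_info():
    print(pool["internal_api"], pool["filepath"], pool["num_threads"])
```

After a successful MKL swap, the numpy pools should report `mkl` as their `internal_api` instead of `openblas`.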
-
Just adding a bit of context for others: on my FreeBSD 13.2 amd64 system I'm definitely seeing the same behavior, getting stuck at the correspondents step and pegging a CPU at 100% for days. Log samples: from
And my
Looking at the rest of this thread, I see the FreeBSD math/py-numpy port uses math/openblas as well. I confirmed my numpy ( The only thing that has addressed this was manually going through and disabling roughly half of the correspondent matching. And just to share, the classification_model with the 17 correspondents is about
-
Is there any update on this issue?
-
I was wondering what training time one should expect for the automatic classifier.
Right now I am looking at paperless-ngx 1.10.21 (with 8da3ae2 applied) running in a NixOS VM with 99 correspondents (all with automatic matching), 25 tags (6 with automatic matching), 20 document types (2 with automatic matching) and 1,473 documents.
After having disabled the training queue worker prior to the 1.10 upgrade (since it would just take super long), I manually started
document_create_classifier -v2
in a tmux session on the 29th. However, almost 10 days later the process is still running; it has not finished. The process is continuously using one CPU core (4 available) and, according to htop, sits at 5% memory (4 GiB in total, 1.6 reported as free and 3.5 as available right now). The logs report
Document classification model does not exist (yet), not performing automatic matching.
at regular intervals. Judging from the load graph and task results, that stems from the trainer being started by celery and then timing out after 30 minutes. So I am wondering: is this consistent with your experiences? Could there be any other pitfalls I may not have considered?
Footnotes
From flipping through the 1.11.x release notes, I could not find any updates to the classifier itself. Since I am not that interested in discarding ~10 days of potential progress, I have not yet upgraded, and I would assume this is still relevant to current versions. ↩