Backend killed [FIX] #634
Comments
Yeah, most likely this is happening when Khoj is trying to index the image PDFs in your knowledge base and running out of memory/CPU. What are the specifications (i.e. RAM, CPU, VRAM on GPU) of the machine you're running Khoj on? Can you gradually give it more of your content to sync? E.g. add one directory at a time and restart Khoj to sync that new data. This way, once it has indexed all your data without being killed, it should be easier to sync any updates you add to your knowledge base.
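To choose which directory to add first, a small helper along these lines may be useful. It is only a sketch and not part of Khoj: `summarize_subdirs` is a hypothetical name, and it assumes the knowledge base is laid out as sub-directories under a single root folder with files ending in lowercase `.pdf`.

```python
from pathlib import Path


def summarize_subdirs(root, pattern="*.pdf"):
    """Return (name, file count, total bytes) for each immediate
    sub-directory of `root`, sorted from smallest to largest total size."""
    rows = []
    for sub in Path(root).iterdir():
        if not sub.is_dir():
            continue
        # Search the sub-directory recursively for matching files.
        files = [f for f in sub.rglob(pattern) if f.is_file()]
        total = sum(f.stat().st_size for f in files)
        rows.append((sub.name, len(files), total))
    # Smallest directories first, so they can be synced first.
    return sorted(rows, key=lambda row: row[2])
```

Calling `summarize_subdirs("/path/to/knowledge-base")` and adding the directories to Khoj in the returned order means the smallest batches are indexed first, so a crash is easier to attribute to the most recently added directory.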
Intel(R) Core(TM) i5-3230M CPU @ 2.60GHz
Thank you for the suggestion of trying one directory at a time. I'll give that a go. I wish there was a way to know which directory/file was being processed though; it would make it much easier. BTW, when this happens I usually see some temporary PDFs in my home directory. I'm assuming these would have been cleaned up by the process if it had completed successfully. Maybe this might give me a clue as to where the trouble lies.
Hey @edbock, were you able to get your PDFs indexed? Fair point on visibility into which file was last being synced, to better understand how to split the indexing of data in such scenarios. Let me see how we can show that. I'm not very sure, but it does sound like the temp PDFs may be from the process being killed in the middle of indexing. If so, then you're correct that they should provide at least some clue as to where indexing stopped, until a cleaner way to show what is currently being indexed is found.
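Until such visibility exists, the leftover files can be listed by modification time to see which one was touched last before the kill. The sketch below is an assumption-laden helper, not Khoj functionality: `leftover_temp_pdfs` is a hypothetical name, and the `"temp"` marker is a guess, since the exact naming of the temporary files is not confirmed in this thread.

```python
from pathlib import Path


def leftover_temp_pdfs(directory, marker="temp"):
    """Return PDFs directly inside `directory` whose file name contains
    `marker` (case-insensitive), most recently modified first."""
    hits = [
        p
        for p in Path(directory).iterdir()
        if p.is_file()
        and p.suffix.lower() == ".pdf"
        and marker in p.name.lower()  # assumed naming; adjust as needed
    ]
    # Newest first: the top entry was the last one written before the crash.
    return sorted(hits, key=lambda p: p.stat().st_mtime, reverse=True)
```

For example, `leftover_temp_pdfs(Path.home())` lists candidates in the home directory. Comparing the newest file's modification time with the timestamp of the last log line before `Killed` may narrow down which source document was being processed.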
Thank you very much for following up. Unfortunately I haven't had any time to spend on this lately. I'll report back when I get a chance.
Hi @edbock! Just looking for some clarification here.
@sabaimran, thank you for your questions. AFAIK, Khoj has so far never managed to index all the PDFs. It often leaves 1-5 PDF files with "temp" or something like that as part of the file name. It is entirely possible that it is a memory usage issue. I am using the command-line client. Although I am using the Obsidian interface to communicate with the client, I'm pretty sure it's the client that is causing the issues. Everything works fine until the client starts indexing files, and after a period of time (a few minutes or more), the computer locks up and then Khoj crashes. I don't have a swap file enabled, so my suspicion is that Ubuntu kills the process to restore order to the system. These are mostly text PDFs. They do contain images, but none of them are predominantly image-based AFAIK. I have gone another route for a solution to this issue for myself. However, I would be glad to help with testing this issue if you want, as long as you can give me some specific things to watch for, report on, etc.
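One way to check the suspicion that the kernel's out-of-memory (OOM) killer ended the process is to search the kernel log for its kill records. The following is a minimal sketch, assuming the log uses the usual `Killed process <pid> (<name>)` wording; `find_oom_kills` and `kernel_log_lines` are hypothetical helper names, and the name the process appears under in the log is not known from this thread.

```python
import re
import subprocess

# Matches kernel records such as
#   "Out of memory: Killed process 1234 (name) total-vm:..."
# as well as the older "Killed process 1234 (name) ..." form.
OOM_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_lines):
    """Return a list of (pid, process name) for each OOM-kill record."""
    kills = []
    for line in log_lines:
        match = OOM_RE.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


def kernel_log_lines():
    """Read kernel messages via journalctl (assumes a systemd-based system)."""
    result = subprocess.run(
        ["journalctl", "-k", "--no-pager"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.splitlines()
```

Running `find_oom_kills(kernel_log_lines())` shortly after a crash and finding an entry with a matching time would support the memory explanation; an empty result would suggest looking elsewhere.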
Hey @Openegg15, can you clarify what the link you've shared references? And provide more context to your statement?
Describe the bug
Khoj is a great app, and works quite well, but I have issues during indexing. CPU is in constant use which doesn't surprise me, but sometimes it hogs all available CPU and my machine becomes unresponsive for a minute or two. I assume there's some kind of fail-safe that kicks in because the process ends with the message "Killed".
I have not been able to completely index all the files AFAIK. I assume this may have something to do with the PDF/image indexing functions. Here are the last four lines of terminal output:
[08:03:19 PM] WARNING Because the aspect ratio of the current image exceeds the limit (min_height or width_height_ratio), the program will skip the detection step. main.py:158
[08:06:48 PM] INFO 🔥 Deleted (0, {}) day-old user requests configure.py:346
[08:12:13 PM] WARNING Because the aspect ratio of the current image exceeds the limit (min_height or width_height_ratio), the program will skip the detection step. main.py:158
Killed
To Reproduce
Steps to reproduce the behavior:
khoj --anonymous-mode --disable-chat-on-gpu --verbose
Requires nothing on my part. This happens every time the backend has been running for more than an hour or two.
Platform
If self-hosted
1.50
Additional context
This has happened every single time I run the backend.