SQLite database is created in memory #85
Comments
Yeah, this is normal, at least for now. The problem is under which name to store the data for recursive archives. I think this issue might be a duplicate of #79.
What about storing all .tar indices in a single DB?
(In reply to Maximilian Knespel's comment of Wed, May 25, 2022 at 2:41 PM.)
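For further databases caused by recursive archives, I think I answered your question. Do you mean when using the union mounting feature like so: ratarmount file1.tar file2.tar mountpoint? In this case, I think it is better to have one DB per archive in order to increase reusability when, e.g., trying to mount only file1.tar or when trying to add another archive to the union mount: ratarmount file1.tar file2.tar file3.tar mountpoint. What is your use case?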
Maybe I expressed myself incorrectly...
Ah, I see. But maybe your problem will also disappear when using the new …
Could you paste one of those warnings? I'm beginning to doubt that it is normal. Also, what is the compression chain? It should only try to use an in-memory database in circumstances like mounting a compressed tar that is inside another archive.
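A minimal Python sketch of the behavior described in this comment, written under stated assumptions: `choose_index_location` and `open_index` are hypothetical helpers, not ratarmount's API, and the `.index.sqlite` suffix and the path clean-up are assumed naming conventions. An archive with a path on disk gets a file-backed index, while an archive that is only reachable as a file object inside another archive has no name of its own and falls back to `:memory:`, even when an index folder is given.

```python
import os
import sqlite3
from typing import Optional

INDEX_SUFFIX = ".index.sqlite"  # assumed naming convention for index files


def choose_index_location(archive_path: Optional[str], index_folder: Optional[str]) -> str:
    """Decide where a SQLite index would be stored (hypothetical helper).

    `archive_path` is None when the archive only exists as a file object inside
    another archive, e.g., a compressed TAR nested in an outer TAR.
    """
    if archive_path is None:
        # No path of its own means no stable file name, so fall back to RAM.
        return ":memory:"
    if index_folder is not None:
        # Use a cleaned-up version of the absolute archive path as the file name.
        cleaned = os.path.abspath(archive_path).replace(os.sep, "_")
        return os.path.join(index_folder, cleaned + INDEX_SUFFIX)
    # Default: store the index directly beside the archive.
    return archive_path + INDEX_SUFFIX


def open_index(archive_path: Optional[str], index_folder: Optional[str]) -> sqlite3.Connection:
    """Open (or create) the index database at the chosen location."""
    return sqlite3.connect(choose_index_location(archive_path, index_folder))
```

In this model, the index folder only changes where file-backed indexes go; it cannot help nested archives, because the missing piece is a name, not a directory.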
That is precisely my case...
Unfortunately, yes. I'll try to fix it, but it might take a while :/. PRs are welcome.

My basic idea to fix this is outlined in #79. The downward-compatible version would simply add a table for each recursively contained TAR. Simply adding the entries to the existing table won't work because there would not be enough information: the table basically just stores names and offsets. It would additionally need to save something like the path to the recursive archive, but that seems like a waste of space and might also be harder to implement because, as it is implemented now, multiple SQLiteIndexedTar instances are created, one for each recursive TAR, and they basically don't know of each other.

Instead of this large amount of work, it might be simpler to support writing the indexes of in-memory file objects out to the index folder. The only problem is generating a stable name. For example, the hash of the whole archive would be a good name, but it would be too expensive to calculate. A hash over the metadata might work, though, because that data has to be calculated anyway and should be orders of magnitude smaller than the file contents, which don't matter for the name anyway. However, to keep loading from an existing index fast, I would only be able to check part of the metadata, e.g., the first 1000 entries. I'm already doing something similar to detect TARs that have been appended to. It would still only be a heuristic, nothing 100% stable :/.

Currently, if the index cannot be placed directly beside the archive, it is placed in the home folder with a kind of cleaned-up path to the archive as its name. I might simply store the indexes of recursive TARs under their inner path appended to the path of the outer TAR. That should be unique enough. If the path becomes too long to use as a file name, I could simply hash it. Hmm, thinking about it, I might be able to implement this second idea soonish.
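The second idea (persisting in-memory indexes under a name derived from the outer path plus the inner path) could look roughly like the following sketch. `recursive_index_name` and `persist_in_memory_index` are hypothetical helpers, the 255-byte limit is an assumption about typical Linux file systems, and SQLite's online backup API (`sqlite3.Connection.backup`, available since Python 3.7) is used to copy the in-memory database to a file.

```python
import hashlib
import os
import sqlite3

INDEX_SUFFIX = ".index.sqlite"  # assumed naming convention
MAX_NAME_BYTES = 255  # typical file-name limit, e.g., on ext4


def recursive_index_name(outer_archive: str, inner_path: str) -> str:
    """Name an index file after the outer archive path plus the inner path.

    Hypothetical helper: separators are replaced to get a flat file name,
    and the combined path is hashed if the name would be too long.
    """
    combined = os.path.abspath(outer_archive) + "/" + inner_path.strip("/")
    name = combined.replace("/", "_") + INDEX_SUFFIX
    if len(name.encode("utf-8")) > MAX_NAME_BYTES:
        # A SHA-256 hex digest is 64 characters, so the hashed name always fits.
        name = hashlib.sha256(combined.encode("utf-8")).hexdigest() + INDEX_SUFFIX
    return name


def persist_in_memory_index(memory_db: sqlite3.Connection, target_path: str) -> None:
    """Write an in-memory SQLite index to disk via the online backup API."""
    file_db = sqlite3.connect(target_path)
    try:
        memory_db.backup(file_db)
    finally:
        file_db.close()
```

Hashing only kicks in for overly long names, so short names stay human-readable; the trade-off is that a hashed name no longer reveals which archive it belongs to.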
I'm not really a DB expert, but I suspect that table creation is an expensive operation.
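Whether table creation is actually expensive can be measured directly. The following micro-benchmark is a sketch with hypothetical table and column names (not ratarmount's real schema); it compares creating one table per nested archive with creating a single table keyed by an archive ID. Results depend on the SQLite version and hardware, so no numbers are claimed here.

```python
import sqlite3
import time
from typing import Callable


def create_per_archive_tables(conn: sqlite3.Connection, n_archives: int) -> None:
    # Variant 1: one "files" table per nested archive (hypothetical schema).
    for i in range(n_archives):
        conn.execute(
            f"CREATE TABLE files_{i} (path TEXT PRIMARY KEY, data_offset INTEGER, size INTEGER)"
        )
    conn.commit()


def create_single_table(conn: sqlite3.Connection) -> None:
    # Variant 2: one shared table whose key includes an archive ID.
    conn.execute(
        "CREATE TABLE files (archive_id INTEGER, path TEXT, data_offset INTEGER, "
        "size INTEGER, PRIMARY KEY (archive_id, path))"
    )
    conn.commit()


def time_call(func: Callable[..., object], *args: object) -> float:
    """Return the wall-clock duration of func(*args) in seconds."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 2_000  # raise this to approach the archive counts discussed in this thread
    per_table = time_call(create_per_archive_tables, sqlite3.connect(":memory:"), n)
    single = time_call(create_single_table, sqlite3.connect(":memory:"))
    print(f"{n} tables: {per_table:.3f} s, one table: {single:.6f} s")
```

Table creation is only part of the cost; a fair comparison would also insert the same rows into both layouts and time the lookups.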
How many archives inside the outer archive are we talking about? |
The biggest one I've encountered contains more than 300K archives.
That is quite a lot and indeed might need more brainstorming and benchmarking :/. This might also trigger performance problems at other locations in the code, for example inside the AutoMountLayer class, which mounts those archives recursively and has to keep a map of all mounted recursive locations and has to look them up each time FUSE requests a file. |
Given that self.mounted in AutoMountLayer is a dict …
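With self.mounted being a dict, one way to keep lookups cheap even with hundreds of thousands of mounted archives is to probe each parent path of the requested file, from the deepest to the root. The sketch below assumes the dict maps absolute mount-point paths to mount objects; `find_mount` is a hypothetical helper, not the actual AutoMountLayer code.

```python
from typing import Dict, Optional, Tuple


def find_mount(mounted: Dict[str, object], path: str) -> Optional[Tuple[str, object]]:
    """Return the deepest mount point that contains `path` (hypothetical helper).

    Each parent path is checked with one dict lookup, so the cost grows with
    the path depth rather than with the number of mounted archives.
    """
    parts = path.strip("/").split("/")
    for depth in range(len(parts), -1, -1):
        prefix = "/" + "/".join(parts[:depth])
        if prefix in mounted:
            return prefix, mounted[prefix]
    return None


if __name__ == "__main__":
    mounts = {"/": "root", "/a.tar": "A", "/a.tar/b.tar": "B"}
    print(find_mount(mounts, "/a.tar/b.tar/readme.txt"))
```

The number of dict probes per FUSE request is bounded by the number of path components, which stays small even if the dict itself holds 300K entries.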
Maybe I expressed myself incorrectly... Actually, for each .tar file, ratarmount creates an SQLite database specific to that .tar. I thought it might be more efficient to have ONE database that contains the data for ALL archives simultaneously. Of course, this would require significant modifications of the existing codebase, but nothing too complicated, I think. The idea would be to assign a virtual_inode_number to each archive and include it as a key field in all tables of this single DB...
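A minimal sketch of the proposed single-database layout, assuming SQLite via Python's `sqlite3` module: every archive gets an integer ID (playing the role of the suggested virtual_inode_number), and that ID is part of the primary key of the file table. The table, column, and function names are hypothetical and much simpler than ratarmount's actual schema.

```python
import sqlite3
from typing import Optional, Tuple

SCHEMA = """
CREATE TABLE archives (
    archive_id  INTEGER PRIMARY KEY,                      -- the "virtual inode number"
    parent_id   INTEGER REFERENCES archives(archive_id),  -- NULL for top-level archives
    path        TEXT NOT NULL                             -- path inside the parent
);
CREATE TABLE files (
    archive_id  INTEGER NOT NULL REFERENCES archives(archive_id),
    path        TEXT NOT NULL,
    data_offset INTEGER NOT NULL,
    size        INTEGER NOT NULL,
    PRIMARY KEY (archive_id, path)
);
"""


def add_archive(conn: sqlite3.Connection, parent_id: Optional[int], path: str) -> int:
    cur = conn.execute("INSERT INTO archives (parent_id, path) VALUES (?, ?)", (parent_id, path))
    return cur.lastrowid


def add_file(conn: sqlite3.Connection, archive_id: int, path: str, data_offset: int, size: int) -> None:
    conn.execute("INSERT INTO files VALUES (?, ?, ?, ?)", (archive_id, path, data_offset, size))


def lookup(conn: sqlite3.Connection, archive_id: int, path: str) -> Optional[Tuple[int, int]]:
    return conn.execute(
        "SELECT data_offset, size FROM files WHERE archive_id = ? AND path = ?",
        (archive_id, path),
    ).fetchone()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    outer = add_archive(conn, None, "/data/outer.tar")
    inner = add_archive(conn, outer, "nested/inner.tar")
    add_file(conn, outer, "nested/inner.tar", 512, 10240)
    add_file(conn, inner, "readme.txt", 1024, 42)
    print(lookup(conn, inner, "readme.txt"))  # (1024, 42)
```

Nested archives reference their parent via parent_id, which records the information that a flat name-and-offset table lacks.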
The advantage of this approach is that it could easily be adapted to other SQL-based databases, which is useful when mounting directories with a LOT of really BIG archives. I'm talking about disks with several TB of data, archives of hundreds of GBs, and more than 100K files inside. This is actually my use case.
Thanks to your advice, I've implemented a kind of hybrid between guestmount and ratarmount. I use libguestfs to mount *.iso, *.img, *.ova, and *.vmdk files, then I create a temp dir containing mount points (with the help of mount --bind) for the above files, and then I launch ratarmount -r -l to mount this temp dir.
Given that the disk images contain big archives with archives inside, and that ratarmount uses :memory: to index archives inside archives, the memory consumption is pretty impressive, hence my ideas on reorganizing the DB.
(In reply to Maximilian Knespel's comment of Wed, May 25, 2022 at 3:51 PM.)
When using the -r option, I'm getting log messages about SQLite databases being created in :memory: even when --index-folder is specified.
Is this normal?
By the way, there is a separate DB for each .tar file. Wouldn't it be more efficient to have only one DB?