-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Stale] Compare identical environments by using a same_file::Handle
#179
base: main
Are you sure you want to change the base?
Conversation
Thank you @darnuria, It looks like the |
0fba17a
to
993fe55
Compare
same_file::Handle
/// Mind that Handle currently open the file, so avoid writing through the fd held by [same_file::Handle], | ||
/// since the file will also be opened by LMDB. | ||
static OPENED_ENV: Lazy<RwLock<HashMap<Handle, EnvEntry>>> = Lazy::new(RwLock::default); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaved a mark since Handle
open the file and hold a file descriptor https://github.com/BurntSushi/same-file/blob/master/src/unix.rs#L10
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so can we add a comment about it? I mean a really documentation comment in the lib.rs file. Under an Important/Remark header, please? I don't know how LMDB behaves towards that.
I am not sure that we really care about this sentence as we keep it open the whole env-lifetime right?
Do not have open an LMDB database twice in the same process at the same time. Not even from a plain open() call - close()ing it breaks flock() advisory locking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(backported in description in case this PR got stuck/closed to help somebody)
Yeah my worries comes from that line, and my first idea to just check (dev_t, ino_t
)
For sure it may break the flock()
ing (not the best system api..), and the multi-process features of lmdb is really handy.
Checked-out lmdb sources:
- https://github.com/LMDB/lmdb/blob/3947014aed7ffe39a79991fa7fb5b234da47ad1a/libraries/liblmdb/mdb.c#L4786-L4799
- https://github.com/LMDB/lmdb/blob/3947014aed7ffe39a79991fa7fb5b234da47ad1a/libraries/liblmdb/mdb.c#L2998-L3010
- https://github.com/LMDB/lmdb/blob/3947014aed7ffe39a79991fa7fb5b234da47ad1a/libraries/liblmdb/mdb.c#L5146-L5156
- https://github.com/LMDB/lmdb/blob/3947014aed7ffe39a79991fa7fb5b234da47ad1a/libraries/liblmdb/lmdb.h#L102-L105
About flock
ing in lmdb.h the sentense you pointed changed with:
commit 77845345ca9bf3854fd9da60a3e3b0527fa9c76a
Author: Hallvard Furuseth <[email protected]>
Date: Tue Sep 27 07:03:42 2016 +0200
ITS#8505 Clarify fork() caveat, mdb_env_get_fd(), flock->fcntl.
diff --git a/libraries/liblmdb/lmdb.h b/libraries/liblmdb/lmdb.h
index 30d5862..319fcf6 100644
--- a/libraries/liblmdb/lmdb.h
+++ b/libraries/liblmdb/lmdb.h
@@ -97,11 +97,12 @@
* transactions. Each transaction belongs to one thread. See below.
* The #MDB_NOTLS flag changes this for read-only transactions.
*
- * - Use an MDB_env* in the process which opened it, without fork()ing.
+ * - Use an MDB_env* in the process which opened it, not after fork().
*
* - Do not have open an LMDB database twice in the same process at
* the same time. Not even from a plain open() call - close()ing it
- * breaks flock() advisory locking.
+ * breaks fcntl() advisory locking. (It is OK to reopen it after
+ * fork() - exec*(), since the lockfile has FD_CLOEXEC set.)
*
* - Avoid long-lived transactions. Read transactions prevent
* reuse of pages freed by newer write transactions, thus the
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, what do we conclude? Can we still use the same_file::Handle
struct? If not, what do we plan? Is it even possible to get the dev_t
and ino_t
without opening the file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stat
function family (except the fstatat
) check metadata and return and don't maintain a file descriptor.
It's the kind of function used by metadata
in rust std, sadly posix never standardized a "device / FSID" so ino_t/dev_t is left to implementation detail.
For windows it's more https://docs.rs/winapi-util/latest/winapi_util/file/fn.information.html
https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getfileinformationbyhandle
I read somewhere that hard links on Windows require Administrator privileges... |
(not directly related to hardlink) Yeah windows CI had troble with theses two: https://github.com/meilisearch/heed/actions/runs/5508749200/jobs/10041079210?pr=179#step:4:140 |
Do you know if we can run a certain pipeline in the CI as admin or Windows admin? |
Samefile abstract away differences between Unixes and windows to check if two files lead to the same file on the underlying filesystem. Now it should follow through soft-links, hard-links and file moved.
Precise that Handle open the file.
129ba1d
to
f840921
Compare
f840921
to
641d229
Compare
Don't know, also discovering that windows way of doing is really complicated. 🤡, like moving dir or getting file information. |
assert_eq!(env_name, env.path()); | ||
|
||
let env_renamed = dir.path().join("serafina.mdb"); | ||
std::fs::rename(&env_name, &env_renamed).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the windows part maybe the little UNIX api? https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/rename-wrename?view=msvc-170
I would like to skip the tests that require Admin rights on Windows as I don't have time to make them work in the CI for now 😬 What do you think? |
same_file::Handle
same_file::Handle
This PR implements this suggestion #145 (comment) and try to fix #180.
Stalled because the current implementation with
same_file
open files since windows needs it, but lmdb itself use posix locks withfnctl
syscall on unixes.We use the
same_file::Handle
struct that comparesdev_t
andino_t
instead of raw paths internally to make the comparison more accurate. On Windows, you should read how it works 😅.EDITS: Edited to add the caveat found during implementing the idea.
Caveat of this PR
Tl;DR Keeping FileDescriptor open, may trouble
flock
/fcntl
works inside lmdb and screw multi-process lock with lmdb.From comment: #179 (comment)
Unix ideal: just check
(dev_t, ino_t
)Unix way
stat
kind of function clean nice no File Descriptor keeped.Note: that same file keep open a File Descriptor open!
https://github.com/BurntSushi/same-file/blob/master/src/unix.rs#L10
Windows
An other
same-file
rely on winapi32, involving Filehandles.Lmdb limitation
For sure it may break the
flock()
ing (not the best system api..), and the multi-process features of lmdb is really handy.Checked-out lmdb sources:
About
flock
ing in lmdb.h the sentense you pointed changed with: