Force FSYNC #115
Comments
https://moosefs.com/Content/Downloads/moosefs-2-0-users-manual.pdf I think this is what you want :)
So, setting that to 1 means calling fsync() on every close? What if the client is asking for fsync on its own and HDD_FSYNC_BEFORE_CLOSE is set to 0? Is the fsync honored, or ignored because of the config parameter?
@OXide94 can help here...
Hi @guestisp and @oszafraniec, the parameter @oszafraniec mentioned is HDD_FSYNC_BEFORE_CLOSE. In MooseFS a write is a transaction; let's assume somebody is writing 2 MiB. These 2 mebibytes are divided into 64 kiB blocks anyway. In the current implementation the Client connects to the CS, sends 64 kiB, does not wait for an ACK and sends the next 64 kiB. We can consider adding such a parameter; the question is: @guestisp, would you like to have it? This is theory, we would probably need to make some comparison tests. Thanks.
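For reference, a minimal POSIX-level sketch of what "fsync before close" means for a single block; this is illustrative code, not MooseFS source, and the path handling is simplified:

```c
/* Illustrative sketch only, not MooseFS source: what "fsync before close"
 * means at the POSIX level for a single chunk-sized write. */
#include <fcntl.h>
#include <unistd.h>

int write_block_and_close(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    if (write(fd, buf, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }

    /* With fsync-before-close enabled, the data is forced to stable storage
     * here, so a successful close() really means the block is durable. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }

    return close(fd);
}
```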
@OXide94 Preface: I'm not an expert. There are, AFAIK, two ways to be sure that a write operation has really reached the disk and not just some cache buffer: by opening the file with O_SYNC, or by calling fsync() on it. Now, what this means in MooseFS, I don't know. My questions are (obviously, I'm talking about files stored on MooseFS):
From my point of view, anything aimed at improving data consistency should be added, so that any sysop is able to choose based on their requirements. If adding a flag is not an issue, yes, add it. If adding 2 flags (one for fsync after every 64 kB, one for fsync after the whole group of blocks) is not an issue, then please add both. Anyway, I don't think adding fsync after every 64 kB is useful as long as you force clients to wait for the final fsync. If the whole file can't be properly flushed to disk, you should block the write operation and notify the client (which is still there waiting for a write ACK). Why should you send fsync after each 64 kB? For example, you could run a write that syncs every single block, or one that issues a single fsync at the end. Will MooseFS honor these two cases? Can we force one of these cases (or both) by setting a configuration parameter even if the client is not asking for fsync?
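To make the two cases concrete, here is a hedged client-side sketch assuming hypothetical 64 kiB application blocks; it is plain POSIX code, not MooseFS internals, and the dd flags named in the comments are only the usual command-line equivalents of these two behaviours:

```c
/* Illustrative client-side sketch: sync every block vs. one fsync at the end.
 * File names, block size and count are made up; this is plain POSIX, not MooseFS. */
#include <fcntl.h>
#include <unistd.h>

#define BLOCK   65536   /* 64 kiB, matching the block size discussed above */
#define NBLOCKS 32      /* 2 MiB total */

/* Case 1: every block reaches the disk before the next write() starts
 * (roughly what dd's oflag=sync does). */
static int write_sync_each_block(const char *path, const char *buf)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
    if (fd < 0)
        return -1;
    for (int i = 0; i < NBLOCKS; i++)
        if (write(fd, buf, BLOCK) != BLOCK) {
            close(fd);
            return -1;
        }
    return close(fd);
}

/* Case 2: write everything buffered, then force it all out once
 * (roughly what dd's conv=fsync does). */
static int write_fsync_at_end(const char *path, const char *buf)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    for (int i = 0; i < NBLOCKS; i++)
        if (write(fd, buf, BLOCK) != BLOCK) {
            close(fd);
            return -1;
        }
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```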
Trying to figure out how this works. Even writing with … did I miss something?
Small correction: by setting … Or we can set all writes to be sync with …
OK, this is my fault. I didn't know that FUSE passes flags such as O_SYNC to userspace, but I've just checked and it does. It passes O_SYNC, O_ASYNC, O_NONBLOCK and O_NOATIME. Now I need to think about how to take them into account. The most important is probably O_SYNC. We have many options here:
In options 2 and 3, a successful fsync/close (but not write) done by the client on a descriptor opened with O_SYNC will mean that your data is synced to the disks on the CS. What do you think? In my opinion option 3 is the best. Is it safe enough?
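For illustration, a rough sketch of what passing the client's O_SYNC flag through to the chunk open could look like; the function name and parameter are hypothetical, not MooseFS code:

```c
/* Hypothetical sketch (not MooseFS code): open the chunk file with O_SYNC
 * only when the client's descriptor was opened with O_SYNC, otherwise keep
 * the current buffered behaviour. */
#include <fcntl.h>

int open_chunk_for_write(const char *chunk_path, int client_wants_sync)
{
    int flags = O_WRONLY;
    if (client_wants_sync)
        flags |= O_SYNC;   /* each write() to the chunk returns only once data is on disk */
    return open(chunk_path, flags);
}
```

Whether the extra latency of the synced path is acceptable would still need the comparison tests mentioned earlier in the thread.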
O_SYNC means "Write operations on the file will complete according to the requirements of synchronized I/O file integrity completion", so if you do the first part of 2. "Pass this O_SYNC flag to CS and open chunks with such flag" it should be enough to cover the semantics, and no additional fsync() should be needed. |
While at it, see if O_DSYNC can also be implemented; it's similar: "Write operations on the file will complete according to the requirements of synchronized I/O data integrity completion".
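For reference, a minimal sketch showing the two flags side by side (the paths are placeholders):

```c
/* Illustrative only: O_SYNC vs. O_DSYNC at open() time. */
#include <fcntl.h>
#include <unistd.h>

void sync_flags_example(void)
{
    /* File integrity: data plus all metadata is on disk before write() returns. */
    int fd_sync  = open("/mnt/mfs/log-sync",  O_WRONLY | O_CREAT | O_SYNC,  0644);

    /* Data integrity: data plus only the metadata needed to read it back
     * (e.g. file size), but not, say, timestamps. */
    int fd_dsync = open("/mnt/mfs/log-dsync", O_WRONLY | O_CREAT | O_DSYNC, 0644);

    if (fd_sync >= 0)
        close(fd_sync);
    if (fd_dsync >= 0)
        close(fd_dsync);
}
```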
@zcalusic the problem is that MooseFS is totally ignoring this flag: it is not passed along when opening the chunk file for writing...
@guestisp, please read the comments before replying; you have missed at least one from @acid-maker.
Sorry, my fault.
Anyway, with O_SYNC the write should return to the client only when the data is properly stored on disk, and not immediately (as @acid-maker described), or the client will be unaware of any failures.
Yes, of course. But, as the write() call is synchronous, it's just a matter of passing its return status to the caller; hopefully that can be easily integrated with the current workflow. I don't know MooseFS internals well, but @acid-maker will. 😄
@acid-maker wrote differently: "In such case client's write will return immediately without sync'ing data". So writes won't be synchronous, but still in writeback. If, as a client, I'm asking for O_SYNC, it's because I want to be 100% sure that data is really flushed to disk, so the write must return only after the real flush, even if it is much slower.
I see. I mostly ignored that part, thinking that opening the chunk with O_SYNC should be enough and that write() already propagates its return code back upstream. I base my understanding on a FUSE request-flow figure: O_SYNC and similar flags would be passed via points 4/5/6, and write() would return its status code via point 7. Of course, that figure is much simplified, and the real world is certainly more complicated. :) In any case, supporting these flags would bring MooseFS closer to POSIX compatibility, so it would be great if they could be added and properly supported.
@acid-maker: to follow up on our conversation - one idea was to keep the current fsync behavior as-is, but use an extended attribute to mark files or directory trees where FSYNC or DIRECT compliance is required, and honor that flag all the way down to the chunk writes.
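A hedged sketch of how such a marker could be set and read from the client side; the attribute name user.mfs.forcefsync and the path are made up purely for illustration:

```c
/* Hypothetical sketch: mark a directory tree as requiring fsync-compliant
 * writes via an extended attribute, and check the marker later.
 * The attribute name and path are made up for illustration. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/xattr.h>

int main(void)
{
    const char *path = "/mnt/mfs/databases";   /* illustrative mount path */
    const char *attr = "user.mfs.forcefsync";  /* hypothetical attribute name */

    /* Mark the tree: writes below this directory should be fsync'ed. */
    if (setxattr(path, attr, "1", 1, 0) != 0)
        perror("setxattr");

    /* A tool (or the filesystem itself) could later check the marker. */
    char val[8];
    ssize_t n = getxattr(path, attr, val, sizeof(val));
    if (n > 0 && val[0] == '1')
        printf("%s requires fsync-compliant writes\n", path);

    return 0;
}
```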
Any news on this rather fundamental behavior?
This is still on our roadmap.
Hi, please add core filesystem functionality for basic operation, thank you.
They will never add anything useful. They promised, years ago, a v4 free for everyone; I even had binaries to test and use, with an awesome HA mode, but v4 is still closed source and unavailable. They promise a lot of things...
It's a shame, because MooseFS is by far the best distributed storage available.
Hi, how is the roadmap doing these days?
They do nothing except bug fixes. They talk, talk, talk, talk...
Is it possible to force MooseFS to issue fsync on every operation before returning an ACK to the client?
What if I want to be 100% sure that data is properly written to disk even if the client is not asking for fsync?