Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Different results when running echo && cat with bash and fish #38

Open
pbeza opened this issue Apr 22, 2021 · 8 comments
Open

Different results when running echo && cat with bash and fish #38

pbeza opened this issue Apr 22, 2021 · 8 comments

Comments

@pbeza
Copy link

pbeza commented Apr 22, 2021

I've prepared a toy example based on xmp.py which proofs that there is a bug somewhere:

#!/usr/bin/env python

import fuse
import os
import sys
import time
from fuse import Fuse
from threading import Lock

fuse.fuse_python_api = (0, 2)

fuse.feature_assert('stateful_files', 'has_init')


def flag2mode(flags):
    md = {os.O_RDONLY: 'rb', os.O_WRONLY: 'wb', os.O_RDWR: 'wb+'}
    m = md[flags & (os.O_RDONLY | os.O_WRONLY | os.O_RDWR)]

    if flags | os.O_APPEND:
        m = m.replace('w', 'a', 1)

    return m


class Xmp(Fuse):

    def __init__(self, *args, **kw):
        Fuse.__init__(self, *args, **kw)
        self.root = '/'

    def getattr(self, path):
        return os.lstat("." + path)

    def truncate(self, path, len):
        f = open("." + path, "a")
        f.truncate(len)
        f.close()

    def fsinit(self):
        os.chdir(self.root)

    class XmpFile(object):

        def __init__(self, path, flags, *mode):
            self.file = os.fdopen(os.open("." + path, flags, *mode), flag2mode(flags))
            self.fd = self.file.fileno()
            self.iolock = Lock()

        def read(self, length, offset):
            with self.iolock:
                self.file.seek(offset)
                return self.file.read(length)

        def write(self, buf, offset):
            with self.iolock:
                self.file.seek(offset)
                self.file.write(buf)
                return len(buf)

        def release(self, flags):
            self.file.close()

        def _fflush(self):
            if 'w' in self.file.mode or 'a' in self.file.mode:
                self.file.flush()

        def flush(self):
            time.sleep(5)  # simulating network lag
            self._fflush()
            os.close(os.dup(self.fd))

    def main(self, *a, **kw):

        self.file_class = self.XmpFile
        return Fuse.main(self, *a, **kw)


def main():

    usage = Fuse.fusage
    server = Xmp(version="%prog " + fuse.__version__, usage=usage, dash_s_do='setsingle')
    server.parser.add_option(mountopt="root", metavar="PATH", default='/',
                             help="mirror filesystem from under PATH [default: %default]")
    server.parse(values=server, errex=1)

    try:
        server.fuse_args.add('allow_other')
        server.fuse_args.add('attr_timeout=0')   # commenting out this line doesn't help
        server.fuse_args.add('entry_timeout=0')  # commenting out this line doesn't help

        if server.fuse_args.mount_expected():
            os.chdir(server.root)
    except OSError:
        print("can't enter root of underlying filesystem", file=sys.stderr)
        sys.exit(1)

    server.main()


if __name__ == '__main__':
    main()

Notice that there is time.sleep(5) added in flush() method which simulates network lag (if you remove this line, the problem is gone).

Lets mount our toy file system:

python3 ./xmp.py -d -o root=./local/ ./mnt

Test with bash

Now open bash terminal and run:

$ echo 'onetwo' > ./mnt/test-bash.txt && cat ./mnt/test-bash.txt
onetwo
$ cat ./mnt/test-bash.txt
onetwo

Test with fish

Open fish terminal and run:

$ echo 'onetwo' > ./mnt/test-fish.txt && cat ./mnt/test-fish.txt
$ cat ./mnt/test-fish.txt
onetwo

What the heck? Different results for different terminals! I've also checked zsh and csh – they behave same as bash. It seems that only fish is affected.

Log for bash test

unique: 226, opcode: LOOKUP (1), nodeid: 1, insize: 54, pid: 22987
LOOKUP /test-bash.txt
getattr /test-bash.txt
   unique: 226, error: -2 (No such file or directory), outsize: 16
unique: 228, opcode: CREATE (35), nodeid: 1, insize: 70, pid: 22987
create flags: 0x8241 /test-bash.txt 0100644 umask=0022
   create[140680646551824] flags: 0x8241 /test-bash.txt
getattr /test-bash.txt
   NODEID: 2
   unique: 228, success, outsize: 160
unique: 230, opcode: FLUSH (25), nodeid: 2, insize: 64, pid: 22987
flush[140680646551824]
unique: 232, opcode: LOOKUP (1), nodeid: 1, insize: 54, pid: 8140
LOOKUP /test-bash.txt
getattr /test-bash.txt
   NODEID: 2
   unique: 232, success, outsize: 144
unique: 234, opcode: GETATTR (3), nodeid: 2, insize: 56, pid: 8140
getattr /test-bash.txt
   unique: 234, success, outsize: 120                      # after printing this line it sleeps for 5 seconds
unique: 236, opcode: LOOKUP (1), nodeid: 1, insize: 54, pid: 8140
LOOKUP /test-bash.txt
getattr /test-bash.txt
   NODEID: 2
   unique: 236, success, outsize: 144
unique: 238, opcode: GETATTR (3), nodeid: 2, insize: 56, pid: 8140
getattr /test-bash.txt
   unique: 238, success, outsize: 120
   unique: 230, success, outsize: 16
unique: 240, opcode: GETXATTR (22), nodeid: 2, insize: 68, pid: 22987
   unique: 240, error: -38 (Function not implemented), outsize: 16
unique: 242, opcode: WRITE (16), nodeid: 2, insize: 87, pid: 22987
write[140680646551824] 7 bytes to 0 flags: 0x8001
   write[140680646551824] 7 bytes to 0
   unique: 242, success, outsize: 24
unique: 244, opcode: FLUSH (25), nodeid: 2, insize: 64, pid: 22987
flush[140680646551824]                                     # after printing this line it sleeps for 5 seconds
   unique: 244, success, outsize: 16
unique: 246, opcode: RELEASE (18), nodeid: 2, insize: 64, pid: 0
release[140680646551824] flags: 0x8001
   unique: 246, success, outsize: 16
unique: 248, opcode: LOOKUP (1), nodeid: 1, insize: 54, pid: 28882
LOOKUP /test-bash.txt
getattr /test-bash.txt
   NODEID: 2
   unique: 248, success, outsize: 144
unique: 250, opcode: OPEN (14), nodeid: 2, insize: 48, pid: 28882
open flags: 0x8000 /test-bash.txt
   open[140680646551920] flags: 0x8000 /test-bash.txt
   unique: 250, success, outsize: 32
unique: 252, opcode: GETATTR (3), nodeid: 2, insize: 56, pid: 28882
getattr /test-bash.txt
   unique: 252, success, outsize: 120
unique: 254, opcode: GETATTR (3), nodeid: 2, insize: 56, pid: 28882
getattr /test-bash.txt
   unique: 254, success, outsize: 120
unique: 256, opcode: READ (15), nodeid: 2, insize: 80, pid: 28882   # reading the file
read[140680646551920] 4096 bytes from 0 flags: 0x8000
   read[140680646551920] 7 bytes from 0
   unique: 256, success, outsize: 23
unique: 258, opcode: GETATTR (3), nodeid: 2, insize: 56, pid: 28882
getattr /test-bash.txt
   unique: 258, success, outsize: 120
unique: 260, opcode: FLUSH (25), nodeid: 2, insize: 64, pid: 28882
flush[140680646551920]                                     # it prints 'onetwo' to the console and after that, it sleeps for 5 seconds
   unique: 260, success, outsize: 16
unique: 262, opcode: RELEASE (18), nodeid: 2, insize: 64, pid: 0
release[140680646551920] flags: 0x8000
   unique: 262, success, outsize: 16

Log for fish test

unique: 336, opcode: LOOKUP (1), nodeid: 1, insize: 54, pid: 5904
LOOKUP /test-fish.txt
getattr /test-fish.txt
   unique: 336, error: -2 (No such file or directory), outsize: 16
unique: 338, opcode: CREATE (35), nodeid: 1, insize: 70, pid: 5904
create flags: 0x8241 /test-fish.txt 0100644 umask=0022
   create[139889311254944] flags: 0x8241 /test-fish.txt
getattr /test-fish.txt
   NODEID: 2
   unique: 338, success, outsize: 160
unique: 340, opcode: GETXATTR (22), nodeid: 2, insize: 68, pid: 22766
   unique: 340, error: -38 (Function not implemented), outsize: 16
unique: 342, opcode: LOOKUP (1), nodeid: 1, insize: 54, pid: 8140
LOOKUP /test-fish.txt
getattr /test-fish.txt
unique: 344, opcode: WRITE (16), nodeid: 2, insize: 87, pid: 22766
write[139889311254944] 7 bytes to 0 flags: 0x8001
   NODEID: 2
   unique: 342, success, outsize: 144
unique: 346, opcode: GETATTR (3), nodeid: 2, insize: 56, pid: 8140
getattr /test-fish.txt
   write[139889311254944] 7 bytes to 0
   unique: 344, success, outsize: 24
   unique: 346, success, outsize: 120
unique: 348, opcode: FLUSH (25), nodeid: 2, insize: 64, pid: 22766
flush[139889311254944]
unique: 350, opcode: LOOKUP (1), nodeid: 1, insize: 54, pid: 8140
LOOKUP /test-fish.txt
getattr /test-fish.txt
   NODEID: 2
   unique: 350, success, outsize: 144
unique: 352, opcode: GETATTR (3), nodeid: 2, insize: 56, pid: 8140
getattr /test-fish.txt
   unique: 352, success, outsize: 120
unique: 354, opcode: LOOKUP (1), nodeid: 1, insize: 54, pid: 22769
LOOKUP /test-fish.txt
getattr /test-fish.txt
   NODEID: 2
   unique: 354, success, outsize: 144
unique: 356, opcode: OPEN (14), nodeid: 2, insize: 48, pid: 22769
open flags: 0x8000 /test-fish.txt
   open[139889311254800] flags: 0x8000 /test-fish.txt
   unique: 356, success, outsize: 32
unique: 358, opcode: GETATTR (3), nodeid: 2, insize: 56, pid: 22769
getattr /test-fish.txt
   unique: 358, success, outsize: 120
unique: 360, opcode: GETATTR (3), nodeid: 2, insize: 56, pid: 22769
getattr /test-fish.txt
   unique: 360, success, outsize: 120
unique: 362, opcode: READ (15), nodeid: 2, insize: 80, pid: 22769
read[139889311254800] 4096 bytes from 0 flags: 0x8000
   read[139889311254800] 0 bytes from 0
   unique: 362, success, outsize: 16
unique: 364, opcode: FLUSH (25), nodeid: 2, insize: 64, pid: 22769
flush[139889311254800]                                     # after printing this line it sleeps for 5 seconds
   unique: 348, success, outsize: 16
unique: 366, opcode: RELEASE (18), nodeid: 2, insize: 64, pid: 0
release[139889311254944] flags: 0x8001
   unique: 366, success, outsize: 16
   unique: 364, success, outsize: 16
unique: 368, opcode: RELEASE (18), nodeid: 2, insize: 64, pid: 0
release[139889311254800] flags: 0x8000
   unique: 368, success, outsize: 16

Notice that FLUSH (triggered by echo):

unique: 348, opcode: FLUSH (25), nodeid: 2, insize: 64, pid: 22766

returns after READ (triggered by cat) returns:

unique: 362, opcode: READ (15), nodeid: 2, insize: 80, pid: 22769

which is too late because test-fish.txt was not yet saved/flushed when we started reading it! Also notice late RELEASE for both echo and cat(in opposite to bash log).

Caveat: if you run add sleep 5 command between echo and cat, it works even with fish terminal:

$ echo 'onetwo' > ./mnt/test-fish.txt && sleep 5 && cat ./mnt/test-fish.txt
onetwo
$ cat ./mnt/test-fish.txt
onetwo

Do you have any idea where may be the bug?

Update

  • If I add -s flag to mounting options (which disables multi-threaded operation), it works just fine for both fish and bash.
  • I tried out to implement an equivalent code in C (based on fusexmp.c), using original libfuse (ver. 2.9.9) and it works just fine as opposed to the above Python version.
  • If I replace:
    def read(self, length, offset):
        with self.iolock:
            self.file.seek(offset)
            return self.file.read(length)
    
    def write(self, buf, offset):
        with self.iolock:
            self.file.seek(offset)
            self.file.write(buf)
            return len(buf)
    with:
    def read(self, length, offset):
        return os.pread(self.fd, length, offset)
    
    def write(self, buf, offset):
        return os.pwrite(self.fd, buf, offset)
    it works for both fish and bash (maybe it has something to do with linux coreutils checksums do not work with the example xmp.py filesystem when multithreaded #18 @drougge?).
  • I'm trying any luck with:
    PYTHONMALLOC=malloc valgrind --leak-check=full --show-leak-kinds=all --log-file="log.txt" python3 ./xmp.py -d -o root=./local/ ./mnt
    it reports one issue which seems to be a not false positive :
    ==51467== 196 bytes in 3 blocks are definitely lost in loss record 3,146 of 3,994
    ==51467==    at 0x483877F: malloc (vg_replace_malloc.c:307)
    ==51467==    by 0x577828: ??? (in /usr/bin/python3.9)
    ==51467==    by 0x62E51D8: MyString_AsEncodedPath (_fusemodule.c:99)
    ==51467==    by 0x62E59D4: Fuse_main (_fusemodule.c:1411)
    ==51467==    by 0x53F34F: ??? (in /usr/bin/python3.9)
    ==51467==    by 0x53C3C8: PyObject_Call (in /usr/bin/python3.9)
    ==51467==    by 0x517B9A: _PyEval_EvalFrameDefault (in /usr/bin/python3.9)
    ==51467==    by 0x5106EC: ??? (in /usr/bin/python3.9)
    ==51467==    by 0x528D20: _PyFunction_Vectorcall (in /usr/bin/python3.9)
    ==51467==    by 0x513E8A: _PyEval_EvalFrameDefault (in /usr/bin/python3.9)
    ==51467==    by 0x5106EC: ??? (in /usr/bin/python3.9)
    ==51467==    by 0x528D20: _PyFunction_Vectorcall (in /usr/bin/python3.9)
    
@pbeza
Copy link
Author

pbeza commented May 2, 2021

I've compared:

python3 ./xmp.py -d -o root=./local/ ./mnt
strace -o log-sh-xmp.log -f -t sh -c "echo 'onetwo' > ./mnt/test-sh.txt && cat ./mnt/test-sh.txt"

with:

python3 ./xmp.py -d -o root=./local/ ./mnt
strace -o log-fish-xmp.log -f -t fish -c "echo 'onetwo' > ./mnt/test-fish.txt && cat ./mnt/test-fish.txt"

It seems to me that sh does writing and reading sequentially as opposed to fish which doesn't wait 5 seconds for close() completion before starting read() in parallel (corresponding to cat command). Is it a bug in fish...? Or am I misinterpreting it?

Notice that in:

  • log-sh-xmp.log you can find 2 processes running – the main one, used for echo and the other one created with vfork() + execve("/bin/cat", ...) for cat. We have two processes in here but effectively they run their logic sequentially, one after the other:
    • 435708 does write() corresponding to echo command (it uses dup2() 'trick' as described in here):
      write(1, "onetwo\n", 7) = 7
      
    • 435735 does read() corresponding to cat command:
      read(3, "onetwo\n", 131072) = 7
      
  • log-fish-xmp.log you can find 3 processes running – the main one and two others created with clone() syscall):
    • 434564 is the main process – it communicates with the other two with pipe(); it opens ./mnt/test-fish.txt for writing but it actually doesn't write anything (434565 does):
      openat(AT_FDCWD, "/home/patryk/wildland/xmp/mnt/test-fish.txt", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 6
      
    • 434565 calls write():
      write(6, "onetwo\n", 7) = 7
      
      It also calls close() which finishes just after 434566 completes which is wrong!!!:
      close(6 <unfinished ...>
      
    • 434566 opens ./mnt/test-fish.txt:
      openat(AT_FDCWD, "./mnt/test-fish.txt", O_RDONLY) = 3
      
      and calls read() corresponding to cat command (read() returns nothing probably because 434565 has not completed closing file descriptor no. 6):
      read(3, "", 131072)     = 0
      

Respective FUSE log for sh scenario:

unique: 226, opcode: LOOKUP (1), nodeid: 1, insize: 52
LOOKUP /test-sh.txt
getattr /test-sh.txt
   unique: 226, error: -2 (No such file or directory), outsize: 16
unique: 228, opcode: CREATE (35), nodeid: 1, insize: 68
create flags: 0x8241 /test-sh.txt 0100644 umask=0022
   create[140297570492080] flags: 0x8241 /test-sh.txt
getattr /test-sh.txt
   NODEID: 2
   unique: 228, success, outsize: 160
unique: 230, opcode: LOOKUP (1), nodeid: 1, insize: 52
LOOKUP /test-sh.txt
getattr /test-sh.txt
   NODEID: 2
   unique: 230, success, outsize: 144
unique: 232, opcode: GETATTR (3), nodeid: 2, insize: 56
getattr /test-sh.txt
   unique: 232, success, outsize: 120
unique: 234, opcode: LOOKUP (1), nodeid: 1, insize: 52
LOOKUP /test-sh.txt
getattr /test-sh.txt
   NODEID: 2
   unique: 234, success, outsize: 144
unique: 236, opcode: GETATTR (3), nodeid: 2, insize: 56
getattr /test-sh.txt
   unique: 236, success, outsize: 120
unique: 238, opcode: FLUSH (25), nodeid: 2, insize: 64
flush[140297570492080]
   unique: 238, success, outsize: 16
unique: 240, opcode: GETXATTR (22), nodeid: 2, insize: 68
   unique: 240, error: -38 (Function not implemented), outsize: 16
unique: 242, opcode: WRITE (16), nodeid: 2, insize: 87
write[140297570492080] 7 bytes to 0 flags: 0x8001
   write[140297570492080] 7 bytes to 0
   unique: 242, success, outsize: 24
unique: 244, opcode: FLUSH (25), nodeid: 2, insize: 64
flush[140297570492080]
   unique: 244, success, outsize: 16
unique: 246, opcode: RELEASE (18), nodeid: 2, insize: 64
release[140297570492080] flags: 0x8001
   unique: 246, success, outsize: 16
unique: 248, opcode: LOOKUP (1), nodeid: 1, insize: 52
LOOKUP /test-sh.txt
getattr /test-sh.txt
   NODEID: 2
   unique: 248, success, outsize: 144
unique: 250, opcode: OPEN (14), nodeid: 2, insize: 48
open flags: 0x8000 /test-sh.txt
   open[140297570492176] flags: 0x8000 /test-sh.txt
   unique: 250, success, outsize: 32
unique: 252, opcode: GETATTR (3), nodeid: 2, insize: 56
getattr /test-sh.txt
   unique: 252, success, outsize: 120
unique: 254, opcode: GETATTR (3), nodeid: 2, insize: 56
getattr /test-sh.txt
   unique: 254, success, outsize: 120
unique: 256, opcode: READ (15), nodeid: 2, insize: 80
read[140297570492176] 4096 bytes from 0 flags: 0x8000
   read[140297570492176] 7 bytes from 0
   unique: 256, success, outsize: 23
unique: 258, opcode: GETATTR (3), nodeid: 2, insize: 56
getattr /test-sh.txt
   unique: 258, success, outsize: 120
unique: 260, opcode: FLUSH (25), nodeid: 2, insize: 64
flush[140297570492176]
   unique: 260, success, outsize: 16
unique: 262, opcode: RELEASE (18), nodeid: 2, insize: 64
release[140297570492176] flags: 0x8000
   unique: 262, success, outsize: 16

and for fish:

unique: 226, opcode: LOOKUP (1), nodeid: 1, insize: 54
LOOKUP /test-fish.txt
getattr /test-fish.txt
   unique: 226, error: -2 (No such file or directory), outsize: 16
unique: 228, opcode: CREATE (35), nodeid: 1, insize: 70
create flags: 0x8241 /test-fish.txt 0100644 umask=0022
   create[139940152819328] flags: 0x8241 /test-fish.txt
getattr /test-fish.txt
   NODEID: 2
   unique: 228, success, outsize: 160
unique: 230, opcode: LOOKUP (1), nodeid: 1, insize: 54
LOOKUP /test-fish.txt
getattr /test-fish.txt
   NODEID: 2
   unique: 230, success, outsize: 144
unique: 232, opcode: GETATTR (3), nodeid: 2, insize: 56
getattr /test-fish.txt
   unique: 232, success, outsize: 120
unique: 234, opcode: LOOKUP (1), nodeid: 1, insize: 54
LOOKUP /test-fish.txt
getattr /test-fish.txt
   NODEID: 2
   unique: 234, success, outsize: 144
unique: 236, opcode: GETXATTR (22), nodeid: 2, insize: 68
   unique: 236, error: -38 (Function not implemented), outsize: 16
unique: 238, opcode: GETATTR (3), nodeid: 2, insize: 56
getattr /test-fish.txt
unique: 240, opcode: WRITE (16), nodeid: 2, insize: 87
write[139940152819328] 7 bytes to 0 flags: 0x8001
   unique: 238, success, outsize: 120
   write[139940152819328] 7 bytes to 0
   unique: 240, success, outsize: 24
unique: 242, opcode: FLUSH (25), nodeid: 2, insize: 64
flush[139940152819328]
unique: 244, opcode: LOOKUP (1), nodeid: 1, insize: 54
LOOKUP /test-fish.txt
getattr /test-fish.txt
   NODEID: 2
   unique: 244, success, outsize: 144
unique: 246, opcode: OPEN (14), nodeid: 2, insize: 48
open flags: 0x8000 /test-fish.txt
   open[139940152819472] flags: 0x8000 /test-fish.txt
   unique: 246, success, outsize: 32
unique: 248, opcode: GETATTR (3), nodeid: 2, insize: 56
getattr /test-fish.txt
   unique: 248, success, outsize: 120
unique: 250, opcode: GETATTR (3), nodeid: 2, insize: 56
getattr /test-fish.txt
   unique: 250, success, outsize: 120
unique: 252, opcode: READ (15), nodeid: 2, insize: 80
read[139940152819472] 4096 bytes from 0 flags: 0x8000
   read[139940152819472] 0 bytes from 0
   unique: 252, success, outsize: 16
unique: 254, opcode: FLUSH (25), nodeid: 2, insize: 64
flush[139940152819472]
   unique: 242, success, outsize: 16
unique: 256, opcode: RELEASE (18), nodeid: 2, insize: 64
release[139940152819328] flags: 0x8001
   unique: 256, success, outsize: 16
   unique: 254, success, outsize: 16
unique: 258, opcode: RELEASE (18), nodeid: 2, insize: 64
release[139940152819472] flags: 0x8000
   unique: 258, success, outsize: 16

@omeg
Copy link
Contributor

omeg commented May 5, 2021

Quick update: this seems to work fine with Python 3.7, but not with 3.9... Investigating.
Edit: After more tests it seems that the Python version is irrelevant, but fish version matters: fish 3.1.2 is broken, while 3.2.2 seems to work fine.

@omeg
Copy link
Contributor

omeg commented May 8, 2021

Alright, this behavior is the consequence of multithreaded handling of commands. The PYLOCK() macro is NOT preventing multiple threads to execute fuse commands in parallel. In this case: two flush commands are being executed on the same file due to fish design. So far I've tested that adding for example pthread_mutex_lock() in PYLOCK and corresponding pthread_mutex_unlock() in PYUNLOCK produces correct behavior on the problematic fish 3.1.2. But then, what's the point of multithreading if we basically enforce executing one command at a time? :)

Edit: of course all this is mostly relevant for operations on the same file object. But we can have multiple handles to the same file, or different paths etc, so it's not trivial to only lock in such case. Anyway, this can be solved in the user handler class so IMO it's not a python-fuse issue. The user fuse class needs to ensure that parallel operations on the same file are serialized.

@pbeza
Copy link
Author

pbeza commented May 9, 2021

Anyway, this can be solved in the user handler class so IMO it's not a python-fuse issue. The user fuse class needs to ensure that parallel operations on the same file are serialized.

Is it possible to serialize parallel operations on the same file using FUSE's high level API (which python-fuse is based on)? High level API operates exclusively on paths and hides away inodes from developers.

Any ideas on how to fix a given code sample without disabling multithreading completely?

@marmarek
Copy link

marmarek commented May 9, 2021

Inodes vs paths should not be an issue here. Each open file gets unique file handle (object) in both APIs. IMO the main question is whether it needs to be serialized in context of a single file handle (the same open FD from the user PoV), or across different handles for the same file (path/inode). The former is easier and will have very minimal performance impact, the latter is harder and might have bigger performance impact. Based on the above snipped doesn't have fgetattr implemented at all, I'm afraid it's the latter case.

Here I updated xmp.py with broader locks (the latter case): https://gist.github.com/marmarek/a56ae864b61cba4df32792c99644dda5 - I've added initial version first, so you can easily see the diff. This is obviously PoC only, it leaks locks instances left and right. I have not tested it.

@omeg
Copy link
Contributor

omeg commented May 10, 2021

@marmarek 's approach works (the gist above fails because of a typo in line 71). Fuse also doesn't seem to like the partial constructor (write: Function not implemented). Quick test when I moved iolocks to global scope shows that works.

@pbeza
Copy link
Author

pbeza commented May 10, 2021

Just to make sure you haven't missed it – in this particular example it is sufficient to replace:

def read(self, length, offset):
    with self.iolock:
        self.file.seek(offset)
        return self.file.read(length)

def write(self, buf, offset):
    with self.iolock:
        self.file.seek(offset)
        self.file.write(buf)
        return len(buf)

with:

def read(self, length, offset):
    return os.pread(self.fd, length, offset)

def write(self, buf, offset):
    return os.pwrite(self.fd, buf, offset)

to make it work (as mentioned in my original post). No additional locking was needed besides the above code modification. But still the question persists: what is the minimal locking needed to make it work in every possible parallel/multithreaded test scenario (i.e. not just for fish use case)?

@omeg
Copy link
Contributor

omeg commented May 11, 2021

For the record, I bisected fish and this is the commit that fixes the issue: fish-shell/fish-shell@4b9a096

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants