Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Overlapped Support in ReadFile() #141

Open
wants to merge 21 commits into
base: master
Choose a base branch
from

Conversation

opalmer
Copy link
Owner

@opalmer opalmer commented Nov 23, 2017

Resolves #116.

@opalmer opalmer added this to the 0.6.0 milestone Nov 23, 2017
@opalmer opalmer self-assigned this Nov 23, 2017
@codecov-io
Copy link

codecov-io commented Nov 23, 2017

Codecov Report

Merging #141 into master will increase coverage by 0.02%.
The diff coverage is 100%.

@opalmer
Copy link
Owner Author

opalmer commented Nov 23, 2017

At @exvito, PTAL. I think this will resolve #116 but would like your input before we close this.

@exvito
Copy link
Contributor

exvito commented Nov 24, 2017

@opalmer, thanks.
Hopefully I'll be able to dedicate some time to review and share my input later today.

@exvito
Copy link
Contributor

exvito commented Nov 25, 2017

@opalmer, I didn't manage to get time to review yesterday like i expected; here it is now:

  • Ideally, overlapped support would not change exiting ReadFile usage.
  • Current ReadFile (before this PR) usage returns a byte string.
  • This PR changes ReadFile to return a buffer object that needs to be cffi.unapcked: I feel it is unneeded.
  • The docstring states that a bytearray is returned, though...
  • A simple change in this PR's implementation could improve that:
    • Create the bytearray(nNumberOfBytesToRead) before creating lpBuffer.
    • Then create lpBuffer from that.
    • Return the bytearray object and not the lpBuffer, keeping lpBuffer an internal thing.
    • That would avoid the need for the exposed unpack function and all the little, somehow noisy, explicit unpacks you had to put in place such that tests depending on ReadFile still pass.
  • The current PR seems to be mishandling bytes_read.
    • In the non-overlapped case, that may imply truncating the bytearray to be returned, after the call to the underlying ReadFile (which seems impossible to do, unless the change I suggested I above is in place).
    • In the overlapped case, it should set to NULL before calling the system level ReadFile and discarded after that.

Missing:

  • I did not find tests for ReadFile in overlapped mode.

More radical possibility:

  • An alternative approach, forcefully backwads incompatible, but failing hard and fast when called with existing code (so, safe, from a data perspective), would be to change ReadFile's signature to be more like a 1-to-1 match to the system API:
    • Instead of receiving a handle, nBytesToRead and lpOverlapped, returning a byte string...
    • ...it could receive a handle, a caller created bytesarray (large enough for the operation), nBytesToRead and lpOverlapped, returning the number of bytes read.
  • This would put all the buffer management responsibility on the caller's side.

Let me know of your thoughts on these ideas and please point out anything I may be misunderstanding.

Thanks.

@opalmer
Copy link
Owner Author

opalmer commented Nov 26, 2017

The more I read over your comment and think about the various use cases the more I think it would be appropriate to go with the fully backwards incompatible approach. There's a few reasons I think this route would be better:

  • ReadFile() in its current form is does not mimic the underlying win32 api very well. While this is true for many parts of pywincffi this function, in my opinion, has more than a single well understood use case.
  • Looking at Twisted's use case alone the more control over ReadFile() we can provide the more useful this function will be. It does sacrifice a bit of usability but I'd rather it be correct and flexible enough for a wider variety of use cases. Although the original goal of pywincffi was to make it more usable than pywin32 I don't want to make it so abstracted that it's less useful it could be.
  • The less abstraction that ReadFile() does the easier it should be to support. Overlapped I/O I think has already made it fairly clear that the current implementation while usable in many cases may not be correct to support everything that the underlying windows API could.
  • When handles and non-synchronous calls are involved pywincffi probably shouldn't be making assumptions about how those objects are being used because they are created/managed outside the scope of the function call we're making.

I think making the backwards incompatible change would address nearly add the comments above (except for the missing tests of course....).

@exvito
Copy link
Contributor

exvito commented Nov 30, 2017

Gave it some more thought and believe that moving to an incompatible signature/behavior is the best way forward.

I believed that the current implementation could be enough as long as bytes_read would be handled in some way, but it would require truncating the returned bytearray and, above all, could require further data copying on the caller's side, given that the bytearray is created within ReadFile - not the best design.

The most flexible, kind of Pythoninc, approach will be changing the signature and return value to something closer to the underlying system call:

  • Arguments: hFile: HANDLE, lpBuffer: bytearray [1], nNumberOfBytesToRead: int, lpOverlapped: OVERLAPPED
  • Return: lpNumberOfBytesRead: int

Error conditions would raise exceptions, as currently.
[1] Or some object implementing the buffer protocol, why not?

Returning the number of bytes read is fundamental such that the caller knows how much of the passed in buffer was filled up in the non-overlapped cases; in the overlapping cases I seem to remember that that information is somewhere else (maybe on the OVERLAPPED object?).

In some way, such an API somehow mimics the standard library's io.FileIO.readinto orsocket.recv_into where a writable buffer is passed in and the number of bytes read is returned.

Key benefit: buffer management 100% on the caller's side.
One possible "safety net" would be to add a sanity check verifying that the passed in buffer is at least as big as nNumberOfBytesToRead.

I say, let's break compatibility now (better than later)!

This commit updates the signature of ReadFile() to
better match the underlying Windows API. By doing so
we should be able to better support overlapped I/O and
give the user of this function greater control over the
input and output.

For a complete discussion around why this change was made
see #141.
@opalmer
Copy link
Owner Author

opalmer commented Dec 10, 2017

@exvito please take a look at the latest commit (5434743), I think will address the discussion above.

Updating the code was easy and I especially like that it made the tests more explicit:

    def test_read(self):
        path, written = self._create_file(b"hello world")
        self.assertEqual(written, 11)
        hFile = self._handle_to_read_file(path)
        lpBuffer = bytearray(written)
        read = ReadFile(hFile, lpBuffer, 5)
        self.assertEqual(read, 5)
        self.assertEqual(lpBuffer[:read], bytearray(b"hello"))

        # The rest of the buffer is essentially unmanaged but let's be sure
        # the remainder of the buffer has not been modified by ReadFile().
        self.assertEqual(
            lpBuffer[read:], bytearray(b"\x00\x00\x00\x00\x00\x00"))

I still need to write tests for overlapped I/O but I'd like you to review the latest changes first to be sure I'm not missing anything.

@exvito
Copy link
Contributor

exvito commented Dec 12, 2017

It's looking preety good @opalmer.
I mostly pointed out:

  • The error checking after the ReadFile call, in the wrapper (which, by the way, made me think that maybe the WriteFile wrapper may need improvements here, too).
  • Several test assertions that assume the number of bytes read is exactly the number of bytes requested. Often true, but not necessarily, per the API docs. These may need to spurious test failures.

@exvito
Copy link
Contributor

exvito commented Dec 12, 2017

One last, probably important note, regarding error checking and sync/async operation:

  • It seems than even when overlapped I/O is requested, ReadFile (and WriteFile and others?) may actually complete synchonously.
  • This probably means we need to further refine the error checking / return values in, at least, ReadFile and WriteFile.

See: https://support.microsoft.com/en-us/help/156932/asynchronous-disk-i-o-appears-as-synchronous-on-windows

@opalmer
Copy link
Owner Author

opalmer commented Jan 1, 2018

Fixed a couple of issues while I had some down time. Still need to work through the others (especially the tests). You're probably right that there may be some additional checks we need to implement...I need to take another pass at reading ReadFile again myself before taking another pass at this.

Side note, in the future could you post comments under Files changed? It's a bit easier to follow vs. commenting on individual commits. Regardless, thanks for the review!

@exvito
Copy link
Contributor

exvito commented Jan 1, 2018

This is looking better and better.

Like I commented before, I believe there are two things that need improvement:

  • The tests and their assertions: given that ReadFile() may read less than the requested number of bytes, low-level reading should be loop driven. This will be more complex to implement for the overlapped I/O case given that nNumberOfBytesToRead is not at play and is recommended to be passed in as NULL.
  • The checks after the ReadFile() (and WriteFile(), by the way) need to be more robust. From memory:
    • When doing non-overlapped I/O, the returned value must be non-zero (already checked).
    • When doing overlapped I/O (needs improvement):

PS: Sorry for the comments on the commits, I didn't notice that. I will try to pay more attention. :)
PPS: Happy New Year.

@opalmer
Copy link
Owner Author

opalmer commented Jan 13, 2018

@exvito thanks and happy new year to you too :)

Still not done here, but I added a function in 4f2b0ba which I think will cover basic error checks for both overlapped and non-overlapped I/O. When performing overlapped I/O, additional checks will be needed outside the calls to ReadFile and WriteFile. Definitely something I'm going to dive deeper into when I update the tests.


# Overlapped IO case.
_, library = dist.load()
if GetLastError() == library.ERROR_IO_PENDING and code != 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the other way around. That is, we're facing an error if both:

  • code == 0
  • GetLastError() != ERROR_IO_PENDING

@exvito
Copy link
Contributor

exvito commented Jan 14, 2018

@opalmer,

Looking better and better. I just made a comment on one error checking condition which seems to be the other way around: as far as I can tell, I made it under "Files Changed" and not in the commit. Hopefully I got that right, like you asked the other day. Let me know about that.

Cheers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants