Improve Overlapped Support in ReadFile() #141

opalmer · 2017-11-23T23:32:30Z

Resolves #116.

… overlapped io

codecov-io · 2017-11-23T23:32:39Z

Codecov Report

Merging #141 into master will increase coverage by 0.02%.
The diff coverage is 100%.

opalmer · 2017-11-23T23:38:13Z

At @exvito, PTAL. I think this will resolve #116 but would like your input before we close this.

exvito · 2017-11-24T08:22:14Z

@opalmer, thanks.
Hopefully I'll be able to dedicate some time to review and share my input later today.

exvito · 2017-11-25T12:19:33Z

@opalmer, I didn't manage to get time to review yesterday like i expected; here it is now:

Ideally, overlapped support would not change exiting ReadFile usage.
Current ReadFile (before this PR) usage returns a byte string.
This PR changes ReadFile to return a buffer object that needs to be cffi.unapcked: I feel it is unneeded.
The docstring states that a bytearray is returned, though...
A simple change in this PR's implementation could improve that:
- Create the bytearray(nNumberOfBytesToRead) before creating lpBuffer.
- Then create lpBuffer from that.
- Return the bytearray object and not the lpBuffer, keeping lpBuffer an internal thing.
- That would avoid the need for the exposed unpack function and all the little, somehow noisy, explicit unpacks you had to put in place such that tests depending on ReadFile still pass.
The current PR seems to be mishandling bytes_read.
- In the non-overlapped case, that may imply truncating the bytearray to be returned, after the call to the underlying ReadFile (which seems impossible to do, unless the change I suggested I above is in place).
- In the overlapped case, it should set to NULL before calling the system level ReadFile and discarded after that.

Missing:

I did not find tests for ReadFile in overlapped mode.

More radical possibility:

An alternative approach, forcefully backwads incompatible, but failing hard and fast when called with existing code (so, safe, from a data perspective), would be to change ReadFile's signature to be more like a 1-to-1 match to the system API:
- Instead of receiving a handle, nBytesToRead and lpOverlapped, returning a byte string...
- ...it could receive a handle, a caller created bytesarray (large enough for the operation), nBytesToRead and lpOverlapped, returning the number of bytes read.
This would put all the buffer management responsibility on the caller's side.

Let me know of your thoughts on these ideas and please point out anything I may be misunderstanding.

Thanks.

opalmer · 2017-11-26T16:09:26Z

The more I read over your comment and think about the various use cases the more I think it would be appropriate to go with the fully backwards incompatible approach. There's a few reasons I think this route would be better:

ReadFile() in its current form is does not mimic the underlying win32 api very well. While this is true for many parts of pywincffi this function, in my opinion, has more than a single well understood use case.
Looking at Twisted's use case alone the more control over ReadFile() we can provide the more useful this function will be. It does sacrifice a bit of usability but I'd rather it be correct and flexible enough for a wider variety of use cases. Although the original goal of pywincffi was to make it more usable than pywin32 I don't want to make it so abstracted that it's less useful it could be.
The less abstraction that ReadFile() does the easier it should be to support. Overlapped I/O I think has already made it fairly clear that the current implementation while usable in many cases may not be correct to support everything that the underlying windows API could.
When handles and non-synchronous calls are involved pywincffi probably shouldn't be making assumptions about how those objects are being used because they are created/managed outside the scope of the function call we're making.

I think making the backwards incompatible change would address nearly add the comments above (except for the missing tests of course....).

exvito · 2017-11-30T13:23:37Z

Gave it some more thought and believe that moving to an incompatible signature/behavior is the best way forward.

I believed that the current implementation could be enough as long as bytes_read would be handled in some way, but it would require truncating the returned bytearray and, above all, could require further data copying on the caller's side, given that the bytearray is created within ReadFile - not the best design.

The most flexible, kind of Pythoninc, approach will be changing the signature and return value to something closer to the underlying system call:

Arguments: hFile: HANDLE, lpBuffer: bytearray [1], nNumberOfBytesToRead: int, lpOverlapped: OVERLAPPED
Return: lpNumberOfBytesRead: int

Error conditions would raise exceptions, as currently.
[1] Or some object implementing the buffer protocol, why not?

Returning the number of bytes read is fundamental such that the caller knows how much of the passed in buffer was filled up in the non-overlapped cases; in the overlapping cases I seem to remember that that information is somewhere else (maybe on the OVERLAPPED object?).

In some way, such an API somehow mimics the standard library's io.FileIO.readinto orsocket.recv_into where a writable buffer is passed in and the number of bytes read is returned.

Key benefit: buffer management 100% on the caller's side.
One possible "safety net" would be to add a sanity check verifying that the passed in buffer is at least as big as nNumberOfBytesToRead.

I say, let's break compatibility now (better than later)!

This commit updates the signature of ReadFile() to better match the underlying Windows API. By doing so we should be able to better support overlapped I/O and give the user of this function greater control over the input and output. For a complete discussion around why this change was made see #141.

opalmer · 2017-12-10T19:08:40Z

@exvito please take a look at the latest commit (5434743), I think will address the discussion above.

Updating the code was easy and I especially like that it made the tests more explicit:

    def test_read(self):
        path, written = self._create_file(b"hello world")
        self.assertEqual(written, 11)
        hFile = self._handle_to_read_file(path)
        lpBuffer = bytearray(written)
        read = ReadFile(hFile, lpBuffer, 5)
        self.assertEqual(read, 5)
        self.assertEqual(lpBuffer[:read], bytearray(b"hello"))

        # The rest of the buffer is essentially unmanaged but let's be sure
        # the remainder of the buffer has not been modified by ReadFile().
        self.assertEqual(
            lpBuffer[read:], bytearray(b"\x00\x00\x00\x00\x00\x00"))

I still need to write tests for overlapped I/O but I'd like you to review the latest changes first to be sure I'm not missing anything.

…-readfile

exvito · 2017-12-12T09:44:57Z

It's looking preety good @opalmer.
I mostly pointed out:

The error checking after the ReadFile call, in the wrapper (which, by the way, made me think that maybe the WriteFile wrapper may need improvements here, too).
Several test assertions that assume the number of bytes read is exactly the number of bytes requested. Often true, but not necessarily, per the API docs. These may need to spurious test failures.

exvito · 2017-12-12T09:56:21Z

One last, probably important note, regarding error checking and sync/async operation:

It seems than even when overlapped I/O is requested, ReadFile (and WriteFile and others?) may actually complete synchonously.
This probably means we need to further refine the error checking / return values in, at least, ReadFile and WriteFile.

See: https://support.microsoft.com/en-us/help/156932/asynchronous-disk-i-o-appears-as-synchronous-on-windows

…to overlapped-readfile

opalmer · 2018-01-01T16:48:44Z

Fixed a couple of issues while I had some down time. Still need to work through the others (especially the tests). You're probably right that there may be some additional checks we need to implement...I need to take another pass at reading ReadFile again myself before taking another pass at this.

Side note, in the future could you post comments under Files changed? It's a bit easier to follow vs. commenting on individual commits. Regardless, thanks for the review!

exvito · 2018-01-01T22:50:34Z

This is looking better and better.

Like I commented before, I believe there are two things that need improvement:

The tests and their assertions: given that ReadFile() may read less than the requested number of bytes, low-level reading should be loop driven. This will be more complex to implement for the overlapped I/O case given that nNumberOfBytesToRead is not at play and is recommended to be passed in as NULL.
The checks after the ReadFile() (and WriteFile(), by the way) need to be more robust. From memory:
- When doing non-overlapped I/O, the returned value must be non-zero (already checked).
- When doing overlapped I/O (needs improvement):
  - The returned value may be zero as long as GetLastError() returns ERROR_IO_PENDING.
  - I'm not sure what the returned value should be when asking for overlapped I/O but the kernel completes the I/O operation synchronously: I believe that in that case the returned value is non-zero just like in the non-overlapped I/O case (full details here: https://support.microsoft.com/en-us/help/156932/asynchronous-disk-i-o-appears-as-synchronous-on-windows)

PS: Sorry for the comments on the commits, I didn't notice that. I will try to pay more attention. :)
PPS: Happy New Year.

opalmer · 2018-01-13T17:41:12Z

@exvito thanks and happy new year to you too :)

Still not done here, but I added a function in 4f2b0ba which I think will cover basic error checks for both overlapped and non-overlapped I/O. When performing overlapped I/O, additional checks will be needed outside the calls to ReadFile and WriteFile. Definitely something I'm going to dive deeper into when I update the tests.

exvito · 2018-01-14T10:15:47Z

pywincffi/kernel32/error.py

+
+ # Overlapped IO case.
+ _, library = dist.load()
+ if GetLastError() == library.ERROR_IO_PENDING and code != 0:


I think this is the other way around. That is, we're facing an error if both:

code == 0

GetLastError() != ERROR_IO_PENDING

exvito · 2018-01-14T10:18:48Z

@opalmer,

Looking better and better. I just made a comment on one error checking condition which seems to be the other way around: as far as I can tell, I made it under "Files Changed" and not in the commit. Hopefully I got that right, like you asked the other day. Let me know about that.

Cheers.

opalmer added 7 commits November 23, 2017 13:16

initial commit of unpack()

3b641a8

fix test for Python 2.7

a6081a1

#116 - initial conversion of ReadFile() to provide better support for…

a043f44

… overlapped io

fixed tests in TestReadFile

78e808a

fix pipe tests using ReadFile

f263ad6

fixing test for python 2

4f9b1c2

remove unused import

85fc807

opalmer added lib: kernel32 refactor labels Nov 23, 2017

opalmer added this to the 0.6.0 milestone Nov 23, 2017

opalmer self-assigned this Nov 23, 2017

updating changelog

d02f023

return bytearray directly, don't require unpacking

86ed6d8

opalmer added 2 commits December 10, 2017 13:58

fix documentation reference for ReadFile()

5e607a2

opalmer added 2 commits December 10, 2017 14:15

buffer is a built-in in Python 2.7

b210399

Merge branch 'master' of github.com:opalmer/pywincffi into overlapped…

1af3f47

…-readfile

opalmer added 4 commits January 1, 2018 11:33

add ticks for lpBuffer to match other variable in error

985bcda

update doc string example to reflect new API

f90b4ed

fix expected value for error_check() in ReadFile()

1116004

Merge branch 'overlapped-readfile' of github.com:opalmer/pywincffi in…

45b6c3a

…to overlapped-readfile

fix test assertion

a97e8eb

opalmer added 3 commits January 6, 2018 12:19

added GetLastError, SetLastError and get_error_message()

f2fc542

implemented overlapped_error_check()

4f2b0ba

improving overlapped error checking in ReadFile() and WriteFile()

309cf81

exvito reviewed Jan 14, 2018

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve Overlapped Support in ReadFile() #141

Improve Overlapped Support in ReadFile() #141

opalmer commented Nov 23, 2017

codecov-io commented Nov 23, 2017 •

edited

Loading

opalmer commented Nov 23, 2017

exvito commented Nov 24, 2017

exvito commented Nov 25, 2017 •

edited

Loading

opalmer commented Nov 26, 2017 •

edited

Loading

exvito commented Nov 30, 2017 •

edited

Loading

opalmer commented Dec 10, 2017

exvito commented Dec 12, 2017 •

edited

Loading

exvito commented Dec 12, 2017

opalmer commented Jan 1, 2018 •

edited

Loading

exvito commented Jan 1, 2018

opalmer commented Jan 13, 2018

exvito Jan 14, 2018

exvito commented Jan 14, 2018

Improve Overlapped Support in ReadFile() #141

Are you sure you want to change the base?

Improve Overlapped Support in ReadFile() #141

Conversation

opalmer commented Nov 23, 2017

codecov-io commented Nov 23, 2017 • edited Loading

Codecov Report

opalmer commented Nov 23, 2017

exvito commented Nov 24, 2017

exvito commented Nov 25, 2017 • edited Loading

opalmer commented Nov 26, 2017 • edited Loading

exvito commented Nov 30, 2017 • edited Loading

opalmer commented Dec 10, 2017

exvito commented Dec 12, 2017 • edited Loading

exvito commented Dec 12, 2017

opalmer commented Jan 1, 2018 • edited Loading

exvito commented Jan 1, 2018

opalmer commented Jan 13, 2018

exvito Jan 14, 2018

Choose a reason for hiding this comment

exvito commented Jan 14, 2018

codecov-io commented Nov 23, 2017 •

edited

Loading

exvito commented Nov 25, 2017 •

edited

Loading

opalmer commented Nov 26, 2017 •

edited

Loading

exvito commented Nov 30, 2017 •

edited

Loading

exvito commented Dec 12, 2017 •

edited

Loading

opalmer commented Jan 1, 2018 •

edited

Loading