test: switch from curl to urllib for HTTP requests #29970
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage: For detailed information about the code coverage, see the test coverage report.

Reviews: See the guideline for information on the review process.

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.
This is a script used for the tests, so adding …
The title's tag also changed from …
Force-pushed from 38176a7 to ddd27cb
@laanwj I investigated and found that you modified …

edit: It seems bitcoin/ci/test/00_setup_env_native_previous_releases.sh is responsible for providing everything the environment needs. Shall I add …
Yes, that's usually how it'd be done. But gah, …
Force-pushed from ddd27cb to 29e9377
Force-pushed from 29e9377 to 7fe94f7
I replaced the …
    ['curl', '--remote-name', tarballUrl]
]
try:
    response = urlopen(Request(tarballUrl, method='HEAD'))
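The replacement hunk above can be sketched in isolation. This is only an illustration of the urllib call the PR introduces, with a hypothetical URL; the real script builds the tarball URL from the release version:

```python
from urllib.request import Request

def build_head_request(url):
    # Request(..., method='HEAD') mirrors `curl --head`: it asks the
    # server for headers only, without transferring the file body.
    return Request(url, method='HEAD')

# Hypothetical URL, for illustration only.
req = build_head_request('https://example.com/bitcoin-25.0.tar.gz')
print(req.get_method())  # prints HEAD
```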
Is it required to check for 404 in a separate, additional request?
That's how the current code works too: it first does a HEAD request to check that the file is there, then downloads it. I don't know the original reasoning behind this, though.
No, I could have written it more pythonically with one request, but I wrote it this way to stay close to the previous code. If you agree with one request, I can change it.
Should this change be in a new commit or should I force-push it onto the previous commit?
> Should this change be in a new commit or should I force it on the previous commit?

Squashing seems fine?
To rewrite the git history so that the latest change lands on the last commit and the old ones are discarded, I force-pushed. I used squashing only for merging before that.
Was this addressed?
I tested this myself, and the reason seems to be that absent a HEAD request, a 404.html would be downloaded as the tar.gz, which then fails the shasum check.
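That failure mode can be illustrated offline. Everything below is made up on the spot (the "expected" digest is computed from placeholder bytes, not taken from any real release); it only shows why substituted content trips the digest comparison:

```python
import hashlib

def shasum_matches(data, expected_hex):
    # The release scripts compare a SHA-256 digest of the downloaded
    # file against a pinned value; any substituted content fails.
    return hashlib.sha256(data).hexdigest() == expected_hex

# Placeholder stand-in for the real tarball's pinned digest.
expected = hashlib.sha256(b'pretend tarball bytes').hexdigest()

# Without a prior HEAD check, `curl -O` would happily save the
# server's error page under the tarball's name:
error_page = b'<html><body>404 Not Found</body></html>'

print(shasum_matches(error_page, expected))  # prints False
```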
With curl that makes sense, though here the urlopen will raise an HTTPError on a 404 for both a GET and a HEAD request, so the earlier HEAD does seem superfluous?
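A single-request version along those lines might look like this. It is a sketch, not the PR's actual code; the function name is made up, and urlopen also accepts file:// URLs, which makes it easy to try locally:

```python
from urllib.request import urlopen
from urllib.error import HTTPError

def fetch(url, dest):
    # One GET suffices: a 404 raises HTTPError before any body is
    # written, so no error page can ever be saved as the tarball.
    try:
        response = urlopen(url)
    except HTTPError as e:
        if e.code == 404:
            return False  # artifact not published for this version
        raise
    with response, open(dest, 'wb') as f:
        f.write(response.read())
    return True
```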
crACK 7fe94f7
Force-pushed from 7fe94f7 to 588cad3
ACK 588cad3
One drawback of this change is that a user no longer sees download progress per file (relevant if e.g. servers are slow or the connection is bad for any other reason; for me this currently takes several minutes per file :/) and hence is left in the dark about whether there is still activity, so Concept ~0.
I appreciate your feedback. While it's true that switching from …

Moreover, the optimization to reduce two requests into one significantly lowers the traffic load on the host, enhancing overall performance. Although users might not see detailed progress, the efficiency gains and the cleaner codebase justify this trade-off. For those needing more detailed progress feedback, we can consider integrating optional logging mechanisms directly into the code, ensuring we don't rely on third-party libraries like …
This isn't true? There are still two requests before and after this change, so this hasn't changed. Also a …
try:
    response = urlopen(tarballUrl)
    with open(tarball, 'wb') as file:
        file.write(response.read())
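One way to avoid the full in-memory read, sketched under the assumption that streaming is wanted at all: shutil.copyfileobj copies between file-like objects in fixed-size chunks, so even a 0.5 GB debug-symbols archive would never be held in memory at once. The function name is hypothetical:

```python
import shutil
from urllib.request import urlopen

def fetch_streamed(url, dest):
    # The HTTP response object is file-like, so copyfileobj can pump
    # it straight to disk chunk by chunk instead of read()-ing it whole.
    with urlopen(url) as response, open(dest, 'wb') as f:
        shutil.copyfileobj(response, f)
```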
Does this load the full response into memory, which curl does not do? (Probably fine, but if the approach is extended to debug symbols, which are 0.5GB, this may become an issue)
You can call response.read(amt=<number of bytes>), so the download could be chunked to avoid loading the whole thing into memory. That could also be used to add back a coarse progress indicator.
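A coarse progress indicator along those lines might look like the following sketch. The chunk size and the percentage formatting are arbitrary illustrative choices, not anything the PR prescribes:

```python
import sys
from urllib.request import urlopen

CHUNK_SIZE = 1 << 20  # 1 MiB per read; arbitrary illustrative value

def fetch_with_progress(url, dest, out=sys.stderr):
    with urlopen(url) as response, open(dest, 'wb') as f:
        # Content-Length may be absent (e.g. chunked transfer), so
        # fall back to 0 and skip the percentage in that case.
        total = int(response.headers.get('Content-Length') or 0)
        done = 0
        while True:
            chunk = response.read(CHUNK_SIZE)
            if not chunk:
                break
            f.write(chunk)
            done += len(chunk)
            if total:
                out.write(f'\r{100 * done // total}%')
        out.write('\n')
    return done
```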
Right, but at some point I wonder if it is worth it to try to re-implement curl in Python. I am not sure, but is curl not bundled with git, which is already a dependency of this project? Of all open source software projects, curl is currently unlikely to go unmaintained, so the motivation still seems weak.
I also tend toward NACK here, as there are several issues here from a quick glance: …
Twice I asked whether, if you think it's fine, we can use one request to both check the status code (is it a 404?) and download (when it isn't), and I got no answer. Right now the code sends two requests, and I did not change that to one because we don't know the original reason, but it's clear that one request is enough. It seems I cannot change the logic easily and must continue with the same old logic; otherwise I'd like to write it in a way that is more reliable and general for handling all kinds of errors, not only 404.
As the author of a pull request, there is an expectation that you spend time reviewing and testing the code you have written yourself. I try to ask questions during review to motivate pull request authors to look into interesting areas and edge cases to explore in the change. There are currently more than 300 open pull requests and I don't have the time to test everything I can imagine in all of them. In this case, it is not really that hard, so I did the test myself, see #29970 (comment). So with the current code in master, the two requests are needed. I haven't tested this with your code, so I can't provide you an answer to that.
I think this can be closed for now. It seems too controversial for a simple test change?
I think so. I'll try to take a look at other issues; I'd be happy if I can help.
Ok, closing for now. You can also help with reviewing other open pull requests. This will teach you about the project, and naturally you will find leftovers, follow-ups, or (un)related changes that you can tackle yourself.
Switched from making HTTP requests via a subprocess (curl) to the Python urllib library, improving cleanliness and maintainability.