
test: switch from curl to urllib for HTTP requests #29970

Closed
wants to merge 1 commit

Conversation

iw4p

@iw4p iw4p commented Apr 26, 2024

Switched from making HTTP requests via subprocess (curl) to using Python's urllib library, improving cleanliness and maintainability.

@DrahtBot
Contributor

DrahtBot commented Apr 26, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK AngusP
Concept NACK maflcko

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

@laanwj
Member

laanwj commented Apr 26, 2024

This is a script used for the tests, so adding the test tag.

@iw4p iw4p changed the title refactor: switch from curl to requests for HTTP requests test: switch from curl to requests for HTTP requests Apr 26, 2024
@iw4p
Author

iw4p commented Apr 26, 2024

The title's tag has also been changed from refactor to test.

@iw4p iw4p force-pushed the feature/requests-library-usage branch from 38176a7 to ddd27cb Compare April 26, 2024 17:42
@iw4p
Author

iw4p commented Apr 26, 2024

@laanwj I investigated and found that you modified `import requests` here. Is there a shell script where I can add `pip install requests` so that the "previous releases, depends DEBUG" CI job passes?

edit: It seems bitcoin/ci/test/00_setup_env_native_previous_releases.sh is responsible for providing everything the environment needs. Shall I add `pip install requests` there?

@laanwj
Member

laanwj commented Apr 26, 2024

Yes, that's usually how it'd be done.

But gah, requests is an external dependency? That's unfortunate; I don't think we should be adding dependencies unless absolutely necessary. It would probably be better to do this with Python's built-in urllib or http.client functionality, or to keep it as it is.

@iw4p iw4p force-pushed the feature/requests-library-usage branch from ddd27cb to 29e9377 Compare April 27, 2024 05:30
@iw4p iw4p changed the title test: switch from curl to requests for HTTP requests test: switch from curl to urllib for HTTP requests Apr 27, 2024
@iw4p iw4p force-pushed the feature/requests-library-usage branch from 29e9377 to 7fe94f7 Compare April 27, 2024 05:34
@iw4p
Author

iw4p commented Apr 27, 2024

I replaced requests with urllib, tested the script by downloading a .tar.gz file, and it works fine for me.

        ['curl', '--remote-name', tarballUrl]
    ]
    try:
        response = urlopen(Request(tarballUrl, method='HEAD'))
Member

Is it required to check for 404 in a separate, additional request?

Member

That's how the current code works too - it first does a HEAD request to check that the file is there, then goes on to download it. I don't know the original reasoning behind this, though.

Author

No, I could write it more pythonically with a single request, but I wrote it this way to stay close to the previous code. If you agree with using one request, I can change it.
Should this change go in a new commit, or should I force-push it onto the previous commit?

Member

Should this change be in a new commit or should I force it on the previous commit?

Squashing seems fine?

Author

To rewrite the git history so that the latest change sits in the last commit and the old ones are dropped, I force-pushed. I had only used squashing for merging before that.

Member

Was this addressed?

Member

I tested this myself, and the reason seems to be that absent a HEAD request, a 404.html would be downloaded as the targz, which then fails the shasum check.

Contributor

With curl that makes sense, though here urlopen will raise an HTTPError on a 404 for both a GET and a HEAD request, so the earlier HEAD does seem superfluous?
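This behaviour can be checked without the real release server. Below is a minimal, self-contained sketch (the local 404 handler and URL are illustrative, not from the PR) showing that urlopen raises HTTPError for both HEAD and GET on a missing file:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError
from urllib.request import Request, urlopen

class NotFoundHandler(BaseHTTPRequestHandler):
    # Every path 404s, mimicking a missing tarball on the server.
    def do_GET(self):
        self.send_error(404)
    do_HEAD = do_GET
    def log_message(self, *args):
        pass  # keep the demo output quiet

server = HTTPServer(("127.0.0.1", 0), NotFoundHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/missing.tar.gz"

codes = []
for method in ("HEAD", "GET"):
    try:
        urlopen(Request(url, method=method))
    except HTTPError as e:  # raised for both methods alike
        codes.append(e.code)
server.shutdown()
print(codes)  # → [404, 404]
```

Since the GET itself raises on a 404, a separate HEAD probe adds no extra protection when using urllib.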

Contributor

@AngusP AngusP left a comment

crACK 7fe94f7

test/get_previous_releases.py
@iw4p iw4p force-pushed the feature/requests-library-usage branch from 7fe94f7 to 588cad3 Compare May 25, 2024 09:31
@iw4p iw4p requested review from laanwj, fanquake and maflcko May 25, 2024 12:07
Contributor

@AngusP AngusP left a comment

ACK 588cad3

@theStack
Contributor

theStack commented Jun 7, 2024

One drawback of this change is that the user no longer sees per-file download progress (relevant if e.g. servers are slow or the connection is otherwise bad; for me this currently takes several minutes per file :/) and is hence left in the dark about whether there is still activity, so Concept ~0.

@iw4p
Author

iw4p commented Jun 7, 2024

One drawback of this change is that the user no longer sees per-file download progress (relevant if e.g. servers are slow or the connection is otherwise bad; for me this currently takes several minutes per file :/) and is hence left in the dark about whether there is still activity, so Concept ~0.

I appreciate your feedback. While it's true that switching from curl to urllib removes the per-file download progress visibility, this change offers significant improvements in code cleanliness and maintainability, which are crucial for long-term project sustainability.

Moreover, the optimization to reduce two requests into one significantly lowers the traffic load on the host, enhancing overall performance. Although users might not see detailed progress, the efficiency gains and cleaner codebase justify this trade-off. For those needing more detailed progress feedback, we can consider integrating optional logging mechanisms directly into the code, ensuring we don't rely on third-party libraries like tqdm.

@maflcko
Member

maflcko commented Jun 9, 2024

Moreover, the optimization to reduce two requests into one significantly lowers the traffic load on the host,

This isn't true? There are still two requests before and after this change, so that hasn't changed. Also, a HEAD request shouldn't cause any meaningful traffic compared to a full download.

try:
    response = urlopen(tarballUrl)
    with open(tarball, 'wb') as file:
        file.write(response.read())
Member

Does this load the full response into memory, which curl does not do? (Probably fine, but if the approach is extended to debug symbols, which are 0.5GB, this may become an issue)

Contributor

You can response.read(amt=<number of bytes>), so the download could be chunked to avoid loading the whole thing into memory. That could also be used to add back a coarse progress indicator.
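As a hedged sketch of that suggestion (the chunk size, function name, and progress format are illustrative, not the PR's code): stream the body to disk in fixed-size chunks so the full tarball never sits in memory, and print coarse progress whenever the server supplies a Content-Length.

```python
import sys
from urllib.request import urlopen

def download_chunked(url, path, chunk_size=1 << 20):
    # Stream the response to disk in chunk_size pieces instead of one read().
    with urlopen(url) as response, open(path, "wb") as f:
        total = response.length  # None when the server omits Content-Length
        done = 0
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            f.write(chunk)
            done += len(chunk)
            if total:
                # Coarse progress line, overwritten in place.
                sys.stderr.write(f"\r{done * 100 // total}% ({done}/{total} bytes)")
        if total:
            sys.stderr.write("\n")
    return done
```

With a 1 MiB chunk size, peak memory stays around one chunk regardless of file size, which would also cover the 0.5 GB debug-symbol case mentioned above.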

Member

Right, but at some point I wonder whether it is worth trying to re-implement curl in Python. I am not sure, but isn't curl bundled with git, which is already a dependency of this project? Of all open source software projects, curl is currently among the least likely to go unmaintained, so the motivation still seems weak.

@maflcko
Member

maflcko commented Jun 9, 2024

Also tending toward NACK here, as there are several issues from a quick glance:

  • This is more lines of code, so possibly more to maintain.
  • This worsens the UX, due to lack of progress.
  • Review and testing this change may not be worth it, considering the limited benefit, if any?

@iw4p
Author

iw4p commented Jun 9, 2024

Moreover, the optimization to reduce two requests into one significantly lowers the traffic load on the host,

This isn't true? There are still two requests before and after this change, so this isn't changed. Also a HEAD request shouldn't cause any meaningful traffic, compared to a full download.

Twice I asked whether, if you think it's fine, we could use a single request to check the status code (download unless it's a 404), and I got no answer. Right now the code sends two requests, and I did not reduce it to one because we're not aware of the original reason, though it's clear that one request would be enough. It seems I cannot easily change the logic and must keep the same old behaviour; otherwise I'd prefer to write it in a way that is more reliable and handles all kinds of errors generally, not only 404.

try:
    response = urlopen(tarballUrl)
    with open(tarball, 'wb') as file:
        file.write(response.read())
except Exception as e:
    print(f"Failed to download file. Error: {str(e)}")
    return 1
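If the single-request direction were taken, a sketch under the same assumptions (the function name and messages are illustrative, not the PR's code) could catch urllib's specific error types rather than a bare Exception, so a 404 and other network failures are all reported from the one GET:

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

def fetch_tarball(tarball_url, tarball_path):
    # One GET: a 404 (or any HTTP error status) raises HTTPError here,
    # so no separate HEAD probe is needed; URLError covers connection
    # failures such as DNS errors or refused connections.
    try:
        with urlopen(tarball_url) as response, open(tarball_path, "wb") as f:
            f.write(response.read())
    except HTTPError as e:
        print(f"Download failed: HTTP {e.code} for {tarball_url}")
        return 1
    except URLError as e:
        print(f"Download failed: {e.reason}")
        return 1
    return 0
```

Because urlopen is evaluated before open in the with statement, no empty output file is left behind when the request fails.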

@maflcko
Member

maflcko commented Jun 9, 2024

Twice I asked if you think it's fine [...] and I got no answer.

As the author of a pull request, there is an expectation that you spend time reviewing and testing the code you have written yourself. I try to ask questions during review to motivate pull request authors to look into interesting areas and edge cases to explore in the change. There are currently more than 300 open pull requests, and I don't have the time to test everything I can imagine in all of them. In this case it is not really that hard, so I did the test myself; see #29970 (comment). So with the current code in master, the two requests are needed. I haven't tested this with your code, so I can't give you an answer to that.

@bitcoin bitcoin deleted a comment from Mazzika1 Jun 20, 2024
@maflcko
Member

maflcko commented Jun 21, 2024

I think this can be closed for now. It seems too controversial for a simple test change?

@iw4p
Author

iw4p commented Jun 21, 2024

I think this can be closed for now. It seems too controversial for a simple test change?

I think so. I'll try to take a look at other issues; I'd be happy to help where I can.

@maflcko
Member

maflcko commented Jun 21, 2024

Ok, closing for now.

You can also help with review of other open pull requests. This will teach you about the project and naturally you will find leftovers, follow-ups, or (un)related changes that you can tackle yourself.

@maflcko maflcko closed this Jun 21, 2024