
test: switch from curl to urllib for HTTP requests #29970

Closed
wants to merge 1 commit

Conversation

iw4p

@iw4p iw4p commented Apr 26, 2024

Switched from making HTTP requests via subprocess (curl) to using Python's urllib library, improving cleanliness and maintainability.

@DrahtBot
Contributor

DrahtBot commented Apr 26, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK AngusP
Concept NACK maflcko

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

@laanwj
Member

laanwj commented Apr 26, 2024

This is a script used for the tests, so adding the test tag.

@iw4p iw4p changed the title refactor: switch from curl to requests for HTTP requests test: switch from curl to requests for HTTP requests Apr 26, 2024
@iw4p
Author

iw4p commented Apr 26, 2024

The title's tag has also been changed from refactor to test.

@iw4p iw4p force-pushed the feature/requests-library-usage branch from 38176a7 to ddd27cb Compare April 26, 2024 17:42
@iw4p
Author

iw4p commented Apr 26, 2024

@laanwj I investigated and found that you modified `import requests` here. Is there a shell script where I can add `pip install requests` so that the "previous releases, depends DEBUG" CI job passes?

edit: It seems bitcoin/ci/test/00_setup_env_native_previous_releases.sh is responsible for providing everything the environment needs. Shall I add `pip install requests` there?

@laanwj
Member

laanwj commented Apr 26, 2024

Yes, that's usually how it'd be done.

But gah, requests is an external dependency? That's unfortunate; I don't think we should be adding dependencies unless absolutely necessary. It would probably be better to do this with Python's built-in urllib or http.client functionality, or to keep it as it is.

@iw4p iw4p force-pushed the feature/requests-library-usage branch from ddd27cb to 29e9377 Compare April 27, 2024 05:30
@iw4p iw4p changed the title test: switch from curl to requests for HTTP requests test: switch from curl to urllib for HTTP requests Apr 27, 2024
@iw4p iw4p force-pushed the feature/requests-library-usage branch from 29e9377 to 7fe94f7 Compare April 27, 2024 05:34
@iw4p
Author

iw4p commented Apr 27, 2024

I replaced requests with urllib, tested the script by downloading a .tar.gz file, and it works fine for me.

        ['curl', '--remote-name', tarballUrl]
    ]
    try:
        response = urlopen(Request(tarballUrl, method='HEAD'))
Member

Is it required to check for 404 in a separate, additional request?

Member

That's how the current code works too - it first does a HEAD request to check that the file is there, then goes on to download it. I don't know the original reasoning behind this, though.

Author

No, I could write it more pythonically with a single request, but I wrote it this way to stay close to the previous code. If you agree with using one request, I can change it.
Should this change go in a new commit, or should I force-push it onto the previous commit?

Member

Should this change be in a new commit or should I force it on the previous commit?

Squashing seems fine?

Author

To rewrite the git history so that the latest change sits in the last commit and the old ones are dropped, I force-pushed. I had only used squashing for merging before that.

Member

Was this addressed?

Member

I tested this myself, and the reason seems to be that absent a HEAD request, a 404.html would be downloaded as the targz, which then fails the shasum check.

Contributor

With curl that makes sense, though here urlopen will raise an HTTPError on a 404 for both a GET and a HEAD request, so the earlier HEAD does seem superfluous?
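This behaviour can be checked without the real release server. Below is a minimal, self-contained sketch (the local 404 handler and URL are illustrative, not from the PR) showing that urlopen raises HTTPError for both HEAD and GET on a missing file:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError
from urllib.request import Request, urlopen

class NotFoundHandler(BaseHTTPRequestHandler):
    # Every path 404s, mimicking a missing tarball on the server.
    def do_GET(self):
        self.send_error(404)
    do_HEAD = do_GET
    def log_message(self, *args):
        pass  # keep the demo output quiet

server = HTTPServer(("127.0.0.1", 0), NotFoundHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/missing.tar.gz"

codes = []
for method in ("HEAD", "GET"):
    try:
        urlopen(Request(url, method=method))
    except HTTPError as e:  # raised for both methods alike
        codes.append(e.code)
server.shutdown()
print(codes)  # → [404, 404]
```

Since the GET itself raises on a 404, a separate HEAD probe adds no extra protection when using urllib.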

Contributor

@AngusP AngusP left a comment

crACK 7fe94f7

test/get_previous_releases.py
@iw4p iw4p force-pushed the feature/requests-library-usage branch from 7fe94f7 to 588cad3 Compare May 25, 2024 09:31
@iw4p iw4p requested review from laanwj, fanquake and maflcko May 25, 2024 12:07
Contributor

@AngusP AngusP left a comment

ACK 588cad3

@theStack
Contributor

theStack commented Jun 7, 2024

One drawback of this change is that the user no longer sees per-file download progress (relevant if e.g. servers are slow or the connection is otherwise bad; for me this currently takes several minutes per file :/) and is hence left in the dark about whether there is still activity, so Concept ~0.

@iw4p
Author

iw4p commented Jun 7, 2024

One drawback of this change is that the user no longer sees per-file download progress (relevant if e.g. servers are slow or the connection is otherwise bad; for me this currently takes several minutes per file :/) and is hence left in the dark about whether there is still activity, so Concept ~0.

I appreciate your feedback. While it's true that switching from curl to urllib removes the per-file download progress visibility, this change offers significant improvements in code cleanliness and maintainability, which are crucial for long-term project sustainability.

Moreover, the optimization to reduce two requests into one significantly lowers the traffic load on the host, enhancing overall performance. Although users might not see detailed progress, the efficiency gains and cleaner codebase justify this trade-off. For those needing more detailed progress feedback, we can consider integrating optional logging mechanisms directly into the code, ensuring we don't rely on third-party libraries like tqdm.

@maflcko
Member

maflcko commented Jun 9, 2024

Moreover, the optimization to reduce two requests into one significantly lowers the traffic load on the host,

This isn't true? There are still two requests before and after this change, so that hasn't changed. Also, a HEAD request shouldn't cause any meaningful traffic compared to a full download.

try:
    response = urlopen(tarballUrl)
    with open(tarball, 'wb') as file:
        file.write(response.read())
Member

Does this load the full response into memory, which curl does not do? (Probably fine, but if the approach is extended to debug symbols, which are 0.5GB, this may become an issue)

Contributor

You can response.read(amt=<number of bytes>), so the download could be chunked to avoid loading the whole thing into memory. That could also be used to add back a coarse progress indicator.
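As a hedged sketch of that suggestion (the chunk size, function name, and progress format are illustrative, not the PR's code): stream the body to disk in fixed-size chunks so the full tarball never sits in memory, and print coarse progress whenever the server supplies a Content-Length.

```python
import sys
from urllib.request import urlopen

def download_chunked(url, path, chunk_size=1 << 20):
    # Stream the response to disk in chunk_size pieces instead of one read().
    with urlopen(url) as response, open(path, "wb") as f:
        total = response.length  # None when the server omits Content-Length
        done = 0
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            f.write(chunk)
            done += len(chunk)
            if total:
                # Coarse progress line, overwritten in place.
                sys.stderr.write(f"\r{done * 100 // total}% ({done}/{total} bytes)")
        if total:
            sys.stderr.write("\n")
    return done
```

With a 1 MiB chunk size, peak memory stays around one chunk regardless of file size, which would also cover the 0.5 GB debug-symbol case mentioned above.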

Member

Right, but at some point I wonder whether it is worth trying to re-implement curl in Python. I am not sure, but isn't curl bundled with git, which is already a dependency of this project? Of all open source software projects, curl is currently among the least likely to go unmaintained, so the motivation still seems weak.

@maflcko
Member

maflcko commented Jun 9, 2024

Also tending toward NACK here, as there are several issues from a quick glance:

  • This is more lines of code, so possibly more to maintain.
  • This worsens the UX, due to lack of progress.
  • Review and testing this change may not be worth it, considering the limited benefit, if any?

@iw4p
Author

iw4p commented Jun 9, 2024

Moreover, the optimization to reduce two requests into one significantly lowers the traffic load on the host,

This isn't true? There are still two requests before and after this change, so this isn't changed. Also a HEAD request shouldn't cause any meaningful traffic, compared to a full download.

Twice I asked whether, if you think it's fine, we could use a single request to check the status code (download unless it's a 404), and I got no answer. Right now the code sends two requests, and I did not reduce it to one because we're not aware of the original reason, though it's clear that one request would be enough. It seems I cannot easily change the logic and must keep the same old behaviour; otherwise I'd prefer to write it in a way that is more reliable and handles all kinds of errors generally, not only 404.

try:
    response = urlopen(tarballUrl)
    with open(tarball, 'wb') as file:
        file.write(response.read())
except Exception as e:
    print(f"Failed to download file. Error: {str(e)}")
    return 1
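If the single-request direction were taken, a sketch under the same assumptions (the function name and messages are illustrative, not the PR's code) could catch urllib's specific error types rather than a bare Exception, so a 404 and other network failures are all reported from the one GET:

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

def fetch_tarball(tarball_url, tarball_path):
    # One GET: a 404 (or any HTTP error status) raises HTTPError here,
    # so no separate HEAD probe is needed; URLError covers connection
    # failures such as DNS errors or refused connections.
    try:
        with urlopen(tarball_url) as response, open(tarball_path, "wb") as f:
            f.write(response.read())
    except HTTPError as e:
        print(f"Download failed: HTTP {e.code} for {tarball_url}")
        return 1
    except URLError as e:
        print(f"Download failed: {e.reason}")
        return 1
    return 0
```

Because urlopen is evaluated before open in the with statement, no empty output file is left behind when the request fails.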

@maflcko
Member

maflcko commented Jun 9, 2024

Twice I asked if you think it's fine [...] and I got no answer.

As the author of a pull request, there is an expectation that you spend time reviewing and testing the code you have written yourself. I try to ask questions during review to motivate pull request authors to look into interesting areas and edge cases to explore in the change. There are currently more than 300 open pull requests, and I don't have the time to test everything I can imagine in all of them. In this case it is not really that hard, so I did the test myself; see #29970 (comment). So with the current code in master, the two requests are needed. I haven't tested this with your code, so I can't give you an answer to that.

@bitcoin bitcoin deleted a comment from Mazzika1 Jun 20, 2024
@maflcko
Member

maflcko commented Jun 21, 2024

I think this can be closed for now. It seems too controversial for a simple test change?

@iw4p
Author

iw4p commented Jun 21, 2024

I think this can be closed for now. It seems too controversial for a simple test change?

I think so. I'll try to take a look at other issues; I'd be happy to help where I can.

@maflcko
Member

maflcko commented Jun 21, 2024

Ok, closing for now.

You can also help with review of other open pull requests. This will teach you about the project and naturally you will find leftovers, follow-ups, or (un)related changes that you can tackle yourself.

@maflcko maflcko closed this Jun 21, 2024