Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use http/2 and keepalives #1378

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

technillogue
Copy link

@technillogue technillogue commented Jul 18, 2023

Details

We noticed connections were being dropped frequently. This adds rustls, so that http/2 ALPN works. To benefit from this more, it also sets tcp_keepalive and a connection pool. I'm not attached to the specific values for those.

It also adds a little bit of logging on throughput, which I'm happy to discard if it doesn't fit.

Types of changes

What types of changes does your PullRequest introduce? Put an x in all the boxes that apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation Update (if none of the other choices apply)

Checklist

Go over all the following points, and put an x in all the boxes that apply.

  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.

@technillogue technillogue requested a review from a team as a code owner July 18, 2023 23:03
@technillogue technillogue requested review from liubin, luodw and adamqqqplay and removed request for a team July 18, 2023 23:03
@anolis-bot
Copy link
Collaborator

@technillogue , a new test job has been submitted. Please wait in patience. The test job url: https://tone.openanolis.cn/ws/nrh4nnio/test_result/85448

@anolis-bot
Copy link
Collaborator

@technillogue , The CI test is completed, please check result:

Test CaseTest Result
build rust golang image✅ SUCCESS
compile nydusd❌ FAIL

Sorry, your test job failed. Please get the details in the link.

@anolis-bot
Copy link
Collaborator

@technillogue , the title has been updated, so a new test job has been submitted. Please wait in patience. The test job url: https://tone.openanolis.cn/ws/nrh4nnio/test_result/85449

@anolis-bot
Copy link
Collaborator

@technillogue , The CI test is completed, please check result:

Test CaseTest Result
build rust golang image✅ SUCCESS
compile nydusd❌ FAIL

Sorry, your test job failed. Please get the details in the link.

Copy link
Member

@adamqqqplay adamqqqplay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix broken CI first, thanks.

@@ -626,14 +626,19 @@ impl Connection {
} else {
None
};
// get pool size from envvar
let pool_max_idle_per_host = match env::var("REGISTRY_CLIENT_POOL_MAX_IDLE_PER_HOST") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these idle connections take pressure on the registry server? Especially when the container is started and prefetch work is done. Maybe we can make the value as the one option of ConnectionConfig.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it would make a significant impact for reasonable values. Most servers should close idle connections if it becomes a problem.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about making the value as the one option of ConnectionConfig?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to fetch REGISTRY_CLIENT_POOL_MAX_IDLE_PER_HOST from configuration.

@imeoer
Copy link
Collaborator

imeoer commented Jul 19, 2023

@technillogue Thanks for the PR! It seems to have some compilation errors that still need to be solved.
Does this change improve your bandwidth utilization?

@anolis-bot
Copy link
Collaborator

@technillogue , the code has been updated, so a new test job has been submitted. Please wait in patience. The test job url: https://tone.openanolis.cn/ws/nrh4nnio/test_result/85469

@anolis-bot
Copy link
Collaborator

@technillogue , The CI test is completed, please check result:

Test CaseTest Result
build rust golang image✅ SUCCESS
compile nydusd❌ FAIL

Sorry, your test job failed. Please get the details in the link.

@technillogue
Copy link
Author

Does this change improve your bandwidth utilization?

It was a small improvement. It made less of a difference than changing merging_size, but still helped.

@anolis-bot
Copy link
Collaborator

@technillogue , the code has been updated, so a new test job has been submitted. Please wait in patience. The test job url: https://tone.openanolis.cn/ws/nrh4nnio/test_result/85490

@anolis-bot
Copy link
Collaborator

@technillogue , The CI test is completed, please check result:

Test CaseTest Result
build rust golang image✅ SUCCESS
compile nydusd❌ FAIL

Sorry, your test job failed. Please get the details in the link.

@codecov
Copy link

codecov bot commented Jul 19, 2023

Codecov Report

Merging #1378 (dfa8491) into master (9ab1ec1) will increase coverage by 0.01%.
The diff coverage is 64.28%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1378      +/-   ##
==========================================
+ Coverage   46.28%   46.29%   +0.01%     
==========================================
  Files         123      123              
  Lines       38872    38883      +11     
  Branches    38872    38883      +11     
==========================================
+ Hits        17990    18000      +10     
- Misses      19908    19909       +1     
  Partials      974      974              
Files Coverage Δ
storage/src/backend/connection.rs 43.47% <85.71%> (+0.47%) ⬆️
storage/src/cache/mod.rs 50.67% <42.85%> (+0.09%) ⬆️

... and 3 files with indirect coverage changes


let mut cb = Client::builder()
.timeout(timeout)
.connect_timeout(connect_timeout)
.redirect(Policy::none());
.redirect(Policy::none())
.use_rustls_tls()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why use_rustls_tls() here? It introduces heavy dependencies.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using native-tls instead? We already have vendored openssl-sys crate.

openssl = { version = "0.10.55", features = ["vendored"] }

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's required for http/2, as far as I know native-tls does not provide for the ALPN required for http/2

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would this be acceptable?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be acceptable I think.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I make any other changes to this then?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about making the env REGISTRY_CLIENT_POOL_MAX_IDLE_PER_HOST value as the one option of ConnectionConfig?

@anolis-bot
Copy link
Collaborator

@technillogue , the code has been updated, so a new test job has been submitted. Please wait in patience. The test job url: https://tone.openanolis.cn/ws/nrh4nnio/test_result/86935

Copy link
Member

@adamqqqplay adamqqqplay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current commit log looks a bit messy, it would be nice if you could reorganize them.

@anolis-bot
Copy link
Collaborator

@technillogue , the code has been updated, so a new test job has been submitted. Please wait in patience. The test job url: https://tone.openanolis.cn/ws/nrh4nnio/test_result/87687

@technillogue
Copy link
Author

@adamqqqplay, I've reorganized the commit log

@anolis-bot
Copy link
Collaborator

@technillogue , the code has been updated, so a new test job has been submitted. Please wait in patience. The test job url: https://tone.openanolis.cn/ws/nrh4nnio/test_result/87688

@anolis-bot
Copy link
Collaborator

@technillogue , The CI test is completed, please check result:

Test CaseTest Result
build rust golang image✅ SUCCESS
compile nydusd✅ SUCCESS
compile ctr remote✅ SUCCESS
compile nydus snapshotter❌ FAIL

Sorry, your test job failed. Please get the details in the link.

@anolis-bot
Copy link
Collaborator

@technillogue , The CI test is completed, please check result:

Test CaseTest Result
build rust golang image✅ SUCCESS
compile nydusd✅ SUCCESS
compile ctr remote✅ SUCCESS
compile nydus snapshotter✅ SUCCESS
run container with rafs✅ SUCCESS
run container with zran✅ SUCCESS
run container with rafs and compile linux✅ SUCCESS

Congratulations, your test job passed!

@technillogue
Copy link
Author

Any update on this?

@imeoer
Copy link
Collaborator

imeoer commented Aug 9, 2023

Any update on this?

How about making the env REGISTRY_CLIENT_POOL_MAX_IDLE_PER_HOST value as the one option of ConnectionConfig?

@@ -626,14 +626,19 @@ impl Connection {
} else {
None
};
// get pool size from envvar
let pool_max_idle_per_host = match env::var("REGISTRY_CLIENT_POOL_MAX_IDLE_PER_HOST") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to fetch REGISTRY_CLIENT_POOL_MAX_IDLE_PER_HOST from configuration.

.redirect(Policy::none());
.redirect(Policy::none())
.use_rustls_tls()
.tcp_keepalive(Some(Duration::from_secs(5 * 60)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be configurable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants