-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sentmap refactorisation #314
base: master
Are you sure you want to change the base?
Conversation
Thank you for the PR. Am I correct to assume that the benchmark was conducted on a low-latency, high-bandwidth network, but still sees some congestions? I would like to reproduce the numbers you saw. |
Yes, the tests were performed on a very low latency, high throughput network. Here are the results I obtained with qperf on a skylake machine running Ubuntu 18.04, with all default options:
With this change:
|
Thank you for elaborating. I tested this PR using my testbed, saturating a 1Gbps link, by capping the CPU clock frequency of the sender machine to 400MHz. The receiver's CPU was running at much higher clock rate, and there was plenty of idle time. master: 333.1Mbps As can be seen, I did not observe improvement. The approach being adopted by this PR has the following trade-off. It reduces the cost of skipping 1 to 15 entries in sentmap, at the cost of using more memory calling malloc and free more frequently. I wonder if we could eliminate the skips without having the negative side-effect of the latter. That said, I would assume that this optimization would become much less effective once we adopt #319. In case of the testbed explained above, we see the speed go up to 432Mbps, as the number of ACK packets get reduced to 1/3. Would you be interested in checking if you still see this PR makes difference when #319 is used as the base (i.e. compare #319 vs. #319 + this PR)? |
This improves the throughput of quicly by 5-7% in benchmarks. The discard code used to be a bottleneck. This structure allows removing packets from the sentmap in O(1) time.
3d51b09
to
d89d28d
Compare
Thank you for taking the time to try this PR. I ran more tests with and without #319, both with qperf and VPP. Overall I think this deserves more testing at this point, to run tests while controlling more accurately the drop rate and maybe switch the congestion control algorithm to something that can sustain high throughputs even in the presence of drops. I will let you know if I get to spend some time on it. As a side note, I also tried to keep the allocated packets in a freelist in order to reduce the amount of calls to malloc / free, but I did not observe a significant difference. |
Hi Kazuho,
I looked at the performance of Quicly yesterday and noticed a bottleneck in the sentmap at high throughputs. This PR proposes a new structure based on a dlist of packets for the sentmap, which improves the throughput by 5-7% in our tests.