DGX Nightly Benchmark run 20210217 #109

Open · quasiben opened this issue Feb 17, 2021 · 4 comments

@quasiben (Owner)

Benchmark history

[Image: benchmark history plot]

Raw Data

<Client: 'tcp://127.0.0.1:36573' processes=10 threads=10, memory=540.94 GB>
Distributed Version: 2021.02.0+7.g383ea032
simple       5.552e-01 +/- 4.505e-02
shuffle      2.322e+01 +/- 8.996e-01
rand_access  1.058e-02 +/- 6.584e-04
anom_mean    1.141e+02 +/- 2.758e+00

Raw Values

simple
[0.55093336 0.56203961 0.53239679 0.54476047 0.60462093 0.53786802
0.53799701 0.56506395 0.46722174 0.64862871]
shuffle
[23.48259377 23.3623848 21.50754213 23.67603922 22.89431787 24.37197256
24.03984976 21.67854238 23.82202053 23.35824418]
rand_access
[0.00939989 0.00964594 0.01096702 0.01064014 0.01120782 0.0106523
0.01046085 0.0117898 0.01062775 0.01038837]
anom_mean
[112.48733354 113.56297135 114.61307716 113.53013754 114.78903866
113.29076409 112.27411127 110.57477736 114.84757924 121.52652001]
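
As a quick sanity check, the summary statistics above can be reproduced from these raw values as mean +/- population standard deviation; a minimal sketch, assuming only numpy:

```python
# Reproduce the "simple" summary line from the raw per-run timings above.
# numpy's std() defaults to the population standard deviation (ddof=0),
# which matches the reported +/- values.
import numpy as np

simple = np.array([0.55093336, 0.56203961, 0.53239679, 0.54476047, 0.60462093,
                   0.53786802, 0.53799701, 0.56506395, 0.46722174, 0.64862871])

print(f"simple: {simple.mean():.3e} +/- {simple.std():.3e}")
# simple: 5.552e-01 +/- 4.505e-02
```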

Dask Profiles

Scheduler Execution Graph

[Image: scheduler execution graph]

@jakirkham (Collaborator)

Looking at the shuffle profile, the screenshots below zoom in a bit (though not that much, really) to show how much time is spent in write and extract_serialize respectively. There is one other write call, which is not as large (though still larger than extract_serialize). There are also a couple of read calls that take a fair bit of time, with similarly small from_frames calls associated with them.

[Screenshots: shuffle profile flame graph, captured 2021-02-17 around 7:13 PM]
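
For anyone wanting to reproduce profiles like these, distributed ships a performance_report context manager that writes scheduler/worker profiling to an HTML page. A minimal sketch; the workload below is a hypothetical stand-in, not the benchmark suite's actual shuffle code:

```python
# Sketch: capture a Dask performance report around a shuffle-like workload.
# The timeseries dataset and task-based shuffle here are illustrative only.
import dask
from dask.distributed import Client, performance_report, wait

client = Client()  # local cluster by default; the benchmark runs on a DGX

with performance_report(filename="shuffle-report.html"):
    df = dask.datasets.timeseries()
    shuffled = df.shuffle("id", shuffle="tasks")
    wait(shuffled.persist())
```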

cc @mrocklin

@mrocklin (Contributor)

Hrm, that's odd. In general I feel like each of these profiling technologies is good at identifying a different kind of activity. I notice in the tree view above that extract_serialize has the largest percentage (1.5%) of any leaf node.

@jakirkham (Collaborator)

Yeah it's interesting. Not saying the Dask profile necessarily tells the full story either.

Something else interesting is that the socket send and recv calls take around 0.6% in the call graph, which differs from what we see in viztracer. I wonder if we are missing something here, or if there are limitations of each of these tools that we need to factor in somehow. Antoine seemed to allude to that here ( dask/distributed#4443 (comment) ).
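
For comparison, a minimal viztracer invocation looks something like this; run_shuffle() is a hypothetical placeholder for the benchmark workload, not a function from this repo:

```python
# Sketch: trace a workload with viztracer and dump a trace file for inspection.
from viztracer import VizTracer

with VizTracer(output_file="shuffle_trace.json"):
    run_shuffle()  # hypothetical placeholder for the shuffle benchmark

# View interactively with: vizviewer shuffle_trace.json
```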

Ignoring that for a moment, if we look at the read portion of the call graph, we see read_bytes takes 1.63% and read_into takes 1.81%; walking all the way down these branches to their leaves, recv_into takes 0.67% and isinstance takes 0.59%. Subtracting the leaves from these base read_* functions leaves (1.63% + 1.81%) - (0.67% + 0.59%) = 2.18% of the time accounted for there. This is also larger than from_frames, at 1.35%, on the same read branch.

Agreed that on the write side extract_serialize seems to be the dominating component, whereas on the read side various Tornado functions seem to dominate. So, at least from the call graph, things look balanced between Tornado and serialization overhead, though admittedly other profiling strategies seem to show one or the other as the larger contributor. Thus far my working theory is that these are roughly equal, as the call graph would suggest, but I could be wrong about this.
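
One way to poke at the serialization side in isolation is to time distributed's public serialize/deserialize helpers directly (extract_serialize itself is an internal helper on the same code path). A rough sketch, with a hypothetical payload standing in for the shuffle data:

```python
# Sketch: micro-benchmark distributed's serialization of a numpy payload.
import time

import numpy as np
from distributed.protocol import deserialize, serialize

payload = np.random.random((1000, 1000))  # hypothetical stand-in for shuffle data

t0 = time.perf_counter()
header, frames = serialize(payload)
t1 = time.perf_counter()
roundtripped = deserialize(header, frames)
t2 = time.perf_counter()

print(f"serialize:   {t1 - t0:.6f} s")
print(f"deserialize: {t2 - t1:.6f} s")
```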

@mrocklin (Contributor)

🤷 :)
