Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using asyncio directly in TCP #4513

Closed
jakirkham opened this issue Feb 17, 2021 · 12 comments · Fixed by #5450
Closed

Using asyncio directly in TCP #4513

jakirkham opened this issue Feb 17, 2021 · 12 comments · Fixed by #5450

Comments

@jakirkham
Copy link
Member

As part of the Scheduler benchmarking work, we have seen a fair bit of time still spent in Tornado. While we can now use uvloop ( #4448 ) and have optimized TCP communication ( #4506 ), this still remains. In figuring out how we might fix this, have done some reading on asyncio's Transports and Protocols, which uvloop in turn uses to provide significant performance gains.

It seem doable (though may be a bit of work) to reimplement TCP communication using this strategy, which would also allow us to benefit from uvloop's performance improvements in this area. Hoping after we do this with TCP, we can apply something similar to UCX as well (though this may need a bit of generalization).

Am curious if this is something we would be open to. Any thoughts? 🙂

cc @mrocklin @quasiben

@mrocklin
Copy link
Member

I'm totally open to this. I had an asyncio Comm PR a while back, but it stalled. This may not be what you're referring to with transports and protocols though. @jcrist has done some work here in side projects (I think https://github.com/jcrist/ery) and may have thoughts.

However, after looking again at profiles recently I think that extract_serialize is probably our biggest bottleneck on the comms side (or perhaps overall). Getting that done should be, I think, our first priority.

@jakirkham
Copy link
Member Author

Good to know. Yeah it would be great to get Jim's and others thoughts here

IIUC that is something @madsbk is looking into

@jakirkham
Copy link
Member Author

Though it sounds like Jim has also done work on the serialization side ( https://github.com/jcrist/msgspec ). Am wondering if it makes sense to discuss how we could (optionally) use Jim's work on both these fronts in Distributed. Maybe it's too early to ask that question, but since we know these are bottlenecks we'd like to address it seems worth surfacing

@mrocklin
Copy link
Member

My inclination is to wait on these decisions until we get extract_serialize done. For example, if Mads is occupied this week then I wonder if extract_serialize is something that you could take over. If Mads is doing it and if it's on track to be done soon then great, let's figurre out the next largest bottleneck. The sense I've gotten recently is that Mads might be doing a few things, which is why I bring this up.

I like @jcrist 's projects and I have full confidence that they're excellent implementations down an interesting path. They also seem like integrating/developing them fully might take a while, and so I hesitate to have anyone start to invest time before seeing if this is the next largest bottleneck. I plan to do some more profiling tomorrow morning and maybe I'll have some thoughts to share after that?

@madsbk
Copy link
Contributor

madsbk commented Feb 18, 2021

I have time to work on extract_serialize next week but @jakirkham you are of cause welcome to take over :)

@mrocklin
Copy link
Member

From my personal perspective accelerating extract_serialize by a week is more important than exploring asyncio protocols.

I ran a brief experiment this morning with master and with cythonization turned on. That's here: #4443

@jakirkham
Copy link
Member Author

I responded there. Anyways this seems like an off topic discussion that should move to a different issue

@jakirkham
Copy link
Member Author

Matt shared some asyncio work that he did previously with Ben, James, and myself. Here's the PR ( #2165 ) for reference.

@jakirkham
Copy link
Member Author

jakirkham commented Mar 5, 2021

Also hoping that once we adopt this we can start to use it to standardize alternative communication layers around this strategy.

@jakirkham
Copy link
Member Author

@douglasdavis, if you are looking for things to pick up related to the Scheduler work, this would be a good one. We already know communication is slow ( #4443 ). Also this would allow us to use UNIX domain sockets ( #3630 ) when available. Additionally could be extended to handle shared memory-based communication ( #2046 ), MPI ( #4461 ), etc.

@jakirkham
Copy link
Member Author

Somehow missed issue ( #2162 ) also discusses a very similar idea when opening this. Though I guess that was looking at asyncio as an alternative as opposed to replacing tornado completely as mentioned here.

One thing worth noting is Matt's existing work on this from before ( main...mrocklin:asyncio-comm ). Guessing things have changed a bit over the years, but maybe that is still a useful starting point.

@mrocklin
Copy link
Member

The PR is at #2165 , which also includes some context. That PR stalled because it asyncio servers needed to be awaited and we didn't await distributed.comm.Listeners at the time. We do now, so things should be easier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants