Towards a new generic and composable server #11
Comments
Thx for opening the discussion.
Does https://github.com/adriendelsalle/fps support multiple users, or is it single-user like today's Tornado-based Jupyter server?
Sounds good to me.
There are other deployments that don't use JupyterHub. They should also be considered, or we should at least make sure it is still possible to run with something other than JupyterHub.
I am wondering how this would impact, e.g., Enterprise Gateway. cc @kevin-bates
Having a dual offering is the less comfortable solution.
The first goal was to get a working (lean and mean) set of fps plugins for the Jupyter endpoints before we tackle more important refactors. Ultimately, we should decide what is considered a user setting (to be saved in the database) and what is server configuration.
🎉 great!
Yeah, it is definitely a complete reboot of the server project. However, we should work together and use this occasion to get as close as possible to "the right thing" for everyone.
Yeah, absolutely. What I had in mind when I wrote this is that all it basically needs is an OIDC provider, and that the Hub is just one instance... We are actually preparing a project for a specific deployment that probably won't be based on JupyterHub, so that is definitely something we have in mind...
My question was more:
Thx for the clarification. What about support for the existing server extensions (https://jupyter-server.readthedocs.io/en/stable/developers/extensions.html) and the shim for the previous notebook server extensions implemented by nbclassic (https://github.com/jupyterlab/nbclassic)?
Server extensions would need to be adapted to the new extension mechanism. They will typically be implemented as FPS plugins.
This looks interesting, thank you for raising the discussion! Looking at the kernel_server plugin in jupyverse (which I'm assuming is a quick POC), I believe we will need to support variations of kernel launching in order to support resource-managed kernels, and that needs to be accomplished in a pluggable manner. Since the kernel provisioning design is predicated on the Popen abstraction, I suspect provisioners could be usable in this new server model and would like to see that be a design goal. I really like the FPS approach; it is something we've touched on in the past in Service Composition. Would it be possible to have some form of meeting where basic requirements could be discussed and a roadmap possibly outlined? This seems to be happening fast and I think it would be good for folks to get on the same page before we're too far down the road.
Absolutely. We would like to discuss it at server meetings and all relevant venues. We can also schedule dedicated discussions.
It is definitely a POC since we need to enable the new WebSocket protocol, etc.
Sorry for the late answer, but yes, as @davidbrochart mentioned. Basic collision detection has been implemented to prevent multiple declarations of the same route, but a lot still has to be done to handle complex scenarios: multiple back-ends for a single API, compatibility rules, etc.
This discussion is awesome, I'll take a deep look at it. Thanks @kevin-bates!
Hey @SylvainCorlay, thanks for starting this discussion! Let's chat about it at this week's server meeting. I'll add it to the agenda.
Just thinking of Realtime, widgets, etc. Would it make sense to have a server, not written in Python, that holds the notebook models, and not actually propagate the kernel messages all the way to the browser? I guess in many cases even the widget code could then also run on the server side. It would also make partial rendering on the browser side easier, I believe.
Hi Matthias,
It turns out that Yjs now has a native (Rust) port with Python bindings, so the Yjs FPS plugin will eventually hold an instance of the collaborative model in the backend.
On Wed, Sep 8, 2021, 22:37 Matthias Bussonnier wrote:
> Just thinking of Realtime, widgets, etc. Would it make sens to have a server not in Python on which the notebooks models are, and not actually propagate the kernel messages all the way to the browser ? I guess in many cases even the widget code could thus also run on server side. It would also make partial rendering on the browser side easier I believe.
Sorry I missed the meeting today (had some stuff come up). My 2c:
@bollwyvl thanks for those ideas!
I'm 100% for the spec-as-contract approach; it was part of what we wanted to handle for the generic server of plugins.
Ah, that is good. If that's the case, it still lets us use Python on the backend, but the idea of stopping kernel messages at the server, and using only a Yjs-like communication channel between backend and frontend, may still be relevant.
Following up on the Jupyter Server Weekly Meeting from yesterday, should we start moving these two repositories to the …? We can leave that question open for a week and, if nobody objects, proceed with the move.
Thanks for opening this discussion, Sylvain. Yeah, in response to @Carreau's point:
I made some progress on an MVP along these lines in the jupyterlab-rtc repo (https://github.com/jupyterlab/rtc/pull/73) by storing notebook and kernel model state on the server and only pushing the information the clients need to synchronize state, rather than all messages. But looking at this proposal, it seems to aim for a smaller scope, without changing the APIs significantly. Obviously, providing a GraphQL backend that stores all notebook state would be a larger change than this one, and would come with different tradeoffs.
Yes, this has come up a few times (e.g. jupyter-server/jupyter_server#518, https://discourse.jupyter.org/t/sustainability-of-the-ipynb-nbformat-document-format/3051). On the "spec" side, a concept we've chewed on is a "dumb" spec repo which exists solely to:
At present, one could consider several of the end-user applications as bundles of capabilities that could be documented by a composable spec, e.g.
So at preflight or run time, a plugin would tell the notional composable server application that it provides some spec (e.g.
For a bit of history, @saulshanabrook and I have been down this road a couple of times. GraphQL is super duper grand for many reasons. If we do everything right, an SDL would be one of the (outputs|inputs|tests) we could (generate|consume|validate) in the spec process... but, much like for OpenAPI and OpenAsync, I've imagined it existing external to the (reference) implementation.
Thanks for suggesting these projects. I like FastAPI quite a bit and see advantages there. I've opened two PRs, one on each repo, to make it clearer that these projects (jupyverse and fps) are experimental today.
As a process question, have we dropped the use of incubator project status completely at this point?
+100 on this. Also, it should make it possible to create "remixes" of plugins from different projects.
Could the top comment be updated with a section on "implications for Jupyter subprojects"? As someone unfamiliar with the server logic, it is hard for me to understand whether this proposal affects JupyterHub (or Lab, or any other part of the community or our stakeholders).
Beyond the consequences for the currently supported use cases, a big part of this proposal was driven by requirements arising from RTC and scalable cloud deployments. For both reasons, we need the "single-instance" server to be able to serve multiple users, with their individual preferences (such as theme choices, workspace layout, and others) served on a per-user basis rather than from the global server configuration on the file system. On the Hub side, I don't think there are many consequences for the already supported use cases at this point (and David was able to use jupyverse with Binder). For the newer scenarios with respect to RTC, etc., the articulation of the Hub with the single-instance server (for multiple users collaborating on the same document) is something that has been discussed a lot at the public meetings (in both the server team and the RTC meetings) over the past year. The need to reboot the single-user project has become more and more evident, beyond the need to drop Tornado and to address the performance issues discussed here.
@willingc, I don't think that the incubation process is strictly necessary here. That process was largely designed to provide an easy "public sandbox" where mostly corporate teams could be allowed by their internal/legal structures to contribute in the open before something was part of Jupyter (but without doing it inside a corporate-branded repo). In the incorporation doc we clarify that going through incubation isn't strictly necessary; it's really the other criteria listed there, around sustainability, maintenance, fit-in-scope, etc., that are requirements. What I do think would be useful here, given the scope of this idea, would be to turn the top comment into a JEP that is more easily tracked and discussed as part of the regular JEP process. I'm personally super interested in this idea; I think we're seeing the limits of the existing model in many places, and this development seems like a wonderful direction to go in. But for something that will ripple throughout quite a bit of our ecosystem, having a single document to refer to later on, where the various tradeoffs are all listed in one place with a clear structure, will be immensely useful. It will help hash out the impact of these tradeoffs, surface potential new ones that might not have been apparent at the start, and ultimately create team buy-in for the decision. I think this can be made a JEP with very minimal effort (the start can simply be the above top comment, refined with the lessons from this discussion), and that it would help reach more people to get a good discussion/decision. How does that sound, @SylvainCorlay et al?
@fperez Sounds reasonable. A JEP would give more visibility to more of the community.
The reason why we went with the team compass instead of the JEP repository is that JEPs seem appropriate for a precisely scoped proposal on which people can vote. Those we worked on in our team (Debugger Protocol, Voilà Incorporation, Jupyter Server Split, XPUB Sockets, Kernel Handshaking, ZMQ Identity for Router sockets) were all fairly well specified when we submitted them, while some of the discussion items here are still open questions (like the enhanced WebSocket protocol). I am OK with reposting the content of this discussion as an issue in the enhancement proposals repository, even though it may not converge to a proper proposal that people can vote on. I can also post a message on Discourse and the mailing list pointing to this issue for attention. Let me know what you prefer.
Should we consider using the ZeroMQ WebSocket Protocol 2.0? It now works with pyzmq.
Interesting, we'll have a look at it, thanks!
Thanks for opening this, @SylvainCorlay. I'm a big FastAPI fan. What would the compatibility story be for existing server extensions? I think that must be a core part of any new design. Extensions are only now, very slowly, becoming compatible with jupyter-server, so it isn't really practical to drop support for the notebook server yet. A really good compatibility story baked into the design from the start would therefore be very important.
Just a general note on JEP utility from my perspective: my personal belief is that JEPs are not only for making decisions on narrowly scoped proposals. The JEP process is a mechanism for gathering broad feedback from the stakeholders in the Jupyter ecosystem, signal-boosting really important questions, and ensuring that many perspectives have a chance to participate in brainstorming and decision-making. So, in my opinion, situations where there is a complex decision to be made, with large implications for the community but an unclear path forward, are a great case for a JEP (or maybe a "pre-JEP proposal issue" as a start). Over time, the discussions in the JEP process can help decision-makers/maintainers/etc. arrive at a proposal that is aligned with the stakeholders who have provided feedback, and we can converge on a path forward that is more specific and can be "voted on". But I think that if we restrict JEPs to specific proposals that are ready for a vote, we'll miss a valuable opportunity for the broader community to provide feedback.
Looks like discussion here has been stagnant for about 8 months, whereas https://github.com/jupyter-server/fps appears to be in active development. The readme at fps links here for understanding project motivations: is there maybe a more current document that describes how fps fits into the broader jupyter roadmap? |
Not at the moment.
Is there any more detailed information or discussion about this one, @davidbrochart?
JupyterHub is basically a server of servers. Each individually spawned Jupyter server is quite independent.
Interesting...
Serving what? What would be the entity you are spawning? Kernels? Something else?
@damianavila let's continue the discussion in Jupyverse.
Hi, is there any timeline for replacing jupyter_server with jupyverse? It is a great idea to replace Tornado with FastAPI (but it might also be super difficult, since all server extensions would break).
I don't think there will be a time when we officially switch to Jupyverse. Both projects will likely coexist, and users will use the server they prefer depending on the supported features/extensions. I think projects like JupyterLab should make their dependency on jupyter-server optional, so that Jupyverse can be installed without pulling in jupyter-server and all its dependencies (see jupyterlab/jupyterlab#11101).
Thanks a lot! I will try the FastAPI-based server after 4.2.
The current jupyter_server project started as a split of the backend parts of the notebook repository. The classic notebook front-end is now installable as a separate package providing a server extension (https://github.com/jupyterlab/nbclassic), and JupyterLab has adopted the new package.
While jupyter_server has changed quite a bit from the original notebook backend, it still carries a lot of the history of the original project.
Problems with the current server
Tornado
IIRC, Tornado was one of the earliest Python web servers to support WebSockets, and it provided a modern async programming model long before asyncio existed. It was adopted so broadly in the Jupyter stack that, e.g., ipykernel and jupyter_client depend on Tornado…
However, in my opinion, Tornado has become a liability:
Dropping Tornado and building a new server on top of another stack would be a complete reboot of the project, and no existing server extension could be used with it as-is.
The current HTTP and WebSocket APIs
HTTP endpoints
The current HTTP endpoints could be improved in several ways. For example, we could work on
The API could also handle certain long-running requests differently, for example by returning immediately with a token that can be used to poll another endpoint for the result.
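A minimal sketch of that token-and-poll pattern, written against asyncio alone so it stays framework-agnostic. JobStore, submit(), poll() and the response dictionaries are hypothetical names chosen for illustration, not part of any existing Jupyter API; in a real server, poll() would sit behind its own HTTP endpoint.

```python
from __future__ import annotations

import asyncio
import uuid


class JobStore:
    """Hypothetical in-memory registry of long-running requests."""

    def __init__(self) -> None:
        self._tasks: dict[str, asyncio.Task] = {}

    def submit(self, coro) -> str:
        # Schedule the work and return immediately with an opaque token.
        token = uuid.uuid4().hex
        self._tasks[token] = asyncio.create_task(coro)
        return token

    def poll(self, token: str) -> dict:
        # What a separate "status" endpoint would return for this token.
        task = self._tasks.get(token)
        if task is None:
            return {"status": "unknown"}
        if not task.done():
            return {"status": "pending"}
        if task.cancelled():
            return {"status": "cancelled"}
        if task.exception() is not None:
            return {"status": "error", "error": str(task.exception())}
        return {"status": "done", "result": task.result()}


async def slow_copy(path: str) -> str:
    # Stand-in for an expensive operation such as copying a large file.
    await asyncio.sleep(0.01)
    return f"copied {path}"


async def demo() -> tuple[dict, dict]:
    store = JobStore()
    token = store.submit(slow_copy("big.ipynb"))
    first = store.poll(token)  # the task has not run yet, so it is pending
    await asyncio.sleep(0.05)  # the client would poll again later
    return first, store.poll(token)
```

The original HTTP request returns as soon as submit() has produced a token, so no connection is held open for the duration of the work. A production version would also need to expire finished entries and scope tokens to the requesting user.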
The kernel protocol over WebSocket is inefficient
Another issue with the Jupyter server is the way we communicate with kernels over WebSockets. The main issue, in my opinion, is that while ZMQ messages are serialised as a well-specified sequence of byte blobs, the WebSocket protocol communicates this content as a JSON object with keys for header, content, metadata, etc. As a consequence of that design, every message has to be parsed so that we can recompose the ZMQ messages.
If the WebSocket messages contained the same binary blobs as the ZMQ messages, we could route them directly to the right kernel (simply adding the ZMQ identities and delimiter)... Such an approach would result in considerably faster handling of kernel messages.
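The contrast can be sketched with the standard library. The frame layout (routing identities, the <IDS|MSG> delimiter, an HMAC signature, then header, parent_header, metadata and content) follows the Jupyter messaging wire protocol. The function names are hypothetical, binary buffers are omitted, and the binary path assumes the WebSocket layer already hands the server a list of blobs. The server still signs the message in both paths because the key never leaves the server, but in the binary path it no longer decodes or re-encodes any JSON.

```python
from __future__ import annotations

import hashlib
import hmac
import json

DELIMITER = b"<IDS|MSG>"  # separates routing identities from the signed parts
PART_NAMES = ("header", "parent_header", "metadata", "content")


def sign(key: bytes, parts: list[bytes]) -> bytes:
    # HMAC-SHA256 hex digest over the four serialised message parts.
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for part in parts:
        mac.update(part)
    return mac.hexdigest().encode("ascii")


def frames_from_json(text: str, key: bytes, identity: bytes) -> list[bytes]:
    # Current style: parse the whole JSON message, then re-serialise each part.
    msg = json.loads(text)
    parts = [json.dumps(msg[name]).encode("utf-8") for name in PART_NAMES]
    return [identity, DELIMITER, sign(key, parts), *parts]


def frames_from_blobs(parts: list[bytes], key: bytes, identity: bytes) -> list[bytes]:
    # Proposed style: the blobs are forwarded untouched; no JSON parsing at all.
    return [identity, DELIMITER, sign(key, parts), *parts]
```

Both functions produce the same ZMQ frames. The difference is that frames_from_blobs never inspects the message content, which is what allows the server to act as a thin router.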
A multi-user "single instance" server
In cloud deployments (especially with the new RTC features of JupyterLab), we will probably want some preferences (currently configured via traitlets configurables) to become user-specific and be saved in a database.
For example, themes and workspaces in JupyterLab should probably be set on a per-user basis.
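A sketch of per-user preferences that fall back to server-wide defaults. It uses the standard library's sqlite3 module. SERVER_DEFAULTS stands in for today's traitlets-based configuration, and the UserSettings class, table name and theme values are hypothetical.

```python
from __future__ import annotations

import sqlite3

# Hypothetical server-wide defaults, standing in for traitlets configuration.
SERVER_DEFAULTS = {"theme": "JupyterLab Light", "workspace": "default"}


class UserSettings:
    """Per-user preferences stored in a database, with server-wide fallbacks."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS user_settings ("
            " username TEXT NOT NULL, setting TEXT NOT NULL, value TEXT NOT NULL,"
            " PRIMARY KEY (username, setting))"
        )

    def set(self, username: str, setting: str, value: str) -> None:
        # Upsert: a user's new choice replaces their previous one.
        self.conn.execute(
            "INSERT OR REPLACE INTO user_settings (username, setting, value)"
            " VALUES (?, ?, ?)",
            (username, setting, value),
        )
        self.conn.commit()

    def get(self, username: str, setting: str) -> str | None:
        row = self.conn.execute(
            "SELECT value FROM user_settings WHERE username = ? AND setting = ?",
            (username, setting),
        ).fetchone()
        # Fall back to the server default when the user never chose a value.
        return row[0] if row else SERVER_DEFAULTS.get(setting)
```

Two users sharing one server instance can each pick a theme, and a user who has not picked one sees the server default.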
A proposal for a new server
Drop Tornado and reboot the Jupyter server project with a FastAPI-based solution.
Using FastAPI will come with many benefits, such as modern tooling (type annotations, automatic generation of OpenAPI specs) and a rich collection of tools for telemetry and authentication.
Adopt JupyterLab's “everything is a plugin” approach for the server architecture
Starting from an "empty" base server and a collection of plugins for HTTP endpoints may have important benefits compared to the current approach where the base server provides a number of endpoints already.
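A stdlib-only sketch of an "empty" base server that serves nothing until plugins contribute routes. It includes the kind of basic route-collision check mentioned in the discussion. BaseServer, RouteCollisionError and the two plugins are hypothetical and do not reflect the actual fps API. Each plugin is simply a function returning (method, path, handler) triples.

```python
from __future__ import annotations

from typing import Callable

Handler = Callable[[dict], dict]
Plugin = Callable[[], list]  # returns [(method, path, handler), ...]


class RouteCollisionError(Exception):
    """Two plugins tried to provide the same method and path."""


class BaseServer:
    """Hypothetical base server: it has no endpoints of its own."""

    def __init__(self) -> None:
        # (METHOD, path) -> (plugin name, handler)
        self.routes: dict[tuple[str, str], tuple[str, Handler]] = {}

    def load(self, plugins: dict[str, Plugin]) -> None:
        for name, plugin in plugins.items():
            for method, path, handler in plugin():
                key = (method.upper(), path)
                if key in self.routes:
                    owner = self.routes[key][0]
                    raise RouteCollisionError(
                        f"{key[0]} {path} is already provided by {owner!r}"
                    )
                self.routes[key] = (name, handler)

    def dispatch(self, method: str, path: str, request: dict) -> dict:
        _, handler = self.routes[(method.upper(), path)]
        return handler(request)


def contents_plugin() -> list:
    return [("GET", "/api/contents", lambda request: {"content": []})]


def kernels_plugin() -> list:
    return [("GET", "/api/kernels", lambda request: {"kernels": []})]
```

Loading a second plugin that also declares GET /api/kernels raises RouteCollisionError. Supporting several back-ends for one API, as raised in the discussion, would need a richer policy than refusing the duplicate.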
Prototype implementation
In the past few weeks, @adriendelsalle and @davidbrochart have been working on prototyping such an approach (fps and jupyverse).
Naturally, there is still a lot to figure out:
We've had several conversations on whether we would like, e.g., the base server to always require a database, and plugins to require some tables, etc., in this database. Plugins with special requirements could always require another one...
Presumably, SQLite could be used for a single-machine deployment of Jupyter, where the user simply types jupyter lab to launch it, while a database running on a separate machine would presumably be specified for cloud deployments. (Using the same database for several plugins would help simplify the configuration.)
When discussing the project, we have also been thinking about the articulation between the single-instance server and the Hub with respect to authentication and authorization. One idea that came out was to "elect" OIDC as the default authentication method, and to use the Hub as an OIDC identity provider in hub-based deployments.
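A sketch of the shared-database idea, with the standard library's sqlite3 standing in for whatever driver a cloud deployment would use. The plugin names, table definitions and the plugin-name prefix convention are all hypothetical. The point is only that each plugin declares the tables it needs and the base server creates them in one database, leaving a single connection setting to configure.

```python
import sqlite3

# Hypothetical declarations: plugin name -> {table name: column definitions}.
PLUGIN_TABLES = {
    "auth": {"users": "id TEXT PRIMARY KEY, name TEXT NOT NULL"},
    "lab": {"workspaces": "id TEXT PRIMARY KEY, owner TEXT NOT NULL, data TEXT"},
}


def open_database(plugin_tables: dict, path: str = ":memory:") -> sqlite3.Connection:
    """Create every plugin's tables in one shared SQLite database.

    Table names are prefixed with the plugin name so that two plugins can
    both declare, e.g., a "settings" table without clashing. Declarations
    come from installed plugins rather than users, so they are interpolated
    directly into the SQL.
    """
    conn = sqlite3.connect(path)
    for plugin, tables in plugin_tables.items():
        for table, columns in tables.items():
            conn.execute(f"CREATE TABLE IF NOT EXISTS {plugin}_{table} ({columns})")
    conn.commit()
    return conn
```

For a local jupyter lab session, path would point to a file on the user's machine. A cloud deployment would feed the same declarations to a networked database client instead.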
And, naturally, there is the whole question of the transition, should we move forward.
Jupyter_server
We would like to discuss those ideas with the broader jupyter_server community, and improve the proposal and the ongoing work based on the group's feedback.
Obviously, we should continue improving jupyter_server, but there are many projects that could already benefit from the proposed approach.