Can't pam_mount a cifs volume. #22
Comments
I have been digging a bit more into the issue, reading It seems Which sadly seems to confirm that the only way to forward the auth_token to pam_mount is by opening a session while the first authenticate PAM transaction is still open. |
Our PAM session lifecycle in JupyterHub is definitely wrong for nontrivial setups where other operations are tied to the PAM session. The right thing to do is make all PAM calls in a subprocess which should be the parent of the Spawner process, as described here, passing the handle around properly, though that's pretty complicated to fit into JupyterHub's design where login and spawn are often significantly separated by hours or days or processes (Authenticator.refresh_user can be leveraged for this). If you're interested in putting in some time to figuring that out (and/or producing an isolated test case that others can run easily), that would be wonderful, and I can help with some guidance on the relevant pieces of JupyterHub. |
Thank you for your answer, I've gone through the linked issue. Just to make sure I understood your point. You think that the fact that PAM sessions are created in a separate process is the root cause of my issue ? Reading the issue, it does seem so but I haven't tried the fix proposed in the issue. I can put some time in, maybe first by trying to give you a reproducible setup. Although that will most likely be quite complicated in itself. I'm thinking of maybe a docker compose file with several services (jupyterhub, a dummy ldap setup, and somehow a network share ❔ and an initialization script). Would that be okay or would you prefer something else ? |
OK, so I've built a minimal setup to reproduce the issue. You can find it here: https://github.com/Timost/jupyter-pam-bug-demo. The issue is the same with this setup as with my "real" use case, so I think it's a good sandbox env. |
That's a big part of it, especially in certain SELinux cases where the calling process is modified (subsequent calls to open_session fail with different users because the jupyterhub process has lost the necessary permission!) - it also just happens at the wrong times except in the simplest cases. That test sandbox will be very helpful, thanks! |
No problem, thank you for your time ❤️ How can I help moving this forward ? Do you think that's the right way of doing it ? |
Yes, prototyping a custom PAM Authenticator subclass sounds right. When you have something that works, we can see if it makes sense to upstream it to the default implementation or keep it standalone. |
Hi,
As far as I understand these problems at the moment they all are caused by faulty PAM usage in pamela (and to some extent also by JupyterHub). Before I go on: Sorry for this lengthy post, but the problem is not trivial and requires some more explanation. What JupyterHub does if
What pamela does:
What pamela (or JupyterHub) SHOULD do:
Calling As far as I understand, the PAM handle returned by Above mentioned segfault is a result of closing a PAM session in a PAM context in which we never opened a session. Of course, this should not result in a segfault, but in an error message. I did not dig deeper in this direction. It segfaults somewhere in libc. From the pamela side of view it would be nice, if
I've tested this and it works. BUT if the hub user stops and restarts its (single-user) server or if the user spawns multiple servers, then it fails (segfault when stopping the REstarted server). The reason is that calling How to properly handle PAM sessions and multiple servers in JupyterHub? I don't know because I do not have an overview of how authentication/login to the hub and spawning servers interact. Both processes are more or less independent from each other. There are at least two approaches:
I'm willing to help solve the PAM/JupyterHub issues. Primarily because I need this setup, secondarily to contribute my 50 cents to the wonderful open source universe. But I'm neither a software developer nor an admin. Just a guy who wants to use JupyterHub for some data science related teaching and has to set up the server for lack of an up-to-date IT department. Many thanks in advance for each improvement of JupyterHub and Co.! Best regards, jeflem |
Hi @jeflem, Regarding the two approaches you suggest, I think the second one with a re-authentication mechanism is favoured, but @minrk should have much more insight on that than me. This is all theoretical, I haven't started on a new authenticator yet, but I've been reading as much doc/code as I can. The |
Thanks for the hint to Now I had a closer look at JupyterHub's source, in particular auth.py: There is only one common PAMAuthenticator object for all users of the hub and for handling all (repeated) logins. So it's a bad idea to store the PAM handle in the authenticator object (we cannot use one PAM handle for all users). But each single-user server has its own Spawner object. So storing user name and password in the Spawner objects could be an approach. Then in Spawner.start one could call PAM authentication, open the PAM session, store the handle. In Spawner.stop the PAM session gets closed. So we do not have to touch the single-user server. Username/password stay inside JupyterHub. As far as I understand at the moment, transfer of username/password from |
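The per-spawner lifecycle described in this comment (authenticate and open the session at start, close the session and end the transaction at stop) can be sketched roughly as follows. This is only an illustration under stated assumptions: SpawnSession, FakeHandle and fake_authenticate are hypothetical names, not JupyterHub or pamela API, and the fake handle merely records calls. In a real setup the authenticate callable would presumably wrap pamela.authenticate(..., close=False) and the end callable would wrap pamela.pam_end; whether that holds up with real modules such as pam_mount is a separate question.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


class FakeHandle:
    """Hypothetical stand-in for a PAM handle; records calls instead of talking to PAM."""

    def __init__(self, log):
        self.log = log

    def open_session(self):
        self.log.append("open")

    def close_session(self):
        self.log.append("close")


@dataclass
class SpawnSession:
    """Hypothetical per-server holder: one PAM transaction per single-user server."""

    username: str
    authenticate: Callable[[str, str], object]  # assumed to return a handle
    end: Callable[[object], None]               # assumed to end the transaction
    _handle: Optional[object] = field(default=None, init=False, repr=False)

    def start(self, password: str) -> None:
        # Authenticate and open the session within the same transaction.
        handle = self.authenticate(self.username, password)
        try:
            handle.open_session()
        except Exception:
            self.end(handle)  # do not leak the transaction if opening fails
            raise
        self._handle = handle

    def stop(self) -> None:
        # Only close a session that this object actually opened.
        if self._handle is None:
            return
        handle, self._handle = self._handle, None
        try:
            handle.close_session()
        finally:
            self.end(handle)


log = []


def fake_authenticate(username, password):
    log.append("auth:" + username)
    return FakeHandle(log)


session = SpawnSession("alice", fake_authenticate, lambda handle: log.append("end"))
session.start("secret")
session.stop()
session.stop()  # second stop is a no-op, so close_session is never called twice
print(log)  # ['auth:alice', 'open', 'close', 'end']
```

The point of clearing the stored handle before closing is that a repeated stop cannot close a session in a PAM context where none is open, which is the situation blamed for the segfault earlier in the thread.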
Update: BUT if one opens multiple PAM sessions in parallel with independent PAM contexts (spawn multiple servers), then only one of the sessions can be closed. Closing a second session leads to a segfault. Here is a short Python script for testing:
```python
import pamela as p

user = 'someuser'
service = 'pamtest'  # use 'login' or create custom service /etc/pam.d/pamtest
pw = 'somepassword'

# first PAM transaction: authenticate, open and close a session
h1 = p.authenticate(user, pw, service, close=False)
h1.open_session()
h1.close_session()

# second, independent PAM transaction in the same process
h2 = p.authenticate(user, pw, service, close=False)
h2.open_session()
h2.close_session()

# end both transactions
p.pam_end(h2, 0)
p.pam_end(h1, 0)
```
To confirm that this is not a pamela issue I tested this with an equivalent C program directly calling libpam. Segfaults, too. Maybe a bug in |
Creating the two PAM sessions from two different Python processes doesn't cause the segfault, I think.
```python
# python process 1
import pamela as p

user = "jupyterhub"
pw = "admin"
service = "jupyterhub"
h1 = p.authenticate(user, pw, service, close=False)
h1.open_session()
# h1.close_session() can also be called here but I'm not sure that mimics the actual behaviour

# python process 2
import pamela as p

user = "jupyterhub"
pw = "admin"
service = "jupyterhub"
h2 = p.authenticate(user, pw, service, close=False)
h2.open_session()
h2.close_session()
p.pam_end(h2, 0)

# python process 1
h1.close_session()
p.pam_end(h1, 0)
```
My current mental model is that the spawners run in separate processes, so I'm not sure if this is an actual issue. |
There are (at least) two variants for opening/closing PAM sessions:
From my very limited understanding of JupyterHub's structure the first variant is preferable, because all the authentication stuff is done by JupyterHub. For the second variant each single-user server (I think, there exist different ones?) would have to implement PAM session handling. |
This is becoming a bit off-topic, because it's not really a pamela issue (see previous comment). In my opinion there are two problems:
Further digging revealed the cause of the segfault: So several parallel PAM transactions share some memory with pam_mount related data and I'm not sure whether this memory sharing may break other functionality, too. As far as I understand from the source code, only username and auth token are managed on a per PAM handle basis. Others (list of mounted volumes for instance), are global. There are lots of bug reports out there about umounts not taking place at logout... Since I do not have any experience with software development, bug reporting and so on, I'm not sure what to do now?
Any suggestions? |
Update:
|
Reading and thinking about the JupyterHub/PAM issue brought me to the conclusion that fixing JupyterHub's PAM session handling in a way which makes pam_mount.so work correctly without fixing the above mentioned pam_mount bug is impossible. Thus, I stop my research here and try to find a solution for my JupyterHub setup that works without pam_mount. For the record, here comes a list of approaches investigated by others or me together with reasons of failure:
Two-level approach of jupyterhub/jupyterhub#2321 in 2018
The pull request tries to open and close a PAM session in an intermediate process, which is started by jupyterhub and itself starts the single-user server. The pull request died due to unexpected complexity and side effects. Even if one would implement this approach and solve all problems discussed in the pull request, one would have to pass the user's password somehow to the intermediate process. As far as I understand the mechanics of process management, this means either storing the clear text password in an environment variable or passing it as a (command line) argument to the process or maybe via some file on disk. This sounds very dubious regarding password security. Maybe some encryption could be used similar to JupyterHub's
Note that the discussion in the pull request does not take into account this password issue. The pull request is not aware of the fact that opening a PAM session (containing pam_mount.so) requires prior authentication within the same PAM transaction, that is, within the intermediate process.
Approach implemented by @gatoniel in 2020
Reading the source code (not testing it) I think that this implementation does not work if pam_mount.so is part of the PAM stack (which is the reason for wanting PAM sessions to work with JupyterHub). Opening and closing PAM sessions is done in the jupyterhub process. So pam_mount.so will segfault if there is more than one user and/or server. Also, opening the PAM session is done in a PAM transaction without authentication. So pam_mount will prompt the user for a password, but the user won't notice this.
Parallel server and PAM handling processes
For each single-user server start a parallel PAM handling process. It's similar to the pull request approach above, but less complex. Passing the user password to the PAM handling process is not trivial but required. Although for pam_mount such a parallel process would suffice, I'm not sure whether other PAM modules fail if the session is started by a different process and not by the server itself. What about pam_selinux.so for instance? Lack of knowledge...
PAM handling by single-user server
We could tell the single-user server to open a PAM session. But how to pass the password? Do we really want to do PAM/authentication stuff in the server? The server may also run in a non-PAM environment, so it shouldn't have to care about PAM session handling.
JupyterHub manages PAM sessions in spawners
At each spawn a PAM session is opened by JupyterHub and closed after the server stopped. Sadly, this fails due to the bug in pam_mount.so (does not support multiple PAM transactions within one process). But would solve several other problems:
|
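On the question of how a password could reach an intermediate process at all: besides environment variables, command line arguments and files, a pipe to the child's standard input is another option. The following is a minimal sketch of that idea only, not something proposed in the pull request. CHILD_CODE and run_with_secret are hypothetical names, and the child here is a trivial stand-in that reports the length of what it received instead of doing any PAM work.

```python
import subprocess
import sys

# Stand-in for the intermediate process (an assumption for illustration):
# it reads one line from stdin and prints only the length of the secret.
CHILD_CODE = "import sys; secret = sys.stdin.readline().rstrip('\\n'); print(len(secret))"


def run_with_secret(secret: str) -> str:
    """Start the child and hand it the secret through a stdin pipe."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],  # the secret is not part of argv
        input=secret + "\n",                 # sent through the pipe instead
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


print(run_with_secret("hunter2"))  # prints 7
```

This keeps the clear text out of the process arguments and the environment, but the password still exists in the memory of both processes, so it does not answer the broader concern about keeping passwords around between login and spawn.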
Sorting and cleaning my notes taken the past days I found further old issues related to the JupyterHub/PAM problem, just for the record: |
Thanks for detailing all that!
There already is an issue on JupyterHub to track this: jupyterhub/jupyterhub#2973 Spawner/authenticate lifetime is also a wrinkle, because login != spawn and stop != logout:
This does all point to turning off open_sessions by default (I've been meaning to for a long time, and should have done it in 2.0) because it's only a subset of cases where it does work: jupyterhub/jupyterhub#3787 I think it is clear that PAM session calls should not be made from the JupyterHub process. And session should receive an authenticate handle, which cannot be serialized, so authenticate must also be called from another process (at least when using sessions), and due to possible rlimit calls and friends, the session calls must be made in the parent of the server (it must not be the server process itself). So I think it should work something like this:
So the PAM child is roughly:
```python
def pam_child(username, password):
    pam_h = pamela.authenticate(username, password, close=False)
    # propagate error if fails
    wait_for(parent exit _or_ popen input)
    if parent_exited:
        return
    pam_h.open_session()
    with Popen(popen_args) as child:
        send_to_parent(child.pid)
    pam_h.close_session()
    pamela.pam_end(pam_h)  # there should be a pam_h.end(), but there isn't yet
```
This doesn't satisfy the first condition - that restarting the hub shouldn't force re-login, but it greatly simplifies the lifecycle because we don't have to re-establish connections to the PAM children on Hub restart. It also means that spawning via the API in general won't work if sessions are in use. But the only way I see that possibly working is by storing passwords persistently, which I think we really shouldn't do. Running this test script:
```python
import getpass
import os
import time

import pamela


def parent(level="session"):
    username = getpass.getuser()
    password = getpass.getpass()
    # once: reuse pam handle across subprocesses
    if level == "once":
        pam_h = pamela.authenticate(username, password, close=False)
    children = []
    for i in range(3):
        print("starting", i)
        session_pid = os.fork()
        if session_pid == 0:
            # session: authenticate separately in each session process
            if level == "session":
                pam_h = pamela.authenticate(username, password, close=False)
            # child for each session
            for j in range(3):
                print("opening", i, j, os.getpid())
                pam_h.open_session()
                print("opened", i, j, os.getpid())
                child_pid = os.fork()
                if child_pid == 0:
                    print("worker", i, j, os.getpid())
                    time.sleep(5)
                    print("worked", i, j, os.getpid())
                    return
                else:
                    os.waitpid(child_pid, 0)
                    print("closing", i, j, child_pid)
                    pam_h.close_session()
                    print("closed", i, j, child_pid)
            if level == "session":
                pamela.pam_end(pam_h, 0)
            return
        else:
            print("started", session_pid)
            children.append(session_pid)
            time.sleep(1)
    # wait for all session processes to finish
    for session_pid in children:
        print(session_pid, os.getpid())
        os.waitpid(session_pid, 0)
    if level == "once":
        pamela.pam_end(pam_h, 0)


if __name__ == "__main__":
    parent()
```
suggests the following: that PAM authenticate and session open/close must be called in the same process, which in turn suggests that a new authenticate call must be made for each session/spawn to produce a new handle (reopening a session after closing it may work, as it does in my test, but I don't know if some implementations may disallow that). |
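The conclusion above (one process per spawn that authenticates, opens the session, starts the server as its own child and closes the session afterwards) can be tried out without a PAM stack by swapping in a stub. The sketch below does that under explicit assumptions: StubHandle, stub_authenticate and spawn_with_session are hypothetical names, the stub only records which calls were made, and the "server" is whatever command is passed in. It is POSIX-only because it relies on os.fork, and it says nothing about how real PAM modules behave.

```python
import os
import subprocess
import sys


class StubHandle:
    """Hypothetical stand-in for a pamela handle; records calls only."""

    def __init__(self, events):
        self.events = events

    def open_session(self):
        self.events.append("open")

    def close_session(self):
        self.events.append("close")


def stub_authenticate(username, password, events):
    # A real version would presumably call pamela.authenticate(..., close=False).
    events.append("auth:" + username)
    return StubHandle(events)


def spawn_with_session(username, password, argv):
    """Fork a session process that makes all (stubbed) PAM calls for one spawn."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Session process: authenticate, open, run the server, close.
        os.close(read_fd)
        events = []
        status = 1
        try:
            handle = stub_authenticate(username, password, events)
            handle.open_session()
            try:
                # The server runs as a child of the session process.
                events.append("exit:%d" % subprocess.run(argv).returncode)
            finally:
                handle.close_session()
            status = 0
        finally:
            # Report what happened to the parent, then exit without
            # running the parent's cleanup handlers in this fork.
            os.write(write_fd, ",".join(events).encode())
            os.close(write_fd)
            os._exit(status)
    # Parent (the Hub's role in this sketch): never touches the handle.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        report = reader.read()
    os.waitpid(pid, 0)
    return report.split(",")


if __name__ == "__main__":
    print(spawn_with_session("alice", "secret", [sys.executable, "-c", "pass"]))
```

Because every spawn gets its own process and its own handle, the parent never holds two transactions at once, which is the situation that triggered the segfault in the single-process test earlier in the thread.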
Thank you @minrk for developing a possible path to fixing this. Sounds good, although I do not understand all the details concerning processes and their communication (always writing single process apps with less than 100 lines of code...). Would be great to have PAM sessions in JupyterHub some day. For the time being I stick to telling my users "login via ssh if you want to see your Windows shares in Jupyter". Worked for two years and will work another year. Many thanks for all the effort put into Jupyter and friends! |
I've just run into the pam_mount problem with JupyterHub. Do I get it right that the best workaround to make user-individual authenticated DAVFS or CIFS mounts work with JupyterHub is the pampylho authenticator mentioned by @dsoares in this comment? jupyterhub/jupyterhub#810 (comment) |
Not using pam_mount for my JupyterHubs anymore. I don't think that jupyterhub/jupyterhub#810 (comment) really works in a stable way due to a bug in pam_mount, which does not allow for multiple parallel PAM sessions within one process (JHub). Have a look at jupyter-fs, a JLab extension for connecting to DAV and many other local or remote file systems. Have this running on my hubs and users configure their remote file systems individually (mostly Nextclouds). |
Hi,
First of all thank you for your work on jupyterhub! 🙏
I'm working in an environment where I think pamela prevents me from mounting a volume using pam_mount. This is similar/related to jupyterhub/jupyterhub#810
Here are the details:
pam_mkhomedir.
The flow works properly with ssh, but when connecting through jupyterhub the mounting part of the flow fails with a permission denied error, or pam_mount asks me to re-enter the password (depending on whether the disable_interactive flag is passed to pam_mount).
After some digging, my understanding is that the authenticate method of pamela does not start a PAM session and expects the PAM session to be started later on, after authentication succeeds. If that is the case, then the credentials for pam_mount are lost before the pam_session is created and therefore pam_mount cannot proceed accordingly.
The current dirty fix that I use is opening a PAM session directly at the end of authenticate:
I've tried to pass close=False to the function but it did not work. I'm not sure that's the proper way of doing things. What do you think? Could that change be activated using a flag just like resetcred?