| Summary: | Privilege Separation + PAM locks users out | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Portable OpenSSH | Reporter: | William M. Grim <wgrim> | ||||||
| Component: | PAM support | Assignee: | OpenSSH Bugzilla mailing list <openssh-bugs> | ||||||
| Status: | CLOSED FIXED | ||||||||
| Severity: | critical | ||||||||
| Priority: | P1 | ||||||||
| Version: | 3.8.1p1 | ||||||||
| Hardware: | All | ||||||||
| OS: | All | ||||||||
| Bug Depends on: | |||||||||
| Bug Blocks: | 822 | ||||||||
| Attachments: |
|
||||||||
|
Description
William M. Grim
2004-04-09 15:18:42 AEST
Created attachment 600 [details]
Reset thread status
Please try this patch (which has already been committed to -current, auth-pam.c
rev 1.97) or try a snapshot.
BTW the only binary that should be setuid is ssh-keysign (and possibly ssh, but only if you use a server that requires connections from low-numbered ports, e.g. for RSARhosts authentication).
The patch on this bug is in 3.8.1p1, so I think this is fixed. Does the problem still occur with that version?
Created attachment 639 [details]
Signal PAM "thread" if SIGCHLD is caused by the privsep slave exiting
Colin Watson pointed out that this may correspond to a Debian bug:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=248125
It appears that what is happening is that the client exits, breaking the TCP connection. When that happens, the privsep slave exits too, which causes a SIGCHLD to be delivered to the monitor. The monitor then attempts to waitpid() on the PAM "thread", which is still alive and blissfully unaware of a problem (because nobody told it to die). That waitpid() hangs the monitor's cleanup.
The attached patch adds a test for this case to the signal handler so that it shoots the PAM thread itself if it has to. It is the same as the one I sent to the Debian bug, except that it resets SIGCHLD to prevent re-entering the signal handler when the second process exits.
Comment on attachment 639 [details]
Signal PAM "thread" if SIGCHLD is caused by the privsep slave exiting
Looks sane to me.
Thanks, patch id #639 has just been committed (to both HEAD and the 3.8.1 branch).
William, could you please try either the patch or a snapshot[1] and confirm whether or not the problem is fixed for you?
[1] ftp://ftp.openbsd.org/pub/OpenBSD/OpenSSH/portable/snapshot/ or one of its mirrors.
Mario Holbe reports that the patch has been applied to Debian (unstable) and fixes the problem for him.
I think this is now fixed, so I'm resolving this bug. If you can reproduce your problem with either a current snapshot or 3.8.1p1 with patch id #639 then please reopen this bug.
There is a bug in the patch: waitpid() with WNOHANG can return 0 if the child is
still alive. The corresponding piece of code in sshpam_sigchld_handler() should
look like this:
+	int res;
...
+	res = waitpid(cleanup_ctxt->pam_thread, &sshpam_thread_status, WNOHANG);
+	if (res == 0 || res == -1) {
+		/* PAM thread has not exited, privsep slave must have */
+		kill(cleanup_ctxt->pam_thread, SIGTERM);
+		res = waitpid(cleanup_ctxt->pam_thread, &sshpam_thread_status, 0);
+		if (res == -1)
+			return; /* could not wait */
+	}
This has already been fixed in -current:
20040711
 - (dtucker) [auth-pam.c] Check for zero from waitpid() too, which allows the monitor to properly clean up the PAM thread (Debian bug #252676). |
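The waitpid() behaviour discussed in this bug can be seen in isolation with a small C sketch. This is not code from auth-pam.c: `reap_or_terminate()` and `demo_wnohang_on_live_child()` are hypothetical names invented for illustration, and a plain forked child that blocks in pause() stands in for the PAM "thread". The sketch only shows that waitpid() with WNOHANG returns 0 while the child is still running, which is why a check for -1 alone is not enough before deciding whether to signal the child.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from auth-pam.c): reap `child`, sending it
 * SIGTERM first if it has not exited yet. Returns 0 and fills *status on
 * success, or -1 if the child could not be waited for.
 */
static int
reap_or_terminate(pid_t child, int *status)
{
	pid_t res;

	res = waitpid(child, status, WNOHANG);
	if (res == child)
		return 0;	/* child had already exited and is now reaped */

	/* res == 0: child is still alive; res == -1: waitpid() failed */
	if (kill(child, SIGTERM) == -1 && errno != ESRCH)
		return -1;
	do {
		res = waitpid(child, status, 0);	/* blocking wait */
	} while (res == -1 && errno == EINTR);
	return res == child ? 0 : -1;
}

/*
 * Hypothetical demo: fork a child that blocks until signalled, record what
 * waitpid(WNOHANG) reports while it is still running, then clean it up.
 * Returns the WNOHANG result, or a value <= -10 on an unexpected failure.
 */
static int
demo_wnohang_on_live_child(void)
{
	int status = 0;
	pid_t child, res;

	child = fork();
	if (child == -1)
		return -10;
	if (child == 0) {
		for (;;)
			pause();	/* child: sleep until a signal arrives */
	}

	res = waitpid(child, &status, WNOHANG);	/* child has not exited */
	if (reap_or_terminate(child, &status) == -1)
		return -11;
	if (!WIFSIGNALED(status) || WTERMSIG(status) != SIGTERM)
		return -12;
	return (int)res;
}
```

Calling `demo_wnohang_on_live_child()` exercises both paths: the non-blocking wait that finds the child still alive, and the signal-then-wait cleanup that follows.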