Bug 839 - Privilege Separation + PAM locks users out
Summary: Privilege Separation + PAM locks users out
Status: CLOSED FIXED
Alias: None
Product: Portable OpenSSH
Classification: Unclassified
Component: PAM support
Version: 3.8.1p1
Hardware: All All
Importance: P1 critical
Assignee: OpenSSH Bugzilla mailing list
URL:
Keywords:
Depends on:
Blocks: 822
Reported: 2004-04-09 15:18 AEST by William M. Grim
Modified: 2004-09-11 13:18 AEST

Attachments
Reset thread status (305 bytes, patch)
2004-04-09 15:27 AEST, Darren Tucker
Signal PAM "thread" if SIGCHLD is caused by the privsep slave exitting (1.07 KB, patch)
2004-05-21 13:08 AEST, Darren Tucker
djm: ok+

Description William M. Grim 2004-04-09 15:18:42 AEST
I was having a problem all weekend where UsePrivilegeSeparation was on, and
users were being authenticated through PAM modules.

I would continuously get ssh_exchange_identification errors.  Generally this is
a hosts.allow/hosts.deny problem.  However, after running into the error 3
times, I determined that this was not the cause.

The problem has to do with something between sshd and PAM during privilege
separation.  I was randomly getting several "sshd: <user> [pam]" processes in my
"ps ax" list.  When the maximum unauthenticated connection limit was reached, no
one could log in.

Turning privilege separation off seems to remove the problem.  It is also
important to make sure ssh* binaries are not setuid root in this case.  Use
SELinux or similar if you feel you need more security.

However, I would like privilege separation fixed.
Comment 1 Darren Tucker 2004-04-09 15:27:19 AEST
Created attachment 600
Reset thread status

Please try this patch (which has already been committed to -current, auth-pam.c
rev 1.97) or try a snapshot.
Comment 2 Darren Tucker 2004-04-09 15:31:48 AEST
BTW the only binary that should be setuid is ssh-keysign (and possibly ssh, but
only if you use a server that requires connections from low-numbered ports, eg
for RSARhosts authentication).
Comment 3 Darren Tucker 2004-05-03 11:05:25 AEST
The patch on this bug is in 3.8.1p1, so I think this is fixed.  Does the problem
still occur with that version?
Comment 4 Darren Tucker 2004-05-21 13:08:39 AEST
Created attachment 639
Signal PAM "thread" if SIGCHLD is caused by the privsep slave exitting

Colin Watson pointed out that this may correspond to a Debian bug:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=248125

It appears that what is happening is that the client exits, breaking the TCP
connection.  When that happens, the privsep slave exits too, which causes a
SIGCHLD to be delivered to the monitor.  The monitor then attempts to waitpid()
on the PAM "thread" which is still alive and blissfully unaware of a problem
(because nobody told it to die).  That waitpid hangs the monitor's cleanup.

The attached patch adds a test for this case to the signal handler, so that the
handler shoots the PAM thread itself if it has to.  It is the same as the one I
sent to the Debian bug except that it resets SIGCHLD to prevent reentering the
signal handler when the second process exits.
Comment 5 Damien Miller 2004-05-21 13:26:15 AEST
Comment on attachment 639
Signal PAM "thread" if SIGCHLD is caused by the privsep slave exitting

Looks sane to me.
Comment 6 Darren Tucker 2004-05-24 12:00:13 AEST
Thanks, patch id #639 has just been committed (to both HEAD and 3.8.1 branch).

William, could you please try either the patch or a snapshot[1] and confirm
whether or not the problem is fixed for you?

[1] ftp://ftp.openbsd.org/pub/OpenBSD/OpenSSH/portable/snapshot/ or one of its
mirrors.
Comment 7 Darren Tucker 2004-05-30 21:06:18 AEST
Mario Holbe reports that the patch has been applied to Debian (unstable) and
fixes the problem for him.

I think this is now fixed, so I'm resolving this bug.  If you can reproduce your
problem with either a current snapshot or 3.8.1p1 with patch id #639 then please
reopen this bug.
Comment 8 Pavel Kankovsky 2004-07-21 09:20:56 AEST
There is a bug in the patch: waitpid() with WNOHANG can return 0 if the child is
still alive. The corresponding piece of code in sshpam_sigchld_handler() should
look like this:

+       int res;
...
+       res = waitpid(cleanup_ctxt->pam_thread, &sshpam_thread_status, WNOHANG);
+       if (res == 0 || res == -1) {
+               /* PAM thread has not exitted, privsep slave must have */
+               kill(cleanup_ctxt->pam_thread, SIGTERM);
+               res = waitpid(cleanup_ctxt->pam_thread, &sshpam_thread_status, 0);
+               if (res == -1)
+                       return; /* could not wait */
+       }
Comment 9 Darren Tucker 2004-07-21 16:49:15 AEST
This has already been fixed in -current:

20040711
 - (dtucker) [auth-pam.c] Check for zero from waitpid() too, which allows
   the monitor to properly clean up the PAM thread (Debian bug #252676).