Bug 1189 - PAM module hangs root logout
Summary: PAM module hangs root logout
Status: CLOSED DUPLICATE of bug 926
Alias: None
Product: Portable OpenSSH
Classification: Unclassified
Component: PAM support
Version: 4.3p2
Hardware: UltraSPARC Solaris
: P2 normal
Assignee: Assigned to nobody
Reported: 2006-05-16 06:42 AEST by William Knox
Modified: 2006-10-07 11:45 AEST

Attachments
Build options (1.97 KB, text/plain)
2006-05-16 06:44 AEST, William Knox
Stack backtrace (348 bytes, application/octet-stream)
2006-05-16 06:45 AEST, William Knox
/etc/pam.conf file (2.71 KB, text/plain)
2006-05-16 06:48 AEST, William Knox
Debug output from server (32.35 KB, text/plain)
2006-05-16 06:49 AEST, William Knox
Debug output from client (7.88 KB, text/plain)
2006-05-16 06:49 AEST, William Knox
Truss output from sshd (truss -vpoll -f -d) (127.74 KB, text/plain)
2006-05-20 03:08 AEST, William Knox
lsof of child sshd process (2.96 KB, text/plain)
2006-05-22 12:55 AEST, William Knox
pfiles of child sshd process (1014 bytes, text/plain)
2006-05-22 12:56 AEST, William Knox

Description William Knox 2006-05-16 06:42:48 AEST
When connecting to a server as root with a key pair while stacked PAM modules are in use, the connection hangs on disconnect. This affects only the root user, and only when the connection is made with the key pair. I have attached (or will attach) the /etc/pam.conf in question, the debug output from both the client and the server with the hang point indicated, the build output, and a stack backtrace. The server in question is a fairly recently patched Solaris 8 (117350-28), and I would be happy to answer questions about anything else. The PAM module in question, by the way, is from RSA and provides SecurID access.
Comment 1 William Knox 2006-05-16 06:44:24 AEST
Created attachment 1133 [details]
Build options
Comment 2 William Knox 2006-05-16 06:45:40 AEST
Created attachment 1134 [details]
Stack backtrace
Comment 3 William Knox 2006-05-16 06:48:55 AEST
Created attachment 1135 [details]
/etc/pam.conf file
Comment 4 William Knox 2006-05-16 06:49:26 AEST
Created attachment 1136 [details]
Debug output from server
Comment 5 William Knox 2006-05-16 06:49:48 AEST
Created attachment 1137 [details]
Debug output from client
Comment 6 William Knox 2006-05-20 03:06:34 AEST
Additional testing reveals that

1) the hang is caused by having the PAM module in question alone performing authentication - it doesn't have to be stacked
2) non-root users will also hang using pubkey auth if sshd is configured without PrivSep
3) not all PAM modules exhibit this behavior

I suppose this bug boils down to one question: if pubkey auth succeeded, why would the auth PAM modules be getting touched at all? Even if I have a clunky PAM module, I would have thought it wouldn't matter if it is not being called for auth.

I am about to attach the output of truss -vpoll -f -d on the sshd command in question. The hang occurs between the timestamps 15.69 and 26.18 (which is where I hit Ctrl-C).

Thanks in advance for any help or pointers to a clue, if I am overlooking something (aside from getting rid of the PAM module in question).
Comment 7 William Knox 2006-05-20 03:08:08 AEST
Created attachment 1138 [details]
Truss output from sshd (truss -vpoll -f -d)
Comment 8 Darren Tucker 2006-05-20 07:42:59 AEST
(In reply to comment #6)
> Additional testing reveals that
> 
> 1) the hang is caused by having the PAM module in question alone
> performing authentication - it doesn't have to be stacked
> 2) non-root users will also hang using pubkey auth if sshd is
> configured without PrivSep
> 3) not all PAM modules exhibit this behavior
> 
> I suppose this bug boils down to one question: if pubkey auth succeeded, why
> would the auth PAM modules be getting touched at all? Even if I have a
> clunky PAM module, I would have thought it wouldn't matter if it is not
> being called for auth.

pam_setcred() uses the auth stack too and that's called regardless of the ssh authentication method.
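
Since pam_setcred() runs the modules on the auth stack, one quick check is to list which modules sit on that stack for the service sshd uses. The following is a minimal sketch, assuming the Solaris-style single-file layout (service, module type, control flag, module path) and the usual fallback to the "other" service when a service has no entries of that type. The sample text and module names are hypothetical and are not taken from the attached pam.conf.

```python
def auth_stack(pam_conf_text, service):
    """Return the module paths on the auth stack for `service`.

    Falls back to the "other" service when `service` has no auth entries.
    """
    entries = {}
    for raw in pam_conf_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete "service type control module" entry
        name, module_type, _control, module_path = fields[:4]
        if module_type == "auth":
            entries.setdefault(name, []).append(module_path)
    return entries.get(service) or entries.get("other", [])


# Hypothetical sample, for illustration only.
SAMPLE = """
# service  type     control   module
sshd       auth     required  pam_securid.so
sshd       account  required  pam_unix.so.1
other      auth     required  pam_unix.so.1
"""

if __name__ == "__main__":
    # These are the modules pam_setcred() would call into for sshd.
    print(auth_stack(SAMPLE, "sshd"))
```

Any module this prints gets a chance to run at credential setup and teardown, even when the user authenticated with a key.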

> I am about to attach the output of truss -vpoll -f -d on the sshd
> command in question. The hang occurs between the timestamps 15.69 and
> 26.18 (which is where I hit Ctrl-C).
> 
> Thanks in advance for any help or pointers to a clue, if I am
> overlooking something (aside from getting rid of the PAM module in
> question).

Try lsof'ing (or equivalent) the hanging sshd (and/or its shell subprocess if it still has one).  I suspect that your recalcitrant module is leaking file descriptors and sshd is waiting for the leaked descriptor to close.
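
To illustrate the general mechanism suspected here (this is not sshd's code, just a sketch with a plain pipe): a reader only sees end-of-file once every copy of the write end is closed, so a descriptor leaked into some other process keeps the reader waiting. The function name and the sleep that stands in for the leaking process are made up for the example.

```python
import os
import time


def eof_delay_with_leaked_fd(leak_seconds=1.0):
    """Return how long a pipe reader waits for EOF while a child holds a leaked write end."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: stands in for a process that inherited the descriptor by accident.
        os.close(read_fd)
        time.sleep(leak_seconds)
        os._exit(0)  # exiting finally closes the leaked write end

    # Parent: closes its own write end, as if it were done with the session.
    os.close(write_fd)
    start = time.monotonic()
    data = os.read(read_fd, 1)  # blocks until every write end is closed
    elapsed = time.monotonic() - start
    os.close(read_fd)
    os.waitpid(pid, 0)
    assert data == b""  # an empty read means EOF
    return elapsed


if __name__ == "__main__":
    print(f"reader saw EOF after {eof_delay_with_leaked_fd(1.0):.2f}s")
```

The parent has already closed its own copy, yet the read still blocks for roughly as long as the child keeps the descriptor open. That is the pattern to look for in the lsof output.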

Excellent bug report, btw :-)
Comment 9 William Knox 2006-05-22 12:54:34 AEST
I'm attaching the lsof and pfiles output of the child sshd process (the shell process is still there, but labelled a defunct process with no open files) - I am not familiar enough with the mechanics of sshd at this point to spot a leaked FD awaiting closure, but ain't nothing leaping out to me. I'll also open a case with RSA about their module to see if they can shed any light.

Thanks for the help.
Comment 10 William Knox 2006-05-22 12:55:41 AEST
Created attachment 1140 [details]
lsof of child sshd process
Comment 11 William Knox 2006-05-22 12:56:10 AEST
Created attachment 1141 [details]
pfiles of child sshd process
Comment 12 William Knox 2006-05-22 12:56:57 AEST
Updated summary for accuracy
Comment 13 Darren Tucker 2006-05-22 13:35:58 AEST
Descriptor 8 in the lsof output seems a likely suspect.  I went back to the truss, and one thing jumped out at me: the child process closes descriptor 8 then exits.

This makes me think that the cause is what is described in bug #926.  There's a patch in that bug which is not right, but I think it will solve your problem well enough to prove whether or not this guess is correct.  Could you please try it?  Thanks.
Comment 14 William Knox 2006-05-22 23:04:04 AEST
It DOES help in the privsep case. As a side note, it doesn't help when privsep is turned off (though this appears to be noted in the 926 bug report). If I am reading this correctly, then, this patch is "doing the right thing" as long as you keep privsep enabled? I would be happy to perform any testing that people like for this patch or any others that come down the pike in order to confirm that.

Thanks again for the help. I guess this bug can be labelled a duplicate of 926.
Comment 15 Darren Tucker 2006-05-23 07:07:54 AEST
(In reply to comment #14)
> It DOES help in the privsep case. As a side note, it doesn't help when
> privsep is turned off (though this appears to be noted in the 926 bug
> report). If I am reading this correctly, then, this patch is "doing the
> right thing" as long as you keep privsep enabled?

Yeah that's basically it.  Doing the same thing for privsep=no would also mean breaking it for other situations where it currently works (or maybe adding another process per connection, which I'm not wild about).

Patch #1143 doesn't change the behaviour for privsep=no, and is almost certainly an improvement on what we have now for privsep=yes, so I would like to see it or something similar in the next release.

> I would be happy to
> perform any testing that people like for this patch or any others that
> come down the pike in order to confirm that.

Based on the timing, I'm guessing you tested patch #1143?  I would be interested to know if it also solves your problem for privsep=yes and user=root, assuming you permit this.

> Thanks again for the help. I guess this bug can be labelled a duplicate
> of 926.

Thanks, marking as duplicate of 926.


*** This bug has been marked as a duplicate of bug 926 ***
Comment 16 William Knox 2006-05-23 07:15:57 AEST
Yes, I tested patch 1143 (sorry I wasn't specific - I didn't see that that patch had been posted just this morning). The only case with trouble when privsep was on was root via pubkey - non-root users only had trouble when privsep was off - so this solved my issue.

Again, I'd be happy to test any future patches against this known test case. Thanks for the help.
Comment 17 Darren Tucker 2006-05-23 07:24:31 AEST
(In reply to comment #16)
> Yes, I tested patch 1143 (sorry I wasn't specific - I didn't see that
> that patch had been posted just this morning). The only case with
> trouble when privsep was on was root via pubkey - non-root users only
> had trouble when privsep was off - so this solved my issue.

That's what I suspected.  When privsep=yes and you're logging in as root then after successful authentication, post-auth privsep is disabled (since there's no point).
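
For reference, the cases reported so far in this thread can be written down as a small truth table. This is only a summary sketch of the reporter's observations for pubkey logins with the affected PAM module, not a model of sshd internals. The function name is made up, and the root with privsep=no case is inferred from comment #6 rather than stated outright.

```python
def hang_reported(user_is_root, privsep, patched):
    """Return True if the thread reports a hang on logout for this combination.

    Applies only to pubkey logins with the affected PAM module in the auth stack.
    """
    if not privsep:
        # privsep=no: hangs were reported, and the patch does not change this case.
        # (For root this is an inference, see the note above.)
        return True
    if patched:
        # privsep=yes with the patch from bug 926: no hang, including for root.
        return False
    # privsep=yes, unpatched: only root hung.
    return user_is_root


if __name__ == "__main__":
    for root in (True, False):
        for privsep in (True, False):
            for patched in (False, True):
                print(root, privsep, patched, hang_reported(root, privsep, patched))
```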

I'll think about this some more.
Comment 18 Darren Tucker 2006-10-07 11:45:13 AEST
Change all RESOLVED bugs to CLOSED with the exception of the ones fixed post-4.4.