Bug 1105 - Privilege Separation
Summary: Privilege Separation
Status: CLOSED INVALID
Alias: None
Product: Portable OpenSSH
Classification: Unclassified
Component: sshd
Version: 4.2p1
Hardware: All All
Importance: P2 normal
Assignee: Assigned to nobody
Reported: 2005-10-17 05:53 AEST by Jim Gifford
Modified: 2006-10-07 11:42 AEST

Attachments
Requested debug output (7.96 KB, text/plain)
2005-10-17 13:54 AEST, Jim Gifford
Fix privsep + root login + delayed compression bug. (622 bytes, patch)
2005-10-17 14:10 AEST, Darren Tucker
Updated debug output (8.11 KB, text/plain)
2005-10-17 16:12 AEST, Jim Gifford
SSH Strace (67.00 KB, text/plain)
2005-10-18 01:06 AEST, Jim Gifford
Updated strace log (66.53 KB, text/plain)
2005-10-18 10:51 AEST, Jim Gifford

Description Jim Gifford 2005-10-17 05:53:47 AEST
I've been doing a lot of builds of the portable OpenSSH with a modern toolchain (gcc 4.0.2, glibc 20050926 snapshot, and binutils 2.16.1). No matter which architecture I use, I have been unable to use privilege separation. Here is what happens.

Connect - Enter username - password - then it exits.

If I go into sshd_config - and set UsePrivilegeSeparation no, everything works perfectly.

Any suggestions or recommendations? A few people believe the issue is related to a glibc bug in chroot, which has been fixed in the glibc I'm using, but I think the problem is in OpenSSH.
Comment 1 Darren Tucker 2005-10-17 10:19:35 AEST
What OS are you using?  I'm guessing a Linux since you're using glibc but you don't specify.  What options did you build and run OpenSSH with?

Are you using keyboard-interactive authentication and if so does the problem occur without it?

Could you please attach (as an attachment, not in the comment field) the debug output from the server?  eg "/path/to/sshd -ddde -p 2022" then point your client at port 2022.

From what you've described, it does sound like the glibc thing.  Does the test for the glibc bug pass or crash?
http://marc.theaimsgroup.com/?l=openssh-unix-dev&m=111061843820265
Comment 2 Jim Gifford 2005-10-17 13:53:44 AEST
Yes, it's Linux.

Yes I saw that issue, and it doesn't affect my setup.
I also checked http://sources.redhat.com/ml/libc-hacker/2005-02/msg00005.html

Will be attaching the output you requested.
Comment 3 Jim Gifford 2005-10-17 13:54:43 AEST
Created attachment 999
Requested debug output
Comment 4 Darren Tucker 2005-10-17 14:10:07 AEST
Created attachment 1000
Fix privsep + root login + delayed compression bug.

OK, looking at the debug output, I think that is fixed with the following change (patch attached):
   - djm@cvs.openbsd.org 2005/09/19 11:47:09
     [sshd.c]
     stop connection abort on rekey with delayed compression enabled when
     post-auth privsep is disabled (e.g. when root is logged in); ok dtucker@

If so, this is already fixed in -HEAD and the 4.2 branch.  You can also work around it by setting "Compression yes" in sshd_config.
Comment 5 Jim Gifford 2005-10-17 16:12:49 AEST
Created attachment 1001
Updated debug output
Comment 6 Jim Gifford 2005-10-17 16:13:33 AEST
Still having the same issue. Updated the debug info.
Comment 7 Damien Miller 2005-10-17 19:16:25 AEST
You mention trying different "architectures", what do you mean?

What OS/Distribution are you using? (beyond "Linux"...)

This doesn't look like the rekey bug - it looks like the child session is terminating normally from the perspective of sshd. So it is probably blowing up inside session.c:do_child()

Given the bleeding-edge nature of your system, it isn't likely that we are going to be able to replicate your configuration easily, and it is probable that your problem lies in glibc or gcc. gcc-4.x has been known to miscompile OpenSSH (e.g. Bug #1080), so you might want to try a 3.x version if you can.

Apart from this, your best bet would be attaching gdb or instrumenting session.c:do_child() with fprintf(stderr, "%d", __LINE__); calls to see how far it gets.
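
A minimal sketch of that kind of instrumentation, written as a standalone C fragment rather than a patch to session.c. The names trace_line, TRACE and suspect_function are hypothetical and do not exist in OpenSSH; the only idea illustrated is that the last line number printed before the process dies brackets the failing statement.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of OpenSSH: write "TRACE file:line" to
 * the given stream and flush it so the marker survives a later crash.
 * Returns the number of characters written, as fprintf() does. */
static int trace_line(FILE *out, const char *file, int line)
{
    assert(out != NULL);
    int n = fprintf(out, "TRACE %s:%d\n", file, line);
    fflush(out);
    return n;
}

/* Drop TRACE(); between statements of the function being investigated. */
#define TRACE() trace_line(stderr, __FILE__, __LINE__)

/* Stand-in for the function under suspicion (assumption: the real
 * target would be do_child(), whose body is not reproduced here). */
static int suspect_function(int fail_early)
{
    TRACE();                /* reached the top of the function */
    if (fail_early)
        return -1;          /* simulated early failure path */
    TRACE();                /* reached the normal path */
    return 0;
}
```

Calling suspect_function(1) prints one marker and suspect_function(0) prints two, so the number and position of the markers show which path was taken. Whether stderr is still connected to anything useful at that point in the child is an assumption that would need checking on the real system.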
Comment 8 Darren Tucker 2005-10-17 22:54:39 AEST
Is it possible that your shell is simply exiting for some reason?  Could you try another shell (eg sash since that would eliminate shared library problems too).

Also worth a try: run sshd under "strace -f" (but be aware that the output of strace may contain passwords).
Comment 9 Jim Gifford 2005-10-18 01:06:26 AEST
Created attachment 1004
SSH Strace
Comment 10 Jim Gifford 2005-10-18 01:07:11 AEST
Attached strace of the issue. Password removed
Comment 11 Darren Tucker 2005-10-18 09:46:22 AEST
All of the interesting things happened in one of the child processes and you didn't use the strace "-f" option to follow it after a fork.

You also haven't answered questions about which options you built and are running OpenSSH with, the exact nature of your system (it sounds like a self-built one?), what you meant by trying this on multiple "architectures", or whether or not the problem occurs with an alternate shell.
Comment 12 Jim Gifford 2005-10-18 10:50:16 AEST
Self built system - same build method I've used for years.

./configure --prefix=/usr --sysconfdir=/etc/ssh \
    --libexecdir=/usr/sbin --with-md5-passwords \
    --with-privsep-path=/var/lib/sshd
make
make install

Will be updating strace in a few minutes.
Comment 13 Jim Gifford 2005-10-18 10:51:08 AEST
Created attachment 1005
Updated strace log
Comment 14 Jim Gifford 2005-10-18 10:52:33 AEST
Will test with a different shell. Will report back.
Comment 15 Jim Gifford 2005-10-27 05:01:43 AEST
Same issue, tested with ash, zsh, and tcsh
Comment 16 Darren Tucker 2005-10-27 18:42:34 AEST
Comment on attachment 1005
Updated strace log

>Process 11732 attached
>child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xfffff80000c92d70) = 11732
[...]

>[pid 11725] rt_sigtimedwait([?], ptrace: umoven: Input/output error
>0x3, 0, 6) = 0

Not sure if this is related or not.

[...]
>[pid 11725] waitpid(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGSEGV}], WNOHANG) = 11732

Note that the child pid (11732) does not show up in the strace at all.  It seems like the fork() fails for some reason (process limits?).  Since you're running a custom system we can't reproduce the problem, and since no one else has reported anything similar then I'm afraid you're on your own.
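
For readers decoding the waitpid line quoted above: a status matching WIFSIGNALED(s) && WTERMSIG(s) == SIGSEGV is how a parent sees a child that was terminated by SIGSEGV, which is a different outcome from fork() returning -1 in the parent. The following is a small sketch of that decoding using only standard POSIX calls; run_child, died_of_segv and describe_status are hypothetical names chosen for illustration, and the child here raises the signal deliberately rather than reproducing the reported failure.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that either raises `sig` or, when
 * sig == 0, exits cleanly. Stores the wait status in *status and
 * returns the child pid, or -1 if fork() or waitpid() failed. */
static pid_t run_child(int sig, int *status)
{
    assert(status != NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;                  /* fork itself failed */
    if (pid == 0) {
        if (sig != 0) {
            signal(sig, SIG_DFL);   /* make sure the default action applies */
            raise(sig);
        }
        _exit(0);
    }
    if (waitpid(pid, status, 0) != pid)
        return -1;
    return pid;
}

/* True when the status says the child was killed by SIGSEGV, the same
 * condition strace printed for pid 11732. */
static int died_of_segv(int status)
{
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV;
}

/* Print a one-line description of a wait status to stderr. */
static void describe_status(pid_t pid, int status)
{
    if (WIFEXITED(status))
        fprintf(stderr, "pid %ld exited with code %d\n",
                (long)pid, WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        fprintf(stderr, "pid %ld killed by signal %d\n",
                (long)pid, WTERMSIG(status));
    else
        fprintf(stderr, "pid %ld: unexpected status 0x%x\n",
                (long)pid, (unsigned)status);
}
```

A caller would run something like run_child(SIGSEGV, &status) followed by describe_status(pid, status). The sketch uses a blocking waitpid(pid, ...) for simplicity, whereas the strace shows sshd polling with waitpid(-1, ..., WNOHANG).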
Comment 17 Jim Gifford 2005-11-12 15:29:09 AEDT
Update: The last glibc snapshot (1107) seems to have corrected the issue. It now works on the 3 different architectures I had problems with. I don't know which glibc patch fixed it, but it works now.
Comment 18 Darren Tucker 2005-11-12 16:06:56 AEDT
OK, thanks.  Closing.
Comment 19 Darren Tucker 2006-10-07 11:42:50 AEST
Change all RESOLVED bugs to CLOSED with the exception of the ones fixed post-4.4.