I've been doing a lot of builds of portable OpenSSH with a modern toolchain (gcc 4.0.2, glibc 20050926 snapshot, and binutils 2.16.1). No matter which architecture I use, I have been unable to use privilege separation. Here is what happens: connect, enter username, enter password, then it exits. If I go into sshd_config and set UsePrivilegeSeparation no, everything works perfectly. Any suggestions or recommendations? A few people believe the issue is related to a glibc bug in chroot, which has been fixed in the glibc I'm using, but I think the problem is in OpenSSH.
What OS are you using? I'm guessing Linux since you're using glibc, but you don't specify. What options did you build and run OpenSSH with? Are you using keyboard-interactive authentication, and if so, does the problem occur without it? Could you please attach (as an attachment, not in the comment field) the debug output from the server? E.g. "/path/to/sshd -ddde -p 2022", then point your client at port 2022. From what you've described, it does sound like the glibc thing. Does the test for the glibc bug pass or crash? http://marc.theaimsgroup.com/?l=openssh-unix-dev&m=111061843820265
Yes, it's Linux. Yes, I saw that issue, and it doesn't affect my setup. I also checked http://sources.redhat.com/ml/libc-hacker/2005-02/msg00005.html Will be attaching the output you requested.
Created attachment 999 [details] Requested debug output
Created attachment 1000 [details] Fix privsep + root login + delayed compression bug.
OK, looking at the debug output, I think that is fixed with the following change (patch attached):
 - djm@cvs.openbsd.org 2005/09/19 11:47:09
   [sshd.c]
   stop connection abort on rekey with delayed compression enabled when post-auth privsep is disabled (e.g. when root is logged in); ok dtucker@
If so, this is already fixed in -HEAD and the 4.2 branch. You can also work around it by setting "Compression yes" in sshd_config.
Created attachment 1001 [details] Updated debug output
Still having the same issue. Updated the debug info.
You mention trying different "architectures"; what do you mean? What OS/distribution are you using (beyond "Linux")? This doesn't look like the rekey bug: it looks like the child session is terminating normally from the perspective of sshd, so it is probably blowing up inside session.c:do_child(). Given the bleeding-edge nature of your system, it isn't likely that we are going to be able to replicate your configuration easily, and it is probable that your problem lies in glibc or gcc. gcc-4.x has been known to miscompile OpenSSH (e.g. Bug #1080), so you might want to try a 3.x version if you can. Apart from this, your best bet would be attaching gdb or instrumenting session.c:do_child() with fprintf(stderr, "%d", __LINE__); calls to see how far it gets.
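The line-marker instrumentation suggested above could look roughly like the following. This is only a sketch: trace_point(), TRACE() and example_child_setup() are hypothetical names made up for illustration, and example_child_setup() is a stand-in, not the real do_child(). The stream is flushed after every marker so the last line printed is still visible if the process dies right afterwards.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of OpenSSH): write "TRACE file:line"
 * to the given stream and flush at once, so the marker is not lost
 * in a buffer if the process crashes on the next statement.
 * Returns the number of characters written, as fprintf() does. */
int trace_point(FILE *out, const char *file, int line)
{
    int n;

    assert(out != NULL && file != NULL);
    n = fprintf(out, "TRACE %s:%d\n", file, line);
    fflush(out);
    return n;
}

/* Convenience wrapper that records the current source location. */
#define TRACE() trace_point(stderr, __FILE__, __LINE__)

/* Stand-in for the function being instrumented. In practice the
 * TRACE() calls would be sprinkled between the real setup steps. */
int example_child_setup(void)
{
    TRACE();    /* function entered */
    /* ... first setup step would go here ... */
    TRACE();    /* first step survived */
    /* ... second setup step would go here ... */
    TRACE();    /* second step survived */
    return 0;
}
```

The last marker that appears in the debug output brackets the failing step: the crash lies between that TRACE() and the next one.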
Is it possible that your shell is simply exiting for some reason? Could you try another shell (e.g. sash, since that would eliminate shared library problems too)? Also worth a try: run sshd under "strace -f" (but be aware that the output of strace may contain passwords).
Created attachment 1004 [details] SSH Strace
Attached strace of the issue. Password removed
All of the interesting things happened in one of the child processes and you didn't use the strace "-f" option to follow it after a fork. You also haven't answered questions about which options you built and are running OpenSSH with, the exact nature of your system (it sounds like a self-built one?), what you meant by trying this on multiple "architectures", or whether or not the problem occurs with an alternate shell.
Self built system - same build method I've used for years.
./configure --prefix=/usr --sysconfdir=/etc/ssh \
    --libexecdir=/usr/sbin --with-md5-passwords \
    --with-privsep-path=/var/lib/sshd
make
make install
Will be updating strace in a few minutes.
Created attachment 1005 [details] Updated strace log
Will test with a different shell. Will report back.
Same issue, tested with ash, zsh, and tcsh
Comment on attachment 1005 [details] Updated strace log
>Process 11732 attached
>child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xfffff80000c92d70) = 11732
[...]
>[pid 11725] rt_sigtimedwait([?], ptrace: umoven: Input/output error
>0x3, 0, 6) = 0
Not sure if this is related or not.
[...]
>[pid 11725] waitpid(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGSEGV}], WNOHANG) = 11732
Note that the child pid (11732) does not show up in the strace at all. It seems like the fork() fails for some reason (process limits?). Since you're running a custom system we can't reproduce the problem, and since no one else has reported anything similar then I'm afraid you're on your own.
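For readers unfamiliar with the waitpid() line in the trace: strace is printing the wait status decoded with the standard POSIX macros, and WIFSIGNALED(s) && WTERMSIG(s) == SIGSEGV means the reaped process was terminated by a segmentation fault. A minimal sketch of how a parent decodes such a status follows. The helper names describe_status() and reap_child_killed_by() are hypothetical, and this is not how sshd itself is structured.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: turn a wait status into a short description. */
int describe_status(int status, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    if (WIFSIGNALED(status))
        return snprintf(buf, len, "killed by signal %d", WTERMSIG(status));
    if (WIFEXITED(status))
        return snprintf(buf, len, "exited with status %d",
            WEXITSTATUS(status));
    return snprintf(buf, len, "other status 0x%x", (unsigned int)status);
}

/* Fork a child that terminates itself with signal `sig`, reap it, and
 * return the raw wait status, or -1 if fork() or waitpid() fails. */
int reap_child_killed_by(int sig)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;              /* fork failed, e.g. process limit hit */
    if (pid == 0) {
        signal(sig, SIG_DFL);   /* make sure the default action applies */
        raise(sig);
        _exit(0);               /* only reached if the signal did not kill us */
    }
    /* Retry if the wait is interrupted by an unrelated signal. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return status;
}
```

Calling reap_child_killed_by(SIGSEGV) and passing the result to describe_status() yields a "killed by signal" message, which corresponds to the status strace reported for pid 11732.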
Update: The last glibc snapshot (1107) seems to have corrected the issue. It now works on the 3 different architectures I had problems with. I don't know which patch to glibc fixed it, but it works now.
OK, thanks. Closing.
Change all RESOLVED bugs to CLOSED with the exception of the ones fixed post-4.4.