I've encountered a problem similar (or identical) to bug 967, but in a modern version, and with more details.

I found a problem where executing ssh with a command (`ssh host date`) would often hang. Debugging sshd, I found that do_exec_no_pty() calls do_child() after the fork(), which then calls execve(). That execve() doesn't appear to complete, and the parent doesn't appear to return from the fork(). Observe:

Dec 2 03:23:14 yoda sshd[7463]: debug1: calling fork in do_exec_no_pty
Dec 2 03:23:14 yoda sshd[7464]: debug1: permanently_set_uid: 982/100
Dec 2 03:23:14 yoda sshd[7464]: debug3: channel 0: close_fds r -1 w -1 e -1 c -1
Dec 2 03:23:14 yoda sshd[7464]: debug1: calling execve for user command
[then nothing]

If I add a usleep(10) before the execve(), it works fine:

Dec 2 03:31:16 yoda sshd[8275]: debug1: calling fork in do_exec_no_pty
Dec 2 03:31:16 yoda sshd[8276]: debug1: permanently_set_uid: 982/100
Dec 2 03:31:16 yoda sshd[8276]: debug3: channel 0: close_fds r -1 w -1 e -1 c -1
Dec 2 03:31:16 yoda sshd[8275]: debug1: parent
Dec 2 03:31:16 yoda sshd[8275]: debug1: calling session_set_fds
Dec 2 03:31:16 yoda sshd[8275]: debug1: channel_set_fds
Dec 2 03:31:16 yoda sshd[8275]: debug1: channel_register_fds
Dec 2 03:31:16 yoda sshd[8275]: debug2: fd 7 setting O_NONBLOCK
Dec 2 03:31:16 yoda sshd[8275]: debug3: fd 7 is O_NONBLOCK
Dec 2 03:31:16 yoda sshd[8275]: debug2: fd 9 setting O_NONBLOCK
Dec 2 03:31:16 yoda sshd[8275]: debug1: calling session_set_fds done
Dec 2 03:31:16 yoda sshd[8276]: debug1: calling execve for user command
[and then onwards]

In the hung state, the "date" process is a zombie, and its parent sshd is sleeping. I will attach a stack trace.
I don't believe I use threads; my config.log shows:

$ ./configure --prefix=/usr --host=i686-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --with-ldflags= --disable-strip --sysconfdir=/etc/ssh --libexecdir=/usr/lib/misc --datadir=/usr/share/openssh --disable-suid-ssh --with-privsep-path=/var/empty --with-privsep-user=sshd --with-md5-passwords --without-libedit --without-kerberos5 --with-tcp-wrappers --without-skey --without-opensc --with-ldap --without-pam --build=i686-pc-linux-gnu

I also found a glibc bug report that sounds somewhat similar: http://sourceware.org/bugzilla/show_bug.cgi?id=838

If I comment out the debug() call in sigchld_handler, the problem goes away.

This is on Gentoo ~x86, net-misc/openssh-4.2p1, gcc-3.4.4, glibc-2.3.5-r3, kernel 2.6.14-gentoo-r3.
Created attachment 1034: stack trace of hanging sshd
Just to be clear, this happens at the DEBUG logging level. Also, some of the messages in the output are produced by temporary debug statements I inserted. I'll attach the diff that produces them.
Created attachment 1035: extra debugging
Created attachment 1065: Move debug from signal handler

I think we should just remove the debug() call from the signal handler. It is not safe on platforms that don't implement syslog_r properly. Does this patch solve the problem for you?
I reproduced the problem on my current OpenSSH_4.3p1, DEBUG3, with gcc-3.4.5 and glibc-2.3.6-r2. I then applied the patch and couldn't reproduce it anymore. Thanks!
With the release of 4.4, we believe that this bug is now closed. For information about the release please see http://www.openssh.com/txt/release-4.4 .