Bug 1129 - sshd hangs for command-only invocations due to fork/child signals
Summary: sshd hangs for command-only invocations due to fork/child signals
Status: CLOSED FIXED
Alias: None
Product: Portable OpenSSH
Classification: Unclassified
Component: sshd
Version: 4.2p1
Hardware: ix86 Linux
Importance: P2 normal
Assignee: Assigned to nobody
URL:
Keywords:
Depends on:
Blocks: V_4_4
Reported: 2005-12-02 16:33 AEDT by Martijn Koster
Modified: 2006-09-28 19:25 AEST (History)
CC List: 0 users

See Also:


Attachments
stack trace of hanging sshd (1.13 KB, text/plain)
2005-12-02 16:34 AEDT, Martijn Koster
extra debugging (1.24 KB, patch)
2005-12-02 23:16 AEDT, Martijn Koster
Move debug from signal handler (759 bytes, patch)
2006-02-12 12:41 AEDT, Damien Miller

Description Martijn Koster 2005-12-02 16:33:53 AEDT
I've encountered a problem similar (or identical) to bug 967,
but in a modern version, and with more details.


I found a problem where executing ssh with a command (`ssh host date`)
would often hang. Debugging sshd, I found that do_exec_no_pty() calls
do_child() after the fork(), and that do_child() then calls execve().
That execve() doesn't appear to complete, and the parent doesn't
appear to return from the fork(). Observe:

Dec  2 03:23:14 yoda sshd[7463]: debug1: calling fork in do_exec_no_pty
Dec  2 03:23:14 yoda sshd[7464]: debug1: permanently_set_uid: 982/100
Dec  2 03:23:14 yoda sshd[7464]: debug3: channel 0: close_fds r -1 w -1 e -1 c -1
Dec  2 03:23:14 yoda sshd[7464]: debug1: calling execve for user command
[then nothing]

If I add a usleep(10) before the execve, it works fine:

Dec  2 03:31:16 yoda sshd[8275]: debug1: calling fork in do_exec_no_pty
Dec  2 03:31:16 yoda sshd[8276]: debug1: permanently_set_uid: 982/100
Dec  2 03:31:16 yoda sshd[8276]: debug3: channel 0: close_fds r -1 w -1 e -1 c -1
Dec  2 03:31:16 yoda sshd[8275]: debug1: parent
Dec  2 03:31:16 yoda sshd[8275]: debug1: calling session_set_fds
Dec  2 03:31:16 yoda sshd[8275]: debug1: channel_set_fds
Dec  2 03:31:16 yoda sshd[8275]: debug1: channel_register_fds
Dec  2 03:31:16 yoda sshd[8275]: debug2: fd 7 setting O_NONBLOCK
Dec  2 03:31:16 yoda sshd[8275]: debug3: fd 7 is O_NONBLOCK
Dec  2 03:31:16 yoda sshd[8275]: debug2: fd 9 setting O_NONBLOCK
Dec  2 03:31:16 yoda sshd[8275]: debug1: calling session_set_fds done
Dec  2 03:31:16 yoda sshd[8276]: debug1: calling execve for user command
[and then onwards]

In the hung state, the "date" process is in a zombie state, and its
parent sshd is sleeping. I will attach a stack trace.

I don't believe I use threads, my config.log shows:

  $ ./configure --prefix=/usr --host=i686-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --with-ldflags= --disable-strip --sysconfdir=/etc/ssh --libexecdir=/usr/lib/misc --datadir=/usr/share/openssh --disable-suid-ssh --with-privsep-path=/var/empty --with-privsep-user=sshd --with-md5-passwords --without-libedit --without-kerberos5 --with-tcp-wrappers --without-skey --without-opensc --with-ldap --without-pam --build=i686-pc-linux-gnu

I also found a glibc bug report that sounds somewhat similar:
http://sourceware.org/bugzilla/show_bug.cgi?id=838

If I comment out the debug() call in sigchld_handler,
the problem goes away.
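
For illustration only (this is not the sshd code): if a message must be emitted from inside the handler, write(2) is on the POSIX list of async-signal-safe functions, whereas a syslog-style logging call is not. A minimal sketch, where install_write_handler(), sigchld_handler_write(), sigchld_msg, and log_fd are hypothetical names invented for the example:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Fixed message: nothing is formatted inside the handler. */
static const char sigchld_msg[] = "debug1: received SIGCHLD\n";

/* Hypothetical log descriptor, set once before the handler is installed. */
static volatile sig_atomic_t log_fd = STDERR_FILENO;

/* Handler restricted to async-signal-safe work: one write(2) call. */
static void sigchld_handler_write(int sig)
{
    int saved_errno = errno;    /* write() may clobber errno */
    ssize_t n;

    (void)sig;
    n = write((int)log_fd, sigchld_msg, sizeof(sigchld_msg) - 1);
    (void)n;                    /* nothing safe to do on failure here */
    errno = saved_errno;
}

/* Hypothetical helper: route handler output to fd; returns 0 on success. */
int install_write_handler(int fd)
{
    struct sigaction sa;

    log_fd = fd;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sigchld_handler_write;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGCHLD, &sa, NULL);
}
```

The handler takes no locks and allocates no memory, so it cannot block on state held by the code it interrupted.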

This is on Gentoo ~x86, net-misc/openssh-4.2p1, gcc-3.4.4, glibc-2.3.5-r3, kernel 2.6.14-gentoo-r3.
Comment 1 Martijn Koster 2005-12-02 16:34:52 AEDT
Created attachment 1034 [details]
stack trace of hanging sshd
Comment 2 Martijn Koster 2005-12-02 23:14:52 AEDT
Just to be clear, this happens at DEBUG logging level.

Also, some of the messages in the output are produced by temporary
debug statements I inserted. I'll attach the diff that produces them.
Comment 3 Martijn Koster 2005-12-02 23:16:44 AEDT
Created attachment 1035 [details]
extra debugging
Comment 4 Damien Miller 2006-02-12 12:41:47 AEDT
Created attachment 1065 [details]
Move debug from signal handler

I think we should just remove the debug() call from the signal handler. It is not safe on platforms that don't implement syslog_r properly.

Does this patch solve the problem for you?
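
The general pattern behind moving logging out of a signal handler can be sketched as follows. This is a minimal illustration, not the attached patch: the handler only sets a flag, and the logging happens later in normal program context. The names install_sigchld_handler() and collect_children() are hypothetical, and fprintf() stands in for whatever logging call the program uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the handler, consumed by the main loop. */
static volatile sig_atomic_t child_terminated = 0;

/* The handler only records that SIGCHLD arrived; no logging here. */
static void sigchld_handler(int sig)
{
    (void)sig;
    child_terminated = 1;
}

/* Hypothetical helper: install the handler; returns 0 on success. */
int install_sigchld_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGCHLD, &sa, NULL);
}

/*
 * Hypothetical helper: call from the main loop, outside signal context.
 * Reaps every exited child, logs each one, and returns the count.
 */
int collect_children(void)
{
    int status, reaped = 0;
    pid_t pid;

    if (!child_terminated)
        return 0;
    /* Clear first, so a SIGCHLD arriving during the loop is not lost. */
    child_terminated = 0;
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        fprintf(stderr, "debug1: child %ld exited\n", (long)pid);
        reaped++;
    }
    return reaped;
}
```

A main loop would call collect_children() once per iteration, so the log message is produced where non-reentrant library calls are safe.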
Comment 5 Martijn Koster 2006-02-13 22:13:34 AEDT
I reproduced the problem on my current OpenSSH_4.3p1 at DEBUG3, with gcc-3.4.5 and glibc-2.3.6-r2. I then applied the patch and couldn't reproduce it anymore.
Thanks!
Comment 6 Darren Tucker 2006-09-28 19:25:39 AEST
With the release of 4.4, we believe that this bug is now closed.  For information about the release please see http://www.openssh.com/txt/release-4.4 .