Bugzilla – Attachment 2220 Details for
Bug 2071
sshd closes stderr but not stdout when child process exits
Home
|
New
|
Browse
|
Search
|
[?]
|
Reports
|
Requests
|
Help
|
New Account
|
Log In
[x]
|
Forgot Password
Login:
[x]
strace output and code fragments demonstrating stderr-vs-stdout bug
file_2071.txt (text/plain), 9.47 KB, created by
JD Paul
on 2013-02-16 02:50:32 AEDT
(
hide
)
Description:
strace output and code fragments demonstrating stderr-vs-stdout bug
Filename:
MIME Type:
Creator:
JD Paul
Created:
2013-02-16 02:50:32 AEDT
Size:
9.47 KB
patch
obsolete
>Based on strace data, it appears that when sshd gets the SIGCHLD from >the primary shell exiting, it read()s from the pipe connected to the >child's stderr -- and read() returns -1 with an errno of EAGAIN. sshd >then close()s that fd. The next time the subshell (or any further >child) tries to write to stderr, it fails and gets a SIGPIPE signal, >which ends up killing the entire process group. > >Here is an annotated Linux strace of sshd when doing the command shown >above (the one involving both stdout and stderr). PIDs have been >renumbered (starting at 10001) to make it easier to compare. > >10001 - sshd daemon > 10002 - sshd parent > 10004 - sshd child: fd 10 = stdout from shell; fd 12 = stderr > 10005 - primary shell > 10020 - subshell > > ># sshd child is waiting on output from shell >10004 time(NULL) = 1359665510 >10004 select(13, [3 5 7 10 12], [], NULL, NULL <unfinished ...> ># primary shell exits >10005 <... write resumed> ) = 11 >10005 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 >10005 exit_group(0) = ? ># sshd child does some work and then tries to read from primary shell's stderr >10004 <... select resumed> ) = ? ERESTARTNOHAND (To be restarted) >10004 --- SIGCHLD (Child exited) @ 0 (0) --- >10004 rt_sigaction(SIGCHLD, NULL, {0x8053e28, [], SA_RESTORER|SA_INTERRUPT|SA_RESETHAND|SA_SIGINFO|SA_NOCLDWAIT|0x1dc1978, (nil)}, 8) = 0 >10004 write(6, "\0", 1) = 1 >10004 sigreturn() = -1 EINTR (Interrupted system call) >10004 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0 >10004 waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 10005 >10004 waitpid(-1, 0xfffbeb6c, WNOHANG) = -1 ECHILD (No child processes) >10004 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 >10004 time(NULL) = 1359665510 ># here is the read() failure and then close(): >10004 read(12, 0xfffb6be0, 16384) = -1 EAGAIN (Resource temporarily unavailable) >10004 close(12) = 0 >10004 time(NULL) = 1359665510 ># Here the subshell writes to stdout for the first time: >10020 write(1, "child stdout 1\n", 15 <unfinished ...> ># Here the sshd process reads 'child stdout 1' and sends it back >across the encrypted channel to the ssh client. >10004 time(NULL) = 1359665510 >10004 read(10, "child stdout 1\n", 16384) = 15 >10004 time(NULL) = 1359665510 >10004 select(13, [3 5 7 10], [3], NULL, NULL) = 1 (out [3]) >10004 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0 >10004 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 >10004 time(NULL) = 1359665510 >10004 write(3, "o\344\323\274\f\370\0v8\230m\330\310\355\363\303\255\206\366b\213\341\247\274\25\315\326H\327\232h\335"..., 64) = 64 >10004 time(NULL) = 1359665510 >10004 select(13, [3 5 7 10], [], NULL, NULL <unfinished ...> ># Here the subshell tries to write to stderr, but the pipe is closed >10020 <... write resumed> ) = 15 >10020 fcntl(1, F_GETFD) = 0 >10020 fcntl(1, F_DUPFD, 10) = 10 >10020 fcntl(1, F_GETFD) = 0 >10020 fcntl(10, F_SETFD, FD_CLOEXEC) = 0 >10020 dup2(2, 1) = 1 >10020 fcntl(2, F_GETFD) = 0 >10020 write(1, "child stderr 1\n", 15) = -1 EPIPE (Broken pipe) >10020 --- SIGPIPE (Broken pipe) @ 0 (0) --- ># After this, the shells die due to the SIGPIPE; sshd shuts down and exits. > > >This "read() returns -1 plus EAGAIN" and close() is happening inside >server_loop2(), in serverloop.c. > >I used this command to get debug tracing as well as strace data: > >sudo strace -fv -o /tmp/sshd.stderr.strace /usr/local/sbin/sshd -ddd -oPidFile=/var/tmp/sshd-24.pid -p 24 > >On the client side, I re-ran the larger ssh command, but connecting to >port 24. > >I saw the following debug output from sshd: > >debug2: channel 0: read -1 from efd 12 >debug2: channel 0: closing read-efd 12 > >-- which corresponds to > >10004 read(12, <unfinished ...> >10004 <... read resumed> 0xffad3da0, 16384) = -1 EAGAIN (Resource temporarily unavailable) >10004 write(2, "debug2: channel 0: read -1 from "..., 40) = 40 >10004 write(2, "debug2: channel 0: closing read-"..., 40) = 40 >10004 close(12) = 0 >10004 time(NULL) = 1360258197 > >(Note: the debugging + strace slows things down enough that system >calls are often unfinished/resumed.) > >The "closing read-efd" line is from channel_handle_efd(), in >channels.c; it then calls channel_close_fd(), which close()s the file >decriptor as shown in the strace above. > >The function call tree looks like this: > >server_loop2() > - collect_children() > - channel_after_select() > - channel_handler(channel_post, readset, writeset, NULL) > if (!did_init) { > - channel_handler_init() /* first run of channel_handler() only */ > - channel_handler_init_{20,13,15}(): > /* initialize callback table */ > channel_post[SSH_CHANNEL_OPEN] = &channel_post_open; > channel_post[SSH_CHANNEL_DYNAMIC] = &channel_post_open; > } > /* eventually runs this: */ > - channel_post[c->type](c, readset, writeset) > /* and if c->type is SSH_CHANNEL_OPEN or SSH_CHANNEL_DYNAMIC: */ > - channel_post_open() > - channel_handle_rfd() > - channel_handle_wfd() > - channel_handle_efd() > - process_input() > >Because of the callbacks via channel_post[], it's difficult to trace exactly >what is going on, but we can get some hints looking at >channel_handle_efd(): > >1774 } else if (c->efd != -1 && >1775 (c->extended_usage == CHAN_EXTENDED_READ || >1776 c->extended_usage == CHAN_EXTENDED_IGNORE) && >1777 (c->detach_close || FD_ISSET(c->efd, readset))) { >1778 len = read(c->efd, buf, sizeof(buf)); >1779 debug2("channel %d: read %d from efd %d", >1780 c->self, len, c->efd); >1781 if (len < 0 && (errno == EINTR || ((errno == EAGAIN || >1782 errno == EWOULDBLOCK) && !c->detach_close))) >1783 return 1; >1784 if (len <= 0) { >1785 debug2("channel %d: closing read-efd %d", >1786 c->self, c->efd); >1787 channel_close_fd(&c->efd); >1788 } else { >1789 if (c->extended_usage == CHAN_EXTENDED_IGNORE) { >1790 debug3("channel %d: discard efd", >1791 c->self); >1792 } else >1793 buffer_append(&c->extended, buf, len); >1794 } >1795 } > >Lines 1781-1783 imply to me that this should return 1 (at line 1783) >*unless* c->detach_close is non-zero. > >I added some debugging statements to channel_register_cleanup() -- >which is the only place that detach_close is set for any channel -- >and channel_handle_efd() (to show detach_close), and got the following >(my added statements are marked with "DEBUG:"): > >debug1: Received SIGCHLD. >debug1: session_by_pid: pid 1935 >debug1: session_exit_message: session 0 channel 0 pid 1935 >debug2: channel 0: request exit-status confirm 0 >debug1: session_exit_message: release channel 0 >debug2: DEBUG: channel 0, fd 12: detach_close=1 >debug2: channel 0: read -1 from efd 12 >debug2: DEBUG: fd 12: len=-1, detach_close=1, errno=11 >debug2: channel 0: closing read-efd 12 >debug1: channel_close_fd(): close(12) >debug2: notify_done: reading > >So it appears that stderr has detach_close=1 set -- but stdout does not. > >As mentioned above, Protocol 1 does not close stderr in this case, and >OpenSSH versions prior to 4.6p1 do not exhibit the bug for Protocol 2. > >Here is a subset of the diffs for channels.c between 4.5p1 and 4.6p1: > >--- openssh-4.5p1/channels.c 2006-08-29 18:07:40.000000000 -0700 >+++ openssh-4.6p1/channels.c 2007-01-28 15:16:28.000000000 -0800 >@@ -1,4 +1,4 @@ >-/* $OpenBSD: channels.c,v 1.266 2006/08/29 10:40:18 djm Exp $ */ >+/* $OpenBSD: channels.c,v 1.268 2007/01/03 03:01:40 stevesk Exp $ */ > /* > * Author: Tatu Ylonen <ylo@cs.hut.fi> > * Copyright (c) 1995 Tatu Ylonen <ylo@cs.hut.fi>, Espoo, Finland >@@ -1449,10 +1449,11 @@ > int len; > > if (c->rfd != -1 && >- FD_ISSET(c->rfd, readset)) { >+ (c->detach_close || FD_ISSET(c->rfd, readset))) { > errno = 0; > len = read(c->rfd, buf, sizeof(buf)); >- if (len < 0 && (errno == EINTR || errno == EAGAIN)) >+ if (len < 0 && (errno == EINTR || >+ (errno == EAGAIN && !(c->isatty && c->detach_close)))) > return 1; > #ifndef PTY_ZEROREAD > if (len <= 0) { >@@ -1604,11 +1605,12 @@ > c->local_consumed += len; > } > } else if (c->extended_usage == CHAN_EXTENDED_READ && >- FD_ISSET(c->efd, readset)) { >+ (c->detach_close || FD_ISSET(c->efd, readset))) { > len = read(c->efd, buf, sizeof(buf)); > debug2("channel %d: read %d from efd %d", > c->self, len, c->efd); >- if (len < 0 && (errno == EINTR || errno == EAGAIN)) >+ if (len < 0 && (errno == EINTR || >+ (errno == EAGAIN && !c->detach_close))) > return 1; > if (len <= 0) { > debug2("channel %d: closing read-efd %d", > > >So the detach_close logic was added here. Note that this affects >channel_handle_rfd() (at line 1449) and channel_handle_efd() (at line >1604), but *not* channel_handle_wfd() -- which explains why stderr is >closed but not stdout. > >I have reached the limit of the time my boss will let me spend on >this, but I hope that this gives enough information to fix the bug. > > >
You cannot view the attachment while viewing its details because your browser does not support IFRAMEs.
View the attachment on a separate page
.
View Attachment As Raw
Actions:
View
Attachments on
bug 2071
: 2220 |
3240