If I'm using ControlMaster, after the `ssh` client goes away, the `sshd` closes the STDOUT and STDIN pipes for the child, but not STDERR. If the child is writing to STDERR, this can result in it hanging and being effectively orphaned until `sshd` terminates. How to reproduce: # In one terminal, start the ControlMaster ssh -oControlMaster=yes -oControlPath=$HOME/example.sock example.com # In another terminal, start a process using that ControlPath that writes to STDERR: ssh -oControlMaster=no -oControlPath=$HOME/example.sock example.com 'echo i am pid $$; exec yes >&2' 2> /dev/null i am pid 8176 # Hit ^C to disconnect the client # Now, back on the server, observe that 8088 is still running: $ ps -f 8176 UID PID PPID C STIME TTY STAT TIME CMD nelhage 8176 7802 0 16:37 ? Ss 0:00 yes # And is blocked on STDERR: $ strace -p 8176 strace: Process 8176 attached write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 8192 # If we look at its fd's, and at sshd's fds: $ ls -l /proc/8176/fd/ total 0 lr-x------ 1 nelhage nelhage 64 May 1 16:37 0 -> pipe:[26685784] l-wx------ 1 nelhage nelhage 64 May 1 16:37 1 -> pipe:[26685786] l-wx------ 1 nelhage nelhage 64 May 1 16:37 2 -> pipe:[26685786] $ sudo ls -l /proc/7802/fd/ total 0 lrwx------ 1 root root 64 May 1 16:39 0 -> /dev/null lrwx------ 1 root root 64 May 1 16:39 1 -> /dev/null lrwx------ 1 root root 64 May 1 16:39 11 -> /dev/ptmx lrwx------ 1 root root 64 May 1 16:39 12 -> /dev/ptmx lr-x------ 1 root root 64 May 1 16:39 16 -> pipe:[26685500] lr-x------ 1 root root 64 May 1 16:39 17 -> pipe:[26685511] lr-x------ 1 root root 64 May 1 16:39 18 -> pipe:[26685786] lrwx------ 1 root root 64 May 1 16:39 2 -> /dev/null lrwx------ 1 root root 64 May 1 16:39 3 -> socket:[26680195] lrwx------ 1 root root 64 May 1 16:39 4 -> socket:[26684902] lrwx------ 1 root root 64 May 1 16:39 5 -> socket:[26684932] lr-x------ 1 root root 64 May 1 16:39 6 -> pipe:[26680934] l-wx------ 1 root root 64 May 1 16:39 7 -> /run/systemd/sessions/8548.ref l-wx------ 1 root root 64 May 1 16:39 8 -> pipe:[26680934] lrwx------ 1 root root 64 May 1 16:39 9 -> /dev/ptmx # We can see that STDERR (pipe:[26685786]) is still open in the ssh daemon, but STDOUT and STDIN (…84 and …85) were closed. # # The lingering `yes` process is now hung and stuck until the original ssh session finally exits.
I looked into this with Nelson a bit - it looks like this is the /* XXX */ in channel_pre_open in channel.c. That function appears to properly handle stdin and stdout, but stderr is only select()ed on if stdin and stdout are open, which seems wrong. I also don't see anything that ever closes stderr, other than the process exiting. This might be a client-side thing (client not sending a shutdown for stderr) but I'm not following the logic of nchan.c well enough to see who should be handling stderr.
Are you really using 6.6p1? If so, please try to replicate this with the current version. There's four years of fixes since 6.6 was released.
I first diagnosed this on Ubuntu Trusty, which is in fact 6.6p1. I was just able to reproduce on 7.2p1; I'll try to spin up an even newer version to test there in a bit.
Reproduced on 7.6p1.
Created attachment 3182 [details] close efd when shutting down read/write I *think* this is the correct solution, but I'm not 100% sure yet that it won't cause data loss around EOF.
Created attachment 3185 [details] close read-mode efd when channel close received That first patch was definitely wrong - it broke our current stderr handling tests. This one is a bit more delicate: close the extended fd when it is in read mode (i.e. in sshd and attached to read a subprocess' stderr) and the channel receives a close message.
I've committed this fix and it should be in OpenSSH 7.9 - thanks! commit e0d6501e86734c48c8c503f81e1c0926e98c5c4c (HEAD -> master, origin/master, origin/HEAD) Author: djm@openbsd.org <djm@openbsd.org> Date: Thu Oct 4 07:47:35 2018 +0000 upstream: when the peer sends a channel-close message, make sure we close the local extended read fd (stderr) along with the regular read fd (stdout). Avoids weird stuck processed in multiplexing mode. Report and analysis by Nelson Elhage and Geoffrey Thomas in bz#2863 ok dtucker@ markus@ OpenBSD-Commit-ID: a48a2467fe938de4de69d2e7193d5fa701f12ae9
Close RESOLVED bugs with the release of openssh-8.0