Bug 2863 - sshd with ControlMaster does not close child STDERR on client exit
Summary: sshd with ControlMaster does not close child STDERR on client exit
Status: CLOSED FIXED
Alias: None
Product: Portable OpenSSH
Classification: Unclassified
Component: sshd (show other bugs)
Version: 7.6p1
Hardware: amd64 Linux
: P5 minor
Assignee: Damien Miller
URL:
Keywords:
Depends on:
Blocks: V_7_9
  Show dependency treegraph
 
Reported: 2018-05-02 09:40 AEST by Nelson Elhage
Modified: 2018-10-19 17:17 AEDT (History)
2 users (show)

See Also:


Attachments
close efd when shutting down read/write (6.29 KB, patch)
2018-10-03 18:30 AEST, Damien Miller
no flags Details | Diff
close read-mode efd when channel close received (1.81 KB, patch)
2018-10-04 11:21 AEST, Damien Miller
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Nelson Elhage 2018-05-02 09:40:48 AEST
If I'm using ControlMaster, after the `ssh` client goes away, the `sshd` closes the STDOUT and STDIN pipes for the child, but not STDERR. If the child is writing to STDERR, this can result in it hanging and being effectively orphaned until `sshd` terminates.

How to reproduce:

# In one terminal, start the ControlMaster
ssh -oControlMaster=yes -oControlPath=$HOME/example.sock example.com

# In another terminal, start a process using that ControlPath that writes to STDERR:
ssh -oControlMaster=no -oControlPath=$HOME/example.sock example.com 'echo i am pid $$; exec yes >&2' 2> /dev/null
i am pid 8176
# Hit ^C to disconnect the client
# Now, back on the server, observe that 8088 is still running:
$ ps -f 8176
UID        PID  PPID  C STIME TTY      STAT   TIME CMD
nelhage   8176  7802  0 16:37 ?        Ss     0:00 yes

# And is blocked on STDERR:

$ strace -p 8176
strace: Process 8176 attached
write(1, "y\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\ny\n"..., 8192


# If we look at its fd's, and at sshd's fds:

$ ls -l /proc/8176/fd/
total 0
lr-x------ 1 nelhage nelhage 64 May  1 16:37 0 -> pipe:[26685784]
l-wx------ 1 nelhage nelhage 64 May  1 16:37 1 -> pipe:[26685786]
l-wx------ 1 nelhage nelhage 64 May  1 16:37 2 -> pipe:[26685786]
$ sudo ls -l /proc/7802/fd/
total 0
lrwx------ 1 root root 64 May  1 16:39 0 -> /dev/null
lrwx------ 1 root root 64 May  1 16:39 1 -> /dev/null
lrwx------ 1 root root 64 May  1 16:39 11 -> /dev/ptmx
lrwx------ 1 root root 64 May  1 16:39 12 -> /dev/ptmx
lr-x------ 1 root root 64 May  1 16:39 16 -> pipe:[26685500]
lr-x------ 1 root root 64 May  1 16:39 17 -> pipe:[26685511]
lr-x------ 1 root root 64 May  1 16:39 18 -> pipe:[26685786]
lrwx------ 1 root root 64 May  1 16:39 2 -> /dev/null
lrwx------ 1 root root 64 May  1 16:39 3 -> socket:[26680195]
lrwx------ 1 root root 64 May  1 16:39 4 -> socket:[26684902]
lrwx------ 1 root root 64 May  1 16:39 5 -> socket:[26684932]
lr-x------ 1 root root 64 May  1 16:39 6 -> pipe:[26680934]
l-wx------ 1 root root 64 May  1 16:39 7 -> /run/systemd/sessions/8548.ref
l-wx------ 1 root root 64 May  1 16:39 8 -> pipe:[26680934]
lrwx------ 1 root root 64 May  1 16:39 9 -> /dev/ptmx

# We can see that STDERR (pipe:[26685786]) is still open in the ssh daemon, but STDOUT and STDIN (…84 and …85) were closed.
#
# The lingering `yes` process is now hung and stuck until the original ssh session finally exits.
Comment 1 Geoffrey Thomas 2018-05-02 09:55:34 AEST
I looked into this with Nelson a bit - it looks like this is the /* XXX */ in channel_pre_open in channel.c. That function appears to properly handle stdin and stdout, but stderr is only select()ed on if stdin and stdout are open, which seems wrong. I also don't see anything that ever closes stderr, other than the process exiting.

This might be a client-side thing (client not sending a shutdown for stderr) but I'm not following the logic of nchan.c  well enough to see who should be handling stderr.
Comment 2 Damien Miller 2018-05-03 01:48:43 AEST
Are you really using 6.6p1? If so, please try to replicate this with the current version. There's four years of fixes since 6.6 was released.
Comment 3 Nelson Elhage 2018-05-03 01:53:40 AEST
I first diagnosed this on Ubuntu Trusty, which is in fact 6.6p1. I was just able to reproduce on 7.2p1; I'll try to spin up an even newer 
 version to test there in a bit.
Comment 4 Nelson Elhage 2018-05-03 07:07:49 AEST
Reproduced on 7.6p1.
Comment 5 Damien Miller 2018-10-03 18:30:03 AEST
Created attachment 3182 [details]
close efd when shutting down read/write

I *think* this is the correct solution, but I'm not 100% sure yet that it won't cause data loss around EOF.
Comment 6 Damien Miller 2018-10-04 11:21:06 AEST
Created attachment 3185 [details]
close read-mode efd when channel close received

That first patch was definitely wrong - it broke our current stderr handling tests.

This one is a bit more delicate: close the extended fd when it is in read mode (i.e. in sshd and attached to read a subprocess' stderr) and the channel receives a close message.
Comment 7 Damien Miller 2018-10-04 17:51:28 AEST
I've committed this fix and it should be in OpenSSH 7.9 - thanks!

commit e0d6501e86734c48c8c503f81e1c0926e98c5c4c (HEAD -> master, origin/master, origin/HEAD)
Author: djm@openbsd.org <djm@openbsd.org>
Date:   Thu Oct 4 07:47:35 2018 +0000

    upstream: when the peer sends a channel-close message, make sure we
    
    close the local extended read fd (stderr) along with the regular read fd
    (stdout). Avoids weird stuck processed in multiplexing mode.
    
    Report and analysis by Nelson Elhage and Geoffrey Thomas in bz#2863
    
    ok dtucker@ markus@
    
    OpenBSD-Commit-ID: a48a2467fe938de4de69d2e7193d5fa701f12ae9
Comment 8 Damien Miller 2018-10-19 17:17:28 AEDT
Close RESOLVED bugs with the release of openssh-8.0