Bug 2646 - zombie processes when using privilege separation
Summary: zombie processes when using privilege separation
Status: CLOSED WORKSFORME
Alias: None
Product: Portable OpenSSH
Classification: Unclassified
Component: sshd
Version: 7.2p2
Hardware: ix86 Linux
Importance: P5 minor
Assignee: Assigned to nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-12-14 06:44 AEDT by Akshay
Modified: 2018-04-06 12:26 AEST
CC: 3 users

See Also:


Attachments
Add sigchld handler to inetd mode path (347 bytes, patch)
2016-12-15 10:37 AEDT, Darren Tucker

Description Akshay 2016-12-14 06:44:10 AEDT
I'm using `OpenSSH_7.2p2 Ubuntu-4ubuntu1, OpenSSL 1.0.2g-fips` and I've explicitly enabled UsePrivilegeSeparation.

With this I notice that the [priv] process does not get reaped by its parent (sshd) and as a result is adopted by whatever pid 1 happens to be. Normally this is okay, since most init systems handle it correctly; in containers, however, we might encounter homemade "init" systems that only propagate signals but don't reap adopted zombie processes. In such cases these zombies accumulate over time, which can lead to obvious problems.

Is there any reason that sshd can't reap its children after they exit?
Comment 1 Akshay 2016-12-14 06:49:27 AEDT
Steps to reproduce the issue:

- using a docker container running phusion/baseimage:latest.
- modify sshd_config to explicitly enable UsePrivilegeSeparation
- start sshd
- trace the init process in the container
- ssh into the container, then exit
- notice that the init process ends up 'wait'ing for the zombied sshd

Alternatively

- hack up a 'init' process that simply launches sshd in the container
- log in, log out
- notice that the `ps auxf` listing in the container now shows a zombie sshd process
Comment 2 Darren Tucker 2016-12-14 08:39:54 AEDT
(In reply to Akshay from comment #0)
> I'm using `OpenSSH_7.2p2 Ubuntu-4ubuntu1, OpenSSL 1.0.2g-fips` and

That's a vendor-modified version of OpenSSH.  Can you reproduce the problem with a binary built from the stock sources from openssh.com?  What command line flags is sshd invoked with?

> Is there any reason that sshd can't reap its children after they
> exit?

It does (or at least it should): https://anongit.mindrot.org/openssh.git/tree/sshd.c#n317
Comment 3 Akshay 2016-12-15 09:51:41 AEDT
> Can you reproduce the problem with a binary built from the stock sources from openssh.com

Sure, I'll go ahead and do that

> What command line flags is sshd invoked with

I'll provide those as well
Comment 4 Akshay 2016-12-15 09:58:45 AEDT
Okay, I was able to reproduce the issue using `OpenSSH_7.2p2, OpenSSL 1.0.2g  1 Mar 2016`

First, I have a simple 'init' program that runs in a container. All it does is launch sshd and wait for the TERM signal. On receipt of TERM, it TERMs sshd and exits.

So, initially, here is what I see:

root@4871a0e3589e:/# ps auxf
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         9  0.0  0.0  18248  3384 ?        Ss   22:47   0:00 bash
root        19  0.0  0.0  34424  2820 ?        R+   22:48   0:00  \_ ps auxf
root         1  0.4  0.0  40364  8220 ?        Ssl+ 22:47   0:00 /usr/bin/ruby -- /init.rb
root         8  0.0  0.0  26468  3844 ?        S+   22:47   0:00 /usr/sbin/sshd -D

The bash process (that spawns ps) is 'exec'd in the container using docker exec so that I can view the process listing "out-of-band" (i.e. without exercising sshd).

Next, I log in, and list the processes (in-band, this time). This is what I see:

nsadmin@4871a0e3589e:~$ ps auxf
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  40364  8220 ?        Ssl+ 22:47   0:00 /usr/bin/ruby -- /init.rb
root         8  0.0  0.0  26468  3844 ?        S+   22:47   0:00 /usr/sbin/sshd -D
root        20  0.0  0.0  29028  4532 ?        Ss   22:48   0:00  \_ sshd: nsadmin [priv]
nsadmin     22  0.0  0.0  29028  2624 ?        S    22:48   0:00      \_ sshd: nsadmin@pts/0
nsadmin     23  0.0  0.0  18256  3216 pts/0    Ss   22:48   0:00          \_ -bash
nsadmin     28  0.0  0.0  34424  2932 pts/0    R+   22:48   0:00              \_ ps auxf


Then, I log out of the ssh session, and get the process listing using an exec'd shell:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root        29  0.0  0.0  18248  3264 ?        Ss   22:48   0:00 /bin/bash
root        40  0.0  0.0  34424  2876 ?        R+   22:48   0:00  \_ ps auxf
root         1  0.0  0.0  40364  8220 ?        Ssl+ 22:47   0:00 /usr/bin/ruby -- /init.rb
root         8  0.0  0.0  26468  3844 ?        S+   22:47   0:00 /usr/sbin/sshd -D
nsadmin     22  0.0  0.0      0     0 ?        Z    22:48   0:00 [sshd] <defunct>
Comment 5 Darren Tucker 2016-12-15 10:32:39 AEDT
(In reply to Akshay from comment #4)
> Okay, I was able to reproduce the issue using `OpenSSH_7.2p2,
> OpenSSL 1.0.2g  1 Mar 2016`


Thanks.

> nsadmin     22  0.0  0.0      0     0 ?        Z    22:48   0:00
> [sshd] <defunct>

If I'm reading this correctly that's the post-auth unprivileged process (pid 22 in this example) not the [priv] process (pid 20 in this example).

I think I can see how this would happen.  After accepting the connection and forking off a copy, sshd re-execs itself with the "-R" flag in order to (hopefully) get a new address space layout.  -R sets:

                case 'R':
                        rexeced_flag = 1;
                        inetd_flag = 1;

then a bit later when the signal handlers are set up:
        /* Get a connection, either from inetd or a listening TCP socket */
        if (inetd_flag) {
                server_accept_inetd(&sock_in, &sock_out);
        } else {
[...]
                signal(SIGCHLD, main_sigchld_handler);

You can test this theory by running your sshd with the (undocumented) "-r" option to disable the re-exec.
Comment 6 Darren Tucker 2016-12-15 10:37:41 AEDT
Created attachment 2914 [details]
Add sigchld handler to inetd mode path

I think this patch would also fix it.  Could you please try it?
Comment 7 Akshay 2016-12-15 12:16:06 AEDT
Here is what happened when I tested with the '-r' option:

Initially...

root@4871a0e3589e:/# ps auxf
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         9  0.0  0.0  18248  3308 ?        Ss   01:14   0:00 /bin/bash
root        27  0.0  0.0  34424  2908 ?        R+   01:14   0:00  \_ ps auxf
root         1  0.1  0.0  40356  8196 ?        Ssl+ 01:14   0:00 /usr/bin/ruby -- /init.rb
root         8  0.0  0.0  26468  3772 ?        S+   01:14   0:00 /usr/sbin/sshd -D -r
root        19  0.0  0.0  29028  4084 ?        Ss   01:14   0:00  \_ sshd: nsadmin [priv]
nsadmin     21  0.0  0.0  29028  2668 ?        S    01:14   0:00      \_ sshd: nsadmin@pts/0
nsadmin     22  0.0  0.0  18252  3204 pts/0    Ss+  01:14   0:00          \_ -bash

Later, (after login then logout)...

root@4871a0e3589e:/# ps auxf
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         9  0.0  0.0  18248  3324 ?        Ss   01:14   0:00 /bin/bash
root        29  0.0  0.0  34424  2824 ?        R+   01:14   0:00  \_ ps auxf
root         1  0.1  0.0  40356  8196 ?        Ssl+ 01:14   0:00 /usr/bin/ruby -- /init.rb
root         8  0.0  0.0  26468  3772 ?        S+   01:14   0:00 /usr/sbin/sshd -D -r
nsadmin     21  0.0  0.0      0     0 ?        Z    01:14   0:00 [sshd] <defunct>
Comment 8 Akshay 2016-12-15 18:57:57 AEDT
Also, adding the one-line patch you suggested (onto 7.2p2*) does not fix the problem. I still see processes marked 'defunct' once I log out.

* Your patch was probably against a different branch, because the line numbers didn't seem to align. I was able to find the appropriate line using the comment above it.
Comment 9 Damien Miller 2017-01-06 14:24:49 AEDT
(In reply to Akshay from comment #7)

I think this is a bug in your init program. We could probably tell more clearly if you included PPID in your process lists (e.g. "ps ajf").

Here is the process list from when the session is active:

> root@4871a0e3589e:/# ps auxf
> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME
> COMMAND
> root         8  0.0  0.0  26468  3772 ?        S+   01:14   0:00
> /usr/sbin/sshd -D -r

^^ this sshd process (pid=8) is listening to the network.

> root        19  0.0  0.0  29028  4084 ?        Ss   01:14   0:00  \_
> sshd: nsadmin [priv]

^^ this one (pid=19) is the privilege separation monitor process.

> nsadmin     21  0.0  0.0  29028  2668 ?        S    01:14   0:00    
> \_ sshd: nsadmin@pts/0

^^ this one is the low-privilege child process.

> Later, (after login then logout)...
> 
> root@4871a0e3589e:/# ps auxf
> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME
> COMMAND
> root         8  0.0  0.0  26468  3772 ?        S+   01:14   0:00
> /usr/sbin/sshd -D -r

^^ the listener process is still here.

> nsadmin     21  0.0  0.0      0     0 ?        Z    01:14   0:00
> [sshd] <defunct>

This process was previously a child of the monitor process on pid=19, but its parent has already exited, so it's not around to call waitpid() to reap it.

In this situation, init is supposed to do the reaping since pid=21 is clearly orphaned. See https://en.wikipedia.org/wiki/Zombie_process for a bit more detail on how this is supposed to flow.

This might be your problem: https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/
Comment 10 Akshay 2017-01-07 08:23:54 AEDT
> In this situation, init is supposed to do the reaping

I understand that this is how normal systems might work. 

But as I mentioned in the original description...

> does not get reaped by its parent (sshd) and as a result is adopted by whatever pid 1 happens to be. Normally this is okay since most init systems will handle this correctly, however in containers we might encounter homemade "init" systems that only serve to propagate signals but don't reap adopted zombie processes. In such cases we accumulate these zombies over time and can lead to obvious problems.

Is there any reason that sshd can't reap its children after they exit?

So the original intent of filing the bug was to find out if sshd behavior could be changed so that all parents are around long enough to reap the children and then exit, thereby leaving no zombies.
Comment 11 Akshay 2017-01-07 08:25:46 AEDT
> Is there any reason that sshd can't reap its children after they exit?

To be specific, I meant to ask if there is a reason the priv sep process doesn't wait around till its children exit.
Comment 12 Damien Miller 2017-02-03 15:25:00 AEDT
I don't want to add code to sshd to work around broken init systems. init behaviour is basic system functionality that we shouldn't have to kludge around.
Comment 13 Damien Miller 2018-04-06 12:26:30 AEST
Close all resolved bugs after release of OpenSSH 7.7.