Bug 2953 - Race during daemon reload may cause to fail to listen on configured ports
Summary: Race during daemon reload may cause to fail to listen on configured ports
Status: CLOSED FIXED
Alias: None
Product: Portable OpenSSH
Classification: Unclassified
Component: sshd (show other bugs)
Version: 7.2p2
Hardware: amd64 Linux
: P5 minor
Assignee: Damien Miller
URL:
Keywords:
Depends on:
Blocks: V_8_0
  Show dependency treegraph
 
Reported: 2019-01-10 00:46 AEDT by Michal Koutný
Modified: 2023-01-13 13:42 AEDT (History)
3 users (show)

See Also:


Attachments
Prevent restarting while children are listening (6.51 KB, patch)
2019-01-10 00:46 AEDT, Michal Koutný
no flags Details | Diff
reuse startup_pipes; fix race with rexeced children too (6.47 KB, patch)
2019-02-22 16:53 AEDT, Damien Miller
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Michal Koutný 2019-01-10 00:46:32 AEDT
Created attachment 3222 [details]
Prevent restarting while children are listening

This was observed on 7.2p2 on SLES 12 SP2 (4.4.121-92.98-default) but based
on the code review, I expect the current version is affected too.

There is a short window when forked children are still referencing the listen
socket after a new client connects. When the parent handles SIGHUP and reloads
itself, it may fail to bind the new sockets if it hits this window.

Linux manual page socket(7) on SO_REUSEADDR:
> For AF_INET sockets this means that a socket may bind, except when
> there is an active listening socket bound to the address.

I'm attaching a patch (against master, 4a526941) that tackles this (verified on
reproducer on the system mentioned above).
Comment 1 Jakub Jelen 2019-01-10 01:31:26 AEDT
I confirm that this happens also with our systems when they are under heavy load. See following bug for simple reproducer

https://bugzilla.redhat.com/show_bug.cgi?id=1643176

This was temporarily resolved by configuring systemd service to automatically restart the sshd if it failed, but I agree that this sounds like a better idea except for the issue with blocking for undefined time for unresponsive (for whatever reason) children.
Comment 2 Michal Koutný 2019-01-10 02:22:05 AEDT
(In reply to Jakub Jelen from comment #1)
> [...] except for the issue with blocking for undefined time for unresponsive
> (for whatever reason) children.
If the children were unresponsive, it'd make the sshd server eventually
unresponsive nowadays too (because of MaxStartups). Or are you referring to SIGHUP responsiveness?
That's indeed harmed with this but should the child have issues reaching close_listen_socks() there are perhaps more severe issues with the system (IMO).
Comment 3 Damien Miller 2019-02-22 16:53:31 AEDT
Created attachment 3245 [details]
reuse startup_pipes; fix race with rexeced children too

I don't like the complexity of having another set of fds just for handling the listen_socks. Instead, this reuses the existing startup_pipes for signalling, but writing a char over them when the child process has finished "early startup".

"early startup" includes closing listen_socks but also, for the rexec case, receiving the re-executed child's state from sshd. This avoids another race condition: at SIGHUP the listener sshd was exiting before the child received its configuration.
Comment 4 Damien Miller 2019-03-01 13:34:39 AEDT
This patch (with a couple of tweaks) has been committed as 76a24b3fa193a and will be in openssh-8.0. Thanks!

commit 76a24b3fa193a9ca3e47a8779d497cb06500798b (HEAD -> master, origin/master, origin/HEAD)
Author: djm@openbsd.org <djm@openbsd.org>
Date:   Fri Mar 1 02:32:39 2019 +0000

    upstream: Fix two race conditions in sshd relating to SIGHUP:
    
    1. Recently-forked child processes will briefly remain listening to
      listen_socks. If the main server sshd process completes its restart
      via execv() before these sockets are closed by the child processes
      then it can fail to listen at the desired addresses/ports and/or
      fail to restart.
    
    2. When a SIGHUP is received, there may be forked child processes that
      are awaiting their reexecution state. If the main server sshd
      process restarts before passing this state, these child processes
      will yield errors and use a fallback path of reading the current
      sshd_config from the filesystem rather than use the one that sshd
      was started with.
    
    To fix both of these cases, we reuse the startup_pipes that are shared
    between the main server sshd and forked children. Previously this was
    used solely to implement tracking of pre-auth child processes for
    MaxStartups, but this extends the messaging over these pipes to include
    a child->parent message that the parent process is safe to restart. This
    message is sent from the child after it has completed its preliminaries:
    closing listen_socks and receiving its reexec state.
    
    bz#2953, reported by Michal Koutný; ok markus@ dtucker@
    
    OpenBSD-Commit-ID: 7df09eacfa3ce13e9a7b1e9f17276ecc924d65ab
Comment 5 Damien Miller 2019-10-09 15:11:44 AEDT
Close bugs fixed in openssh-8.1 release cycle