| Summary: | Race condition in receiving SIGTERM | ||
|---|---|---|---|
| Product: | Portable OpenSSH | Reporter: | Ben Maurer <ben.maurer> |
| Component: | sshd | Assignee: | Assigned to nobody <unassigned-bugs> |
| Status: | CLOSED FIXED | ||
| Severity: | minor | CC: | djm, dtucker, tcunha |
| Priority: | P5 | ||
| Version: | 6.2p1 | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Bug Depends on: | |||
| Bug Blocks: | 3302 | ||
| Attachments: | |||
Some potential strategies that come to mind to prevent this:

(1) Handle the termination in the signal handler. The termination code doesn't appear to use malloc (which is not async-signal-safe); it just closes the listening ports and removes the PID file, so this could be done inside the signal handler itself.

(2) Create a pipe which signal handlers can write to in order to wake up the select loop.

Retarget incomplete bugs to 6.8 release.

These bugs are no longer targeted at the imminent 6.7 release.

OpenSSH 6.8 is approaching release and closed for major work. Retarget these bugs for the next release.

Retarget to 6.9.

Retarget to 7.0 release; we'll probably add a notification fd.

Retarget pending bugs to openssh-7.1.

Retarget to openssh-7.3.

Retarget unfinished bugs to next release.

OpenSSH 7.4 release is closing; punt the bugs to 7.5.

Move incomplete bugs to openssh-7.6 target since 7.5 shipped a while back. To calibrate expectations, there's little chance all of these are going to make 7.6.

Remove 7.5 target.

(3) Mask the signals and use pselect instead of select?

Created attachment 3023
Mask sigterm and replace select with pselect in server_accept_loop
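Strategy (3) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not sshd's code and not the attached patch: a single readable fd stands in for the listening sockets, and `install_sigterm_handler()` and `wait_readable_or_sigterm()` are hypothetical names. SIGTERM stays blocked except inside pselect(), which atomically installs a mask with SIGTERM unblocked, so a signal sent between the flag check and the wait stays pending and then interrupts pselect() with EINTR instead of being lost.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

static volatile sig_atomic_t received_sigterm = 0;

static void
sigterm_handler(int sig)
{
	received_sigterm = sig;
}

/* Hypothetical helper: install the flag-setting handler (sa_flags == 0). */
static int
install_sigterm_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sigterm_handler;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGTERM, &sa, NULL);
}

/*
 * Hypothetical helper, not sshd's code: wait until fd is readable or
 * SIGTERM arrives. Returns 1 if fd is readable, 0 on SIGTERM, -1 on error.
 */
static int
wait_readable_or_sigterm(int fd)
{
	sigset_t block, orig, waitmask;
	fd_set readset;
	int ret;

	/* Keep SIGTERM blocked so it can only be delivered inside pselect(). */
	sigemptyset(&block);
	sigaddset(&block, SIGTERM);
	if (sigprocmask(SIG_BLOCK, &block, &orig) == -1)
		return -1;
	waitmask = orig;
	sigdelset(&waitmask, SIGTERM);

	for (;;) {
		/* No race: a SIGTERM sent now stays pending until pselect(). */
		if (received_sigterm) {
			ret = 0;
			break;
		}
		FD_ZERO(&readset);
		FD_SET(fd, &readset);
		/* Atomically unblock SIGTERM and wait; the blocked mask returns afterwards. */
		ret = pselect(fd + 1, &readset, NULL, NULL, NULL, &waitmask);
		if (ret == -1 && errno == EINTR)
			continue;	/* a signal was handled: re-check the flag */
		if (ret > 0)
			ret = 1;
		break;
	}
	sigprocmask(SIG_SETMASK, &orig, NULL);
	return ret;
}
```

The ordering is what closes the race: the flag is tested while SIGTERM is blocked, and only pselect() itself lets the signal through.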
Move to OpenSSH 7.8 tracking bug.

Retarget remaining bugs planned for 7.8 release to 7.9.

Retarget unfinished bugs to OpenSSH 8.0.

Retarget outstanding bugs at next release.

Retarget these bugs to 8.2 release.

Prepare for 8.2 release; retarget bugs.

Retarget bugs to 8.4 release.

Retarget to 8.6.

Retarget after 8.6p1 release.

Created attachment 3520
use pselect in server_accept_loop and wait_until_can_do_something
Created attachment 3523
use pselect in server_accept_loop and wait_until_can_do_something
The previous patch had some problems (e.g., it broke SIGINT in the user's shell). This one seems to work OK so far.
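The comment doesn't say exactly why the earlier patch broke SIGINT, so the following is only a general pitfall worth checking, not a diagnosis. A process's signal mask is inherited across fork() and kept across execve(), so signals a server blocks around pselect() can leak into a child session unless the child restores the original mask first. The sketch assumes a hypothetical `run_session_child()` that stands in for forking the user's shell; instead of calling execve(), the child just reports whether SIGINT is still blocked.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child standing in for the user's session.
 * If restore_mask is non-NULL, the child installs it before it would
 * execve() the shell; if NULL, it keeps whatever mask it inherited.
 * Returns the child's exit status (1 if SIGINT is blocked in the child,
 * 0 if not, 2 on a failure in the child) or -1 on a failure in the parent.
 */
static int
run_session_child(const sigset_t *restore_mask)
{
	pid_t pid;
	int status;

	pid = fork();
	if (pid == -1)
		return -1;
	if (pid == 0) {
		sigset_t now;

		/* The mask was inherited across fork() and would survive execve(). */
		if (restore_mask != NULL &&
		    sigprocmask(SIG_SETMASK, restore_mask, NULL) == -1)
			_exit(2);
		/* Stand-in for execve(): report whether SIGINT would stay blocked. */
		if (sigprocmask(SIG_SETMASK, NULL, &now) == -1)
			_exit(2);
		_exit(sigismember(&now, SIGINT) == 1 ? 1 : 0);
	}
	if (waitpid(pid, &status, 0) == -1)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In a real server, the mask to restore would be the one saved by the sigprocmask() call that first blocked the signals.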
This was committed in https://github.com/openssh/openssh-portable/commit/771f57a8626709f2ad207058efd68fbf30d31553 and will be in the next major release. Thanks for the report.

Closing bugs resolved before openssh-8.9.
To handle SIGTERM, OpenSSH uses this handler: `static void sigterm_handler(int sig) { received_sigterm = sig; }`. In the select loop, it checks this flag: `ret = select(maxfd+1, fdset, NULL, NULL, NULL); ... if (received_sigterm) { ... }`.

select() returns -1 with errno set to EINTR when a signal interrupts it, so in most cases this successfully shuts down the process. However, if the SIGTERM arrives while sshd is executing something other than this select() call (e.g., accepting a new connection), the handler sets the flag but nothing interrupts the next select(), which blocks with no timeout. sshd doesn't notice the SIGTERM until a new event comes in.

This race caused problems in a large, real-world deployment. The default init script in the openssh package sends a SIGTERM to stop the process. On a small fraction of servers, the race described here occurred: the new openssh process was launched while the old one was still running, and when the new process attempted to bind() to the port, it failed.
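The lost wakeup described above can be reproduced deterministically with a reduced loop. This is a hypothetical sketch, not sshd's loop: the flag check is moved to the top of the iteration (equivalent inside a loop), `check_then_select()` and its `simulate_late_sigterm` argument are illustrative names, and a 100 ms timeout replaces sshd's NULL timeout so the demonstration terminates. Calling raise() in the window runs the handler before select() starts, exactly as a real SIGTERM arriving during accept() would.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

static volatile sig_atomic_t received_sigterm = 0;

static void
sigterm_handler(int sig)
{
	received_sigterm = sig;
}

static int
install_sigterm_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = sigterm_handler;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGTERM, &sa, NULL);
}

/*
 * One pass of the flawed check-then-select() pattern (hypothetical
 * reduction). Returns -2 if the flag was already set when checked,
 * otherwise select()'s return value.
 */
static int
check_then_select(int fd, struct timeval *timeout, int simulate_late_sigterm)
{
	fd_set readset;

	if (received_sigterm)
		return -2;	/* the server would shut down here */

	/* Race window: the handler runs here and sets the flag... */
	if (simulate_late_sigterm)
		raise(SIGTERM);

	/* ...but select() is not running yet, so nothing interrupts it. */
	FD_ZERO(&readset);
	FD_SET(fd, &readset);
	return select(fd + 1, &readset, NULL, NULL, timeout);
}
```

With an idle fd, the first pass returns 0 (timeout) even though SIGTERM has already been handled; with sshd's NULL timeout it would block until the next connection. Only the following pass sees the flag.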