Bug 1692 - sshd sometimes dies when sent multiple SIGHUPs in quick succession
Summary: sshd sometimes dies when sent multiple SIGHUPs in quick succession
Status: CLOSED FIXED
Alias: None
Product: Portable OpenSSH
Classification: Unclassified
Component: sshd (show other bugs)
Version: 5.3p1
Hardware: Other Linux
: P2 normal
Assignee: Assigned to nobody
URL: https://bugs.launchpad.net/ubuntu/+so...
Keywords:
Depends on:
Blocks: V_5_4
  Show dependency treegraph
 
Reported: 2010-01-06 01:41 AEDT by Colin Watson
Modified: 2010-03-26 10:51 AEDT (History)
1 user (show)

See Also:


Attachments
ignore SIGHUP across sshd re-exec window (362 bytes, patch)
2010-01-06 01:41 AEDT, Colin Watson
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Colin Watson 2010-01-06 01:41:11 AEDT
Created attachment 1766 [details]
ignore SIGHUP across sshd re-exec window

In Ubuntu bug #497781, "PierreF" reported that it's sometimes possible to end up with no sshd running if you send it two SIGHUP signals in quick succession, which can sometimes happen due to configuration of networking scripts. Although I haven't been able to reproduce this myself so far, I think this is because SIGHUP is reset to the default action by execve(), and sshd's handler isn't reinstalled until shortly after the exec, so there's a window when it's simply set to the default action of terminating the process.

If this hypothesis is correct, which I think is likely, then the attached patch should fix it by ignoring SIGHUP across the exec window.
Comment 1 Darren Tucker 2010-01-06 10:33:36 AEDT
Seems like a reasonable hypothesis, but I don't see the patch making any difference.

The execv will result in an entirely new process address space (including address layout randomization on platforms that have it) and the disposition of the old process' signal handlers will be irrelevant.  You'd still have a window until the signal handler is reinstalled where the default action of SIGHUP would kill sshd.

You could minimize this window by moving the "signal(SIGCHLD, main_sigchld_handler)" to the start of main().  This wouldn't eliminate the window but it would shrink it a lot (particularly because the generation of the protocol 1 ephemeral server key would no longer be in the window).
Comment 2 Colin Watson 2010-01-06 12:18:01 AEDT
That isn't how execve() works, though. Signal dispositions are inherited, not reset by the action of loading the new process image. Lots of things wouldn't work if things were the way you posit - for example, nohup(1) would be entirely non-functional.

Here's a transcript of a test demonstrating that ignoring SIGHUP before execve() is effective. Does this convince you?

$ cat t.c
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
typedef void (*sighandler_t)(int);
extern char **environ;
int main(int argc, char **argv) {
    if (getenv("SECOND_TIME")) {
        sighandler_t prev = signal(SIGHUP, SIG_DFL);
        if (prev == SIG_IGN) {
            printf("SIGHUP was SIG_IGN\n");
        } else {
            printf("SIGHUP was not SIG_IGN\n");
        }
        exit(0);
    } else {
        setenv("SECOND_TIME", "1", 1);
        if (getenv("IGNORE_SIGHUP"))
            signal(SIGHUP, SIG_IGN);
        execve(argv[0], argv, environ);
    }
}
$ make t
cc     t.c   -o t
$ ./t
SIGHUP was not SIG_IGN
$ IGNORE_SIGHUP=1 ./t
SIGHUP was SIG_IGN
Comment 3 Colin Watson 2010-01-06 12:19:36 AEDT
BTW, I agree that ASLR makes a difference when the signal handler is a function, but in this case that is not so. SIG_IGN is (typically; certainly on Linux, I imagine on other Unixes too) just a constant and not affected by ASLR.
Comment 4 Colin Watson 2010-01-06 12:21:27 AEDT
I notice that the original reporter of https://bugs.launchpad.net/ubuntu/+source/openssh/+bug/497781 confirmed in a comment on that bug that my patch appears to fix the problem for him.
Comment 5 Darren Tucker 2010-01-09 21:01:02 AEDT
Fair enough, I've never actually thought about how nohup works :-)  

Adding to the list for 5.4.
Comment 6 Darren Tucker 2010-01-09 22:19:41 AEDT
Patch applied, thanks.  It will be in the 5.4 release.
Comment 7 Darren Tucker 2010-03-26 10:51:34 AEDT
With the release of 5.4p1, this bug is now considered closed.