I've attempted to run openssh-SNAP-20130802.tar.gz on FreeBSD 9.1-STABLE FreeBSD 9.1-STABLE #2 r251997: Wed Jun 19 11:13:25 PDT 2013 /usr/obj/usr/src/sys/GENERIC amd64 While "make tests" pass successfully, I did stumble upon, potentially not a new, bug related to fallback re-exec functionality bug. Here is how to reproduce it. $ pwd /tmp/todelete/openssh $ make tests $ cd regress $ /tmp/todelete/openssh/regress/sshd -f /tmp/todelete/openssh/regress/sshd_config -Esshd.log -dddD # in a separate window: $ pwd /tmp/todelete/openssh/regress $ mv sshd sshd.o # simulate reexec fallback $ ../ssh -nqo "Protocol=2" -F ssh_config somehost ls Connection to 127.0.0.1 closed by remote host. $ echo $? 255 I haven't been able to figure out exactly what's causing the failure, but it appears to be something about assumption about which fd values. I.e. if we ensure that FD to which stderr is redirected (-E), gets a higher than usual fd, then the problem goes away: --- log.c.orig 2013-08-02 15:27:27.000000000 -0700 +++ log.c 2013-08-02 15:28:23.000000000 -0700 @@ -352,11 +352,20 @@ { int fd; +#define XSIZ 3 + int f[XSIZ], i; + /* ensure fd gets a higher numbered fd */ + for (i = 0; i < XSIZ; i++) + f[i] = open("/COPYRIGHT", O_RDONLY); + if ((fd = open(logfile, O_WRONLY|O_CREAT|O_APPEND, 0600)) == -1) { fprintf(stderr, "Couldn't open logfile %s: %s\n", logfile, strerror(errno)); exit(1); } + for (i = 0; i < XSIZ; i++) + close(f[i]); + log_stderr_fd = fd; }
I have some more details: Apparently, startup_pipe gets the same fd value as connection_in/connection_out. 2094 authenticated: 2095 /* 2096 * Cancel the alarm we set to limit the time taken for 2097 * authentication. 2098 */ 2099 alarm(0); 2100 signal(SIGALRM, SIG_DFL); 2101 authctxt->authenticated = 1; 2102 if (startup_pipe != -1) { 2103 close(startup_pipe); 2104 startup_pipe = -1; 2105 } So by closing(startup_pipe) on line 2103, we also inadvertently close connection_in/connection_out fd. Which causes the bug. --- sshd.c.orig 2013-08-02 19:40:58.000000000 -0700 +++ sshd.c 2013-08-02 19:41:01.000000000 -0700 @@ -2100,7 +2100,7 @@ signal(SIGALRM, SIG_DFL); authctxt->authenticated = 1; if (startup_pipe != -1) { - close(startup_pipe); + //close(startup_pipe); startup_pipe = -1; } This prevents the problem from happening, but likely leaks the fd.. I need to futher look in to how startup_pipe is supposed to work and how to properly fix it. Thanks
I can't replicate this with -current on OpenBSD, but I think I can see how it happens.
Created attachment 2339 [details] Reset startup_pipe after dup2'ing Please try this diff.
Looks like this diff solves the symptom I've reported. Thanks
Hang on - the patch hasn't been committed yet. We'll close the bug once it is in.
Created attachment 2347 [details] Revised diff Revised diff - this closes the original startup_pipe after checking it hasn't already landed at the fd number we want to inhabit. Could you please test this one?
Revised patch seems to work. I.e., my testcase is no longer reproducible. Thanks
Patch committed - this will be in OpenSSH-6.4. Thanks!
Close all resolved bugs after 7.3p1 release