Bug 2139 - re-exec fallback problem
Summary: re-exec fallback problem
Status: CLOSED FIXED
Alias: None
Product: Portable OpenSSH
Classification: Unclassified
Component: sshd (show other bugs)
Version: -current
Hardware: Other FreeBSD
: P5 minor
Assignee: Damien Miller
URL:
Keywords:
Depends on:
Blocks: V_6_5
  Show dependency treegraph
 
Reported: 2013-08-03 08:42 AEST by Arthur Mesh
Modified: 2016-08-02 10:42 AEST (History)
1 user (show)

See Also:


Attachments
Reset startup_pipe after dup2'ing (997 bytes, patch)
2013-09-19 10:40 AEST, Damien Miller
no flags Details | Diff
Revised diff (1.48 KB, patch)
2013-10-10 12:07 AEDT, Damien Miller
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Arthur Mesh 2013-08-03 08:42:58 AEST
I've attempted to run openssh-SNAP-20130802.tar.gz on

FreeBSD 9.1-STABLE FreeBSD 9.1-STABLE #2 r251997: Wed Jun
19 11:13:25 PDT 2013     /usr/obj/usr/src/sys/GENERIC amd64

While "make tests" pass successfully, I did stumble upon, potentially
not a new, bug related to fallback re-exec functionality bug.

Here is how to reproduce it.

$ pwd
/tmp/todelete/openssh
$ make tests
$ cd regress
$ /tmp/todelete/openssh/regress/sshd -f /tmp/todelete/openssh/regress/sshd_config -Esshd.log -dddD


# in a separate window:
$ pwd
/tmp/todelete/openssh/regress
$ mv sshd sshd.o # simulate reexec fallback

$ ../ssh -nqo "Protocol=2" -F ssh_config somehost ls
Connection to 127.0.0.1 closed by remote host.
$ echo $?
255

I haven't been able to figure out exactly what's causing the failure, but it
appears to be something about assumption about which fd values.

I.e. if we ensure that FD to which stderr is redirected (-E), gets a higher
than usual fd, then the problem goes away:


--- log.c.orig	2013-08-02 15:27:27.000000000 -0700
+++ log.c	2013-08-02 15:28:23.000000000 -0700
@@ -352,11 +352,20 @@
 {
 	int fd;
 
+#define XSIZ 3
+	int f[XSIZ], i;
+	/* ensure fd gets a higher numbered fd */
+	for (i = 0; i < XSIZ; i++)
+		f[i] = open("/COPYRIGHT", O_RDONLY);
+
 	if ((fd = open(logfile, O_WRONLY|O_CREAT|O_APPEND, 0600)) == -1) {
 		fprintf(stderr, "Couldn't open logfile %s: %s\n", logfile,
 		     strerror(errno));
 		exit(1);
 	}
+	for (i = 0; i < XSIZ; i++)
+		close(f[i]);
+
 	log_stderr_fd = fd;
 }
Comment 1 Arthur Mesh 2013-08-03 12:42:48 AEST
I have some more details:

Apparently, startup_pipe gets the same fd value as connection_in/connection_out.

2094  authenticated:
2095         /*
2096          * Cancel the alarm we set to limit the time taken for
2097          * authentication.
2098          */
2099         alarm(0);
2100         signal(SIGALRM, SIG_DFL);
2101         authctxt->authenticated = 1;
2102         if (startup_pipe != -1) {
2103                 close(startup_pipe);
2104                 startup_pipe = -1;
2105         }

So by closing(startup_pipe) on line 2103, we also inadvertently close
connection_in/connection_out fd. Which causes the bug.

--- sshd.c.orig	2013-08-02 19:40:58.000000000 -0700
+++ sshd.c	2013-08-02 19:41:01.000000000 -0700
@@ -2100,7 +2100,7 @@
 	signal(SIGALRM, SIG_DFL);
 	authctxt->authenticated = 1;
 	if (startup_pipe != -1) {
-		close(startup_pipe);
+		//close(startup_pipe);
 		startup_pipe = -1;
 	}
 
This prevents the problem from happening, but likely leaks the fd.. I need to
futher look in to how startup_pipe is supposed to work and how to properly fix
it.

Thanks
Comment 2 Damien Miller 2013-09-19 10:37:58 AEST
I can't replicate this with -current on OpenBSD, but I think I can see how it happens.
Comment 3 Damien Miller 2013-09-19 10:40:22 AEST
Created attachment 2339 [details]
Reset startup_pipe after dup2'ing

Please try this diff.
Comment 4 Arthur Mesh 2013-09-20 05:09:01 AEST
Looks like this diff solves the symptom I've reported.

Thanks
Comment 5 Damien Miller 2013-10-10 11:54:28 AEDT
Hang on - the patch hasn't been committed yet. We'll close the bug once it is in.
Comment 6 Damien Miller 2013-10-10 12:07:09 AEDT
Created attachment 2347 [details]
Revised diff

Revised diff - this closes the original startup_pipe after checking it hasn't already landed at the fd number we want to inhabit.

Could you please test this one?
Comment 7 Arthur Mesh 2013-10-11 05:21:56 AEDT
Revised patch seems to work. I.e., my testcase is no longer reproducible.
Thanks
Comment 8 Damien Miller 2013-10-24 10:28:03 AEDT
Patch committed - this will be in OpenSSH-6.4. Thanks!
Comment 9 Damien Miller 2016-08-02 10:42:17 AEST
Close all resolved bugs after 7.3p1 release