Created attachment 2925 [details] Fix race conditions in forwarding tests As described in https://lists.mindrot.org/pipermail/openssh-unix-dev/2016-December/035629.html, I'm seeing peculiar failures in the forwarding tests on some systems. I suspect (though can't prove) that this may be the result of a race condition similar to that fixed in portable b3c19151cba2c0ed01b27f55de0d723ad07ca98f, so the attached patch converts the other tests in regress/forwarding.sh to use the same mux socket approach introduced in that commit. I can only test this by making an upload to Debian unstable, so I'll let you know if this improves matters ...
That seems to have indeed improved matters in my test runners. The remaining failure I see should be fixed by the patch in https://bugzilla.mindrot.org/show_bug.cgi?id=2660.
Applied, thanks.
This change is causing hangs in forwarding.sh in OpenBSD -current.
(In reply to Damien Miller from comment #3) > This change is causing hangs in forwarding.sh in OpenBSD -current. It didn't when I committed it and doesn't with binaries from Jan 24. I did see a problem when I just did a make && make install in the ssh dir, however those problems also went away when I did a clean build. Can you reproduce with a clean build? if not, maybe it's fallout from the libssl churn.
Yes, it fails with a clean build on OpenBSD and on Linux too. I'll see if I can bisect to see what broke it.
Fails on OpenBSD with usr.bin/ssh updated to 20170123, so the problem is likely elsewhere. It looks like the client is crashing: debug1: Local connections to LOCALHOST:3304 forwarded to remote address 127.0.0.1:4242 debug3: channel_setup_fwd_listener_tcpip: type 2 wildcard 0 addr NULL debug1: Local forwarding listening on 127.0.0.1 port 3304. debug2: fd 7 setting O_NONBLOCK debug3: fd 7 is O_NONBLOCK debug1: channel 2: new [port listener] debug1: Local forwarding listening on ::1 port 3304. debug2: fd 8 setting O_NONBLOCK debug3: fd 8 is O_NONBLOCK debug1: channel 3: new [port listener] Could not request local forwarding. FAIL: connection failed, should not
(In reply to Damien Miller from comment #6) > Could not request local forwarding. err, not crashing. That is a fatal()
Created attachment 2932 [details] slightly less broken This fixes the hangs. The test was opening a login session with stdout/err diverted. It still fails in the "exit on -L/-R forward failure" bits: env SUDO="" "MALLOC_OPTIONS=CFGJRSUX" sh /usr/src/regress/usr.bin/ssh/test-exec.sh /usr/src/regress/usr.bin/ssh/obj /usr/src/regress/usr.bin/ssh/forwarding.sh generate keys wait for sshd start forwarding, fork to background transfer over forwarded channels and check result exit on -L forward failure, proto 2 connection failed, should not exit on -R forward failure, proto 2 connection failed, should not simple clear forwarding proto 2 clear local forward proto 2 local forwarding not cleared clear remote forward proto 2 remote forwarding not cleared stdio forwarding proto 2 config file: start forwarding, fork to background config file: transfer over forwarded channels and check result transfer over chained unix domain socket forwards and check result wait for sshd to exit failed local and remote forwarding *** Error 1 in /usr/src/regress/usr.bin/ssh (Makefile:186 't-forwarding')
Figured it out, here's the essence of the fix: - ${SSH} -S $CTL -O exit somehost + ${SSH} -F $OBJ/ssh_config -S $CTL -O exit somehost The "transfer over forwarded channels and check result" section started a ssh mux master and tried to stop it at the end. Unfortunately, the $SSH invocation used to kill the mux master would pick up the ~/.ssh/config of the user running the test because it didn't specify an explicit configuration. This would result in the mux master persisting to the next block. Previously, this would work because of two coincidences: 1) the $CTL socket would be carried over to the next test block (ExitOnForwardFailure) and 2) the forwarding specification is the same in each block, so there was no collision between them. When this change added the "rm -f $CTL" to the ExitOnForwardFailure test block, this invalidated coincidence #1 and the forwarding attempts there would get EADDRINUSE because the ports were already busy (forwarded by the lingering mux master from the previous block). Anyway, I've committed the fix
closing resolved bugs as of 8.6p1 release