Bug 2659 - Fix race conditions in forwarding tests
Summary: Fix race conditions in forwarding tests
Status: CLOSED FIXED
Alias: None
Product: Portable OpenSSH
Classification: Unclassified
Component: Regression tests (show other bugs)
Version: 7.4p1
Hardware: Other Linux
: P5 enhancement
Assignee: Assigned to nobody
URL:
Keywords:
Depends on:
Blocks: V_7_5
  Show dependency treegraph
 
Reported: 2017-01-03 02:32 AEDT by Colin Watson
Modified: 2021-04-23 14:56 AEST (History)
2 users (show)

See Also:


Attachments
Fix race conditions in forwarding tests (4.21 KB, patch)
2017-01-03 02:32 AEDT, Colin Watson
no flags Details | Diff
slightly less broken (979 bytes, patch)
2017-01-30 15:24 AEDT, Damien Miller
dtucker: ok+
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Colin Watson 2017-01-03 02:32:57 AEDT
Created attachment 2925 [details]
Fix race conditions in forwarding tests

As described in https://lists.mindrot.org/pipermail/openssh-unix-dev/2016-December/035629.html, I'm seeing peculiar failures in the forwarding tests on some systems.  I suspect (though can't prove) that this may be the result of a race condition similar to that fixed in portable b3c19151cba2c0ed01b27f55de0d723ad07ca98f, so the attached patch converts the other tests in regress/forwarding.sh to use the same mux socket approach introduced in that commit.

I can only test this by making an upload to Debian unstable, so I'll let you know if this improves matters ...
Comment 1 Colin Watson 2017-01-03 23:20:09 AEDT
That seems to have indeed improved matters in my test runners.  The remaining failure I see should be fixed by the patch in https://bugzilla.mindrot.org/show_bug.cgi?id=2660.
Comment 2 Darren Tucker 2017-01-06 13:10:40 AEDT
Applied, thanks.
Comment 3 Damien Miller 2017-01-30 11:22:33 AEDT
This change is causing hangs in forwarding.sh in OpenBSD -current.
Comment 4 Darren Tucker 2017-01-30 12:44:37 AEDT
(In reply to Damien Miller from comment #3)
> This change is causing hangs in forwarding.sh in OpenBSD -current.

It didn't when I committed it and doesn't with binaries from Jan 24.  I did see a problem when I just did a make && make install in the ssh dir, however those problems also went away when I did a clean build.

Can you reproduce with a clean build?  if not, maybe it's fallout from the libssl churn.
Comment 5 Damien Miller 2017-01-30 14:17:30 AEDT
Yes, it fails with a clean build on OpenBSD and on Linux too. I'll see if I can bisect to see what broke it.
Comment 6 Damien Miller 2017-01-30 14:29:46 AEDT
Fails on OpenBSD with usr.bin/ssh updated to 20170123, so the problem is likely elsewhere. 

It looks like the client is crashing:

debug1: Local connections to LOCALHOST:3304 forwarded to remote address 127.0.0.1:4242
debug3: channel_setup_fwd_listener_tcpip: type 2 wildcard 0 addr NULL
debug1: Local forwarding listening on 127.0.0.1 port 3304.
debug2: fd 7 setting O_NONBLOCK
debug3: fd 7 is O_NONBLOCK
debug1: channel 2: new [port listener]
debug1: Local forwarding listening on ::1 port 3304.
debug2: fd 8 setting O_NONBLOCK
debug3: fd 8 is O_NONBLOCK
debug1: channel 3: new [port listener]
Could not request local forwarding.
FAIL: connection failed, should not
Comment 7 Damien Miller 2017-01-30 14:41:45 AEDT
(In reply to Damien Miller from comment #6)
> Could not request local forwarding.

err, not crashing. That is a fatal()
Comment 8 Damien Miller 2017-01-30 15:24:29 AEDT
Created attachment 2932 [details]
slightly less broken

This fixes the hangs. The test was opening a login session with stdout/err diverted.

It still fails in the "exit on -L/-R forward failure" bits:

env SUDO="" "MALLOC_OPTIONS=CFGJRSUX"  sh /usr/src/regress/usr.bin/ssh/test-exec.sh /usr/src/regress/usr.bin/ssh/obj /usr/src/regress/usr.bin/ssh/forwarding.sh
generate keys
wait for sshd
start forwarding, fork to background
transfer over forwarded channels and check result
exit on -L forward failure, proto 2
connection failed, should not
exit on -R forward failure, proto 2
connection failed, should not
simple clear forwarding proto 2
clear local forward proto 2
local forwarding not cleared
clear remote forward proto 2
remote forwarding not cleared
stdio forwarding proto 2
config file: start forwarding, fork to background
config file: transfer over forwarded channels and check result
transfer over chained unix domain socket forwards and check result
wait for sshd to exit
failed local and remote forwarding
*** Error 1 in /usr/src/regress/usr.bin/ssh (Makefile:186 't-forwarding')
Comment 9 Damien Miller 2017-01-30 18:47:36 AEDT
Figured it out, here's the essence of the fix:

- ${SSH} -S $CTL -O exit somehost
+ ${SSH} -F $OBJ/ssh_config -S $CTL -O exit somehost

The "transfer over forwarded channels and check result" section started a ssh mux master and tried to stop it at the end. Unfortunately, the $SSH invocation used to kill the mux master would pick up the ~/.ssh/config of the user running the test because it didn't specify an explicit configuration. This would result in the mux master persisting to the next block.

Previously, this would work because of two coincidences: 1) the $CTL socket would be carried over to the next test block (ExitOnForwardFailure) and 2) the forwarding specification is the same in each block, so there was no collision between them.

When this change added the "rm -f $CTL" to the ExitOnForwardFailure test block, this invalidated coincidence #1 and the forwarding attempts there would get EADDRINUSE because the ports were already busy (forwarded by the lingering mux master from the previous block).

Anyway, I've committed the fix
Comment 10 Damien Miller 2021-04-23 14:56:30 AEST
closing resolved bugs as of 8.6p1 release