Bug 2135 - Solaris: race condition in channel forwarding when connect() returns EINPROGRESS
Summary: Solaris: race condition in channel forwarding when connect() returns EINPROGRESS
Status: CLOSED FIXED
Alias: None
Product: Portable OpenSSH
Classification: Unclassified
Component: sshd (show other bugs)
Version: 6.0p1
Hardware: All Solaris
: P3 normal
Assignee: Assigned to nobody
URL:
Keywords:
Depends on:
Blocks: V_6_3
  Show dependency treegraph
 
Reported: 2013-07-31 23:12 AEST by Ivo Raisr
Modified: 2016-08-02 10:42 AEST (History)
1 user (show)

See Also:


Attachments
the patch for portable (846 bytes, patch)
2013-07-31 23:12 AEST, Ivo Raisr
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Ivo Raisr 2013-07-31 23:12:21 AEST
Created attachment 2322 [details]
the patch for portable

Consider the following scenario with channel forwarding:

- sshd runs on the server machine
- on the client machine, local port forwarding is setup via the following:
ssh -nfN -o GatewayPorts=yes -L 3333:<REMOTE HOST>:5555 -p 2222 <server machine>
- <REMOTE HOST> is a faraway machine which takes several tens or hundreds of milliseconds to respond
(slow line, geographical distance)
- nothing is accepting connections at <REMOTE HOST>:5555

Under normal circumstances, when <REMOTE HOST> is the same machine as <server machine>
or is near enough, connect() call in channels.c:connect_next() returns with
non-blocking socket which has already finalized connection attempt
(errno ECONNREFUSED or ETIMEDOUT, for example).

When connection attempt to <REMOTE HOST> takes a while,
connect() call returns with errno EINPROGRESS. In this case, outcome
of the connection attempt is not yet known. When TCP stack finaly decides
then connection attempt has failed, it places the error into pending socket error,
where it is later pulled up by select()/poll().

However OpenSSH does a bunch of other stuff between connect() and select().
In particular, it calls channels.c:channel_register_fds.
And the Portable OpenSSH (not "pure" OpenSSH) invokes isatty() here.
On Solaris, isatty() invokes underlaying ioctl() which
gets and clears the pending socket error.

Therefore on Solaris, there is a race condition when the following happens:
- OpenSSH invokes connect() which returns EINPROGRESS
- connection attempt outcome is known after a while and is placed into pending socket error
  by the TCP stack
- OpenSSH invokes isatty() which gets and clears this pending socket error
- OpenSSH invokes select() which blocks indefinitely because the pending socket error has been already cleared

Had the connection attempt taken a little bit longer and its outcome had been known
when OpenSSH was already performing select(), it would behave correctly.
However in this case, this particular timing is important.

=========================

I realized that isatty() call has been put there into Portable OpenSSH to initialize wfd_isatty attribute
of Channel structure. However this attribute is currently used only for AIX. So I would
say it is safe to declare and initialize it for AIX only as well.

With the attached patch, the problem is fixed.
The patch is against OpenSSH 6.0p1 but newer releases are affected as well.
Comment 1 Damien Miller 2013-08-01 14:29:27 AEST
This looks like it might have been "fun" to debug. You just made it for openssh-6.3 - thanks.
Comment 2 Damien Miller 2016-08-02 10:42:58 AEST
Close all resolved bugs after 7.3p1 release