Bug 2945 - sshd unstable on Illumos (and probably Solaris) when compiled in XPG4 mode
Summary: sshd unstable on Illumos (and probably Solaris) when compiled in XPG4 mode
Status: CLOSED FIXED
Alias: None
Product: Portable OpenSSH
Classification: Unclassified
Component: sshd (show other bugs)
Version: 7.9p1
Hardware: amd64 Solaris
: P5 major
Assignee: Damien Miller
URL:
Keywords:
Depends on:
Blocks: V_8_1
  Show dependency treegraph
 
Reported: 2018-12-19 10:06 AEDT by Fazal Majid
Modified: 2021-04-23 15:04 AEST (History)
4 users (show)

See Also:


Attachments
Patch to check if the STREAMS modules are already pushed (968 bytes, patch)
2018-12-19 10:06 AEDT, Fazal Majid
no flags Details | Diff
revised diff (596 bytes, patch)
2019-03-01 14:03 AEDT, Damien Miller
dtucker: ok+
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Fazal Majid 2018-12-19 10:06:34 AEDT
Created attachment 3217 [details]
Patch to check if the STREAMS modules are already pushed

My build of OpenSSH 7.9p1 on OpenIndiana oi151a9 and on SmartOS joyent_20170302T081240Z was hanging after a few minutes of use.

After further investigation, the root cause was this issue in Illumos:
	https://www.illumos.org/issues/9042
(not sure if it is also present in Oracle’s version of Solaris, probably so because the change dates to 2008)
Duplicate STREAMS terminal and line discipline modules were causing the instability.

Dump of STREAMS modules in Solaris SSH:

ecthelion ~>m -p 22
mordac ~>strconf
ttcompat
ldterm
ptem
pts

In the affected OpenSSH 7.9p1

ecthelion ~>m -p 2223
Last login: Tue Dec 18 20:49:04 2018 from 38.99.40.226
mordac ~>strconf
ttcompat
ldterm
ptem
ttcompat
ldterm
ptem
ttcompat
ldterm
ptem
pts

The root cause is that if the binary is compiled in XPG4 compliant mode (which is pretty much the default with reasonably modern versions of GCC), libc pushes the module
http://src.illumos.org/source/xref/illumos-gate/usr/src/lib/libc/port/sys/open.c#163

but then again so does OpenSSH in bed_openpty.c, and we end up with triplicate modules pushed, with unpredictable results that usually manifest themselves as the connection hanging after a while:

Rather than checking if _XPG4 is defined to decide whether to push the modules or not, the fix I adopted in the attached patch is to check if the modules are already pushed before pushing them in openbsd-compat/bsd-openpty.c. I am being conservative and made the change Solaris-only as I have no idea if HP-UX supports the I_FIND ioctl or not.
Comment 1 Darren Tucker 2018-12-19 12:03:24 AEDT
If we need an ifdef I think we'd be better off checking for the presence of the thing needed (ie "#ifdef I_FIND").  It's also be nice to have a tidier way of doing this, although I'm not sure there is one...
Comment 2 andy 2018-12-21 00:06:41 AEDT
This issue is fixed in recent illumos and Solaris so that it is not possible to push these modules multiple times. However, adding a check is still a reasonable thing to do in order to support unpatched OS versions since XPG4v2 effectively requires that the OS pushes the modules on opening a slave pseudo terminal device.
Comment 3 Roumen Petrov 2018-12-21 02:28:23 AEDT
First question is why is so important to build according more then 20 (30?) yeas old specification like XPG4?

Using more modern XPG6 (version 2004 - https://pubs.opengroup.org/onlinepubs/009695399/functions/posix_openpt.html) there is requirements for stream modules.

I did some tests using implementation from https://roumenpetrov.info.example.net/secsh/  on Solaris 11.3 and the test shows that stream modules are not inserted:
=====
debug1: Allocating pty.

..TRACE openpty()  ...
..TRACE openpty() I_FIND ptem    : 0
..TRACE openpty() I_FIND ldterm  : 0
..TRACE openpty() I_FIND ttcompat: 0
debug1: session_pty_req: session 0 alloc /dev/pts/5
=====
So there is no need for patched OS!


Just stop to build for XPG4.

Why such build succeeded is another question.

Regards,
Roumen Petrov
Comment 4 andy 2018-12-21 03:03:48 AEDT
This applies in Solaris to XPG4 and above, so XPG6 too. If the modules are not being inserted, then the test code might have been built without XPG4 or 6 - it depends on the how the compiler you are using is configured but did you compile with std=C90 or above?

Look at `gcc -dumpspecs` for the startfile_arch section and if it includes values-xpg4.o or values-xpg6.o then you are compiling to that standard with the right flags, otherwise not.

This is from Solaris 11.3, gcc 7.3

*startfile_arch:
%{ansi:values-Xc.o%s; :values-Xa.o%s}      %{std=c99|std=c9x|std=gnu99|std=gnu9x:values-xpg6.o%s}      %{std=c11|std=c1x|std=gnu11|std=gnu1x:values-xpg6.o%s}      %{std=c++11|std=c++0x|std=gnu++11|std=gnu++0x:values-xpg6.o%s}      %{std=c++14|std=c++1y|std=gnu++14|std=gnu++1y:values-xpg6.o%s}      %{std=c++1z|std=gnu++1z:values-xpg6.o%s}

so to get an xpg6 build, you need at least -std=c99.
Comment 5 Damien Miller 2019-03-01 14:03:30 AEDT
Created attachment 3248 [details]
revised diff

This is a simplified patch that avoids making this file more of a #ifdef hell and adopts Darren's suggestion of testing for I_FIND directly.

I'm lacking the ability to test this, so I'd appreciate if someone else could.
Comment 6 Damien Miller 2019-04-03 10:06:37 AEDT
Nobody stepped up to test this, so this won't make the openssh-8.0 release. It won't make the 8.1 release either unless someone does.
Comment 7 Fazal Majid 2019-04-03 11:01:19 AEDT
Sorry, Damien, as the original reporter I should have tested your patch. I will do so on Oracle Solaris 11.4 in addition to an Illumos-based distro just to be on the safe side.
Comment 8 Fazal Majid 2019-04-03 11:06:44 AEDT
In addition, I am pursuing some really strange behavior with OpenSSH 7.9p1 in OpenIndiana oi151a9 (a really old version) where connecting to OpenSSH, Control-C in a shell causes the connection to freeze, as does Control-G in Emacs, when SunSSH does not exhibit this behavior. I suspect some interaction with the line discipline used by ksh93, tcsh and bash and sshd.
Comment 9 Fazal Majid 2019-04-05 10:28:00 AEDT
OK, I verified Damien's patch on SmartOS joyent_20170302T081240Z.

I'm having some difficulties building the dependencies on Oracle Solaris 11.4, so please bear with me.
Comment 10 Darren Tucker 2019-04-05 19:35:52 AEDT
(In reply to Fazal Majid from comment #8)
> In addition, I am pursuing some really strange behavior with OpenSSH
> 7.9p1 in OpenIndiana oi151a9 (a really old version) where connecting
> to OpenSSH, Control-C in a shell causes the connection to freeze

That sounds a bit like the STREAMS pty master bug (see bugzilla #245) wherein signals get delivered to the sshd not the user's shell.  If that's it, you can work around it by defining SSHD_ACQUIRES_CTTY.
Comment 11 Fazal Majid 2019-04-05 21:18:46 AEDT
Thanks Darren, but that doesn't seem to make a difference.

Here is output of ssh -v -v -v on the client just after issuing the Ctrl-C:

mordac ~>echo debug3: receive packet: type 96
debug2: channel 0: rcvd eof
debug2: channel 0: output open -> drain
debug2: channel 0: obuf empty
debug2: channel 0: close_write
debug2: channel 0: output drain -> closed

and on the server side with sshd -d -d -d:

debug2: channel 0: read<=0 rfd 11 len 0
debug2: channel 0: read failed
debug2: channel 0: chan_shutdown_read (i0 o0 sock -1 wfd 11 efd -1 [closed])
debug2: channel 0: input open -> drain
debug2: channel 0: ibuf empty
debug2: channel 0: send eof
debug3: send packet: type 96
debug2: channel 0: input drain -> closed

so it looks like sshd read 0 bytes and decided to send an EOF to the client. The connection was still open, as was the child process of sshd until I close it with a ~.
Comment 12 Fazal Majid 2019-04-05 21:27:42 AEDT
It's also worth noting that programs that should put themselves in raw mode like tmux or screen are also killed by a Ctrl-C, so something is off with line disciplines, even though the STREAMS module order seems correct.
Comment 13 Darren Tucker 2019-04-05 21:34:45 AEDT
(In reply to Fazal Majid from comment #11)
[...] 
> so it looks like sshd read 0 bytes and decided to send an EOF to the
> client.

Ah that's a different problem with a different workaround: try defining PTY_ZEROREAD.
Comment 14 Fazal Majid 2019-04-05 21:44:28 AEDT
The Control-C problem only occurs with bash and tcsh, not /bin/sh, so it's not an OpenSSH issue per se (I will try your PTY_ZEROREAD workaround).

Going back to testing Damien's patch on Oracle Solaris 11.4, sorry for the diversion.
Comment 15 Fazal Majid 2019-04-05 21:48:04 AEDT
PTY_ZEROREAD does stop the hanging, but screen and tmux remain in semi-raw line discipline where Control-C kills them with bash and tcsh.
Comment 16 Fazal Majid 2019-04-06 08:15:09 AEDT
Successfully tested on Oracle Solaris 11.4 (without SSHD_ACQUIRES_CTTY or PTY_ZEROREAD).

No duplicate STREAMS modules, and Control-C works properly in tcsh.
Comment 17 Darren Tucker 2019-04-26 09:42:45 AEST
Comment on attachment 3248 [details]
revised diff

https://bugzilla.mindrot.org/show_bug.cgi?id=2998#c24 also confirms this seems to work.
Comment 18 Darren Tucker 2019-04-26 20:57:15 AEST
I've committed the patch and it'll be in the next release. Thanks.
Comment 19 Damien Miller 2021-04-23 15:04:38 AEST
closing resolved bugs as of 8.6p1 release