Bug 1255 - Solaris contract support kills processes when receiving signals
Status: CLOSED FIXED
Alias: None
Product: Portable OpenSSH
Classification: Unclassified
Component: sshd
Version: 4.4p1
Hardware: All Solaris
Importance: P2 normal
Assignee: Assigned to nobody
URL:
Keywords:
Depends on:
Blocks: V_4_5
Reported: 2006-10-27 19:34 AEST by Andrew Benham
Modified: 2008-04-04 09:56 AEDT

Attachments
Implement changes described in comment #5. (1.59 KB, patch)
2006-10-31 12:39 AEDT, Darren Tucker

Description Andrew Benham 2006-10-27 19:34:26 AEST
Built 4.4p1 with Solaris contract support on Solaris 10, on both x86 and sparc.

Problem can be shown with:

Create two ssh connections to a Solaris 10 machine running this code.

In the first connection, do "tail -f /var/adm/messages"

In the second connection, kill the 'tail' process started in the first.

This not only kills the 'tail' process, it terminates the first connection as well - which is unexpected behaviour.


The problem also manifests itself if a process being run in an ssh connection core-dumps: the ssh connection is terminated.
Comment 1 Darren Tucker 2006-10-27 21:01:42 AEST
FWIW I can't reproduce here (5.10 Generic_118833-24 sun4u, sshd built --with-solaris-contracts only) in a fairly vanilla setup.

Did you build with any options other than --with-solaris-contracts?  Any other configuration information about your system that might be unusual/relevant?  Do you have any non-default options set in sshd_config?

How exactly did you kill the tail process?  ("kill -TERM [pid]"?)  Are you running sshd as a stand-alone daemon or via SMF?

Oh, and are you talking about 2 separate ssh connections, or two sessions within one connection (ie ControlMaster and friends)?
Comment 2 Andrew Benham 2006-10-27 21:38:08 AEST
OK, thanks for the info.  It must be our builds then.

Configure args are:
./configure --prefix=/opt/thus --bindir=/opt/thus/bin --sbindir=/opt/thus/sbin --libexecdir=/opt/thus/libexec/ssh --datadir=/opt/thus/share/ssh --sysconfdir=/etc/opt/THUSssh --sharedstatedir=/opt/thus/com/ssh --localstatedir=/var/opt/THUSssh --libdir=/opt/thus/lib --includedir=/opt/thus/include/ssh --oldincludedir=/opt/thus/include/ssh --infodir=/opt/thus/share/info --mandir=/opt/thus/share/man --disable-strip --with-tcp-wrappers --with-pid-dir=/var/run --with-ssl-dir=/usr/sfw --with-ssl-engine --with-pam --with-xauth=/usr/openwin/bin/xauth --with-audit=bsm --with-solaris-contracts

Using Sun Studio 11 as the compiler.

sshd_config is essentially standard.

The test I gave is using 2 completely separate ssh connections, and a straight "kill <PID>" command.

We're running sshd via SMF.

The listening daemon's contract is:
root@solaris-10-sparc:/# ctstat -vi 51
CTID    ZONEID  TYPE    STATE   HOLDER  EVENTS  QTIME   NTIME
51      0       process owned   7       0       -       -
         cookie:                0x20
         informative event set: none
         critical event set:    core signal hwerr empty
         fatal event set:       none
         parameter set:         inherit regent
         member processes:      411
         inherited contracts:   none

The contract of a spawned user ssh process is:
root@solaris-10-sparc:/# ctstat -vi 1866
CTID    ZONEID  TYPE    STATE   HOLDER  EVENTS  QTIME   NTIME
1866    0       process orphan  -       0       -       -
         cookie:                0
         informative event set: core signal
         critical event set:    hwerr
         fatal event set:       core signal hwerr
         parameter set:         none
         member processes:      20871 20874 20880
         inherited contracts:   none

The presence of 'signal' and 'core' in the fatal event set for the spawned client's contract is interesting, as is the fact that the user's shell is in the same contract as the spawned sshd:

benhaman@solaris-10-sparc:~$ ps -f
     UID   PID  PPID   C    STIME TTY         TIME CMD
benhaman  5561  5549   0 11:35:30 pts/1       0:00 ps -f
benhaman  5549  5547   0 11:35:27 pts/1       0:00 -bash
benhaman@solaris-10-sparc:~$ ptree -c 5549
[process contract 1996]
  5545  /opt/thus/sbin/sshd -u 0 -R
    5547  /opt/thus/sbin/sshd -u 0 -R
      5549  -bash
        5562  ptree -c 5549

(There are two '/opt/thus/sbin/sshd -u 0 -R' processes because of privilege separation).
Comment 3 Andrew Benham 2006-10-28 02:09:19 AEST
Changing the code in openbsd-compat/port-solaris.c to be like:

http://cvs.opensolaris.org/source/xref/on/usr/src/cmd/ssh/sshd/sshd.c#365

fixes the problem for me.
Comment 4 Damien Miller 2006-10-28 03:08:10 AEST
Sorry, but we can't copy (or even look at) OpenSolaris code because they have relicensed their OpenSSH derivative and we do not want to contaminate ours. If you can describe *in words* what the difference is, then perhaps we can implement something equivalent.
Comment 5 Andrew Benham 2006-10-31 00:31:22 AEDT
OK, understood.

What we've done in openbsd-compat/port-solaris.c is

1/. Added a ct_pr_tmpl_set_param() call to only kill the process group on fatal errors

2/. Added a ct_tmpl_set_informative() call to make HWERR events informative.

3/. Changed the ct_pr_tmpl_set_fatal() call to make only HWERR events fatal (i.e. SIGNAL and CORE events aren't).

4/. Changed the ct_tmpl_set_critical() call so that no events are critical (i.e. HWERR events aren't).

These changes have solved our problem.
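
As a rough illustration of the four changes listed above, the following C sketch shows how they could map onto libcontract calls. The helper name configure_contract_template and its tmpl_fd parameter are hypothetical and are not taken from the attached patch; the flag choices (CT_PR_PGRPONLY, CT_PR_EV_HWERR, and an empty critical set) are assumed from the descriptions in this comment. The libcontract calls are compiled only on Solaris; on other systems the helper simply reports failure.

```c
#ifdef __sun
#include <libcontract.h>
#include <sys/contract/process.h>
#endif

/*
 * Hypothetical helper (name and shape are assumptions, not the actual
 * patch): apply the four template settings described above to an open
 * process-contract template descriptor.
 * Returns 0 on success, -1 on any failure or on non-Solaris systems.
 */
int
configure_contract_template(int tmpl_fd)
{
#ifdef __sun
    /* 1. Only kill the process group on fatal events. */
    if (ct_pr_tmpl_set_param(tmpl_fd, CT_PR_PGRPONLY) != 0)
        return -1;

    /* 2. Hardware errors are reported as informative events. */
    if (ct_tmpl_set_informative(tmpl_fd, CT_PR_EV_HWERR) != 0)
        return -1;

    /* 3. Only hardware errors are fatal; SIGNAL and CORE are not. */
    if (ct_pr_tmpl_set_fatal(tmpl_fd, CT_PR_EV_HWERR) != 0)
        return -1;

    /* 4. No events are critical. */
    if (ct_tmpl_set_critical(tmpl_fd, 0) != 0)
        return -1;

    return 0;
#else
    /* Process contracts do not exist here; nothing can be configured. */
    (void)tmpl_fd;
    return -1;
#endif
}
```

On Solaris, the descriptor passed in would typically come from opening /system/contract/process/template, and the template would still need to be activated (for example with ct_tmpl_activate()) before the child is forked. An invalid descriptor makes the first call fail, so the helper returns -1 in that case as well.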
Comment 6 Darren Tucker 2006-10-31 12:39:11 AEDT
Created attachment 1204
Implement changes described in comment #5.

I have replicated the problem you describe and implemented your suggested changes, which resolve the problem for me.  Also, the output of ctstat -vi now looks the same as that for the native sshd.

Does this patch also work for you?  Thanks.
Comment 7 Andrew Benham 2006-11-01 04:37:31 AEDT
Your patch works for me too.
Comment 8 Darren Tucker 2006-11-01 10:30:13 AEDT
Patch applied, thanks.
Comment 9 Damien Miller 2008-04-04 09:56:59 AEDT
Close resolved bugs after release.