Built 4.4p1 with Solaris contract support on Solaris 10, on both x86 and sparc.

The problem can be shown as follows:

1. Create two ssh connections to a Solaris 10 machine running this code.
2. In the first connection, do "tail -f /var/adm/messages".
3. In the second connection, kill the 'tail' process started in the first.

This not only kills the 'tail' process, it terminates the first connection as well, which is unexpected behaviour. The problem also manifests itself if a process being run in an ssh connection core-dumps: the ssh connection is terminated.
FWIW I can't reproduce here (5.10 Generic_118833-24 sun4u, sshd built --with-solaris-contracts only) in a fairly vanilla setup. Did you build with any options other than --with-solaris-contracts? Any other configuration information about your system that might be unusual/relevant? Do you have any non-default options set in sshd_config? How exactly did you kill the tail process? ("kill -TERM [pid]"?) Are you running sshd as a stand-alone daemon or via SMF? Oh, and are you talking about 2 separate ssh connections, or two sessions within one connection (ie ControlMaster and friends)?
OK, thanks for the info. It must be our builds then. Configure args are:

    ./configure --prefix=/opt/thus \
        --bindir=/opt/thus/bin \
        --sbindir=/opt/thus/sbin \
        --libexecdir=/opt/thus/libexec/ssh \
        --datadir=/opt/thus/share/ssh \
        --sysconfdir=/etc/opt/THUSssh \
        --sharedstatedir=/opt/thus/com/ssh \
        --localstatedir=/var/opt/THUSssh \
        --libdir=/opt/thus/lib \
        --includedir=/opt/thus/include/ssh \
        --oldincludedir=/opt/thus/include/ssh \
        --infodir=/opt/thus/share/info \
        --mandir=/opt/thus/share/man \
        --disable-strip \
        --with-tcp-wrappers \
        --with-pid-dir=/var/run \
        --with-ssl-dir=/usr/sfw \
        --with-ssl-engine \
        --with-pam \
        --with-xauth=/usr/openwin/bin/xauth \
        --with-audit=bsm \
        --with-solaris-contracts

Using Sun Studio 11 as the compiler. sshd_config is essentially standard. The test I gave is using 2 completely separate ssh connections, and a straight "kill <PID>" command. We're running sshd via SMF.

The listening daemon's contract is:

    root@solaris-10-sparc:/# ctstat -vi 51
    CTID    ZONEID  TYPE    STATE   HOLDER  EVENTS  QTIME   NTIME
    51      0       process owned   7       0       -       -
            cookie:                0x20
            informative event set: none
            critical event set:    core signal hwerr empty
            fatal event set:       none
            parameter set:         inherit regent
            member processes:      411
            inherited contracts:   none

A spawned user ssh process contract is:

    root@solaris-10-sparc:/# ctstat -vi 1866
    CTID    ZONEID  TYPE    STATE   HOLDER  EVENTS  QTIME   NTIME
    1866    0       process orphan  -       0       -       -
            cookie:                0
            informative event set: core signal
            critical event set:    hwerr
            fatal event set:       core signal hwerr
            parameter set:         none
            member processes:      20871 20874 20880
            inherited contracts:   none

The presence of 'signal' and 'core' in the fatal event set for the spawned client's contract is interesting, as is the fact that the user's shell is in the same contract as the spawned sshd:

    benhaman@solaris-10-sparc:~$ ps -f
         UID   PID  PPID  C    STIME TTY      TIME CMD
    benhaman  5561  5549  0 11:35:30 pts/1    0:00 ps -f
    benhaman  5549  5547  0 11:35:27 pts/1    0:00 -bash
    benhaman@solaris-10-sparc:~$ ptree -c 5549
    [process contract 1996]
      5545  /opt/thus/sbin/sshd -u 0 -R
        5547  /opt/thus/sbin/sshd -u 0 -R
          5549  -bash
            5562  ptree -c 5549

(There are two '/opt/thus/sbin/sshd -u 0 -R' processes because of privilege separation).
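To make the ctstat output above concrete: in the Solaris process contract model, an event that is in a contract's fatal event set causes the contract's member processes to be killed when it occurs. The following is a minimal, portable sketch of that rule and is not libcontract code. The EV_* bit values, the FATAL_* names and the function name are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values for illustration; NOT the real CT_PR_EV_* values. */
enum {
    EV_CORE   = 1 << 0,  /* a member process dumped core */
    EV_SIGNAL = 1 << 1,  /* a member got a fatal signal from another contract */
    EV_HWERR  = 1 << 2   /* a member was killed by a hardware error */
};

enum {
    /* Fatal event set that ctstat reports for the spawned session contract. */
    FATAL_AS_REPORTED = EV_CORE | EV_SIGNAL | EV_HWERR,
    /* For comparison: a fatal event set that contains only hwerr. */
    FATAL_HWERR_ONLY  = EV_HWERR
};

/* True when `event` is a member of `fatal_set`, i.e. the session is torn down. */
bool
event_kills_session(unsigned int fatal_set, unsigned int event)
{
    assert(event != 0);
    return (fatal_set & event) != 0;
}
```

With the reported set, event_kills_session(FATAL_AS_REPORTED, EV_SIGNAL) is true, which matches the observed behaviour: a kill sent from the second connection (a different contract) takes the whole first session down. With FATAL_HWERR_ONLY the same call is false.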
Changing the code in openbsd-compat/port-solaris.c to be like: http://cvs.opensolaris.org/source/xref/on/usr/src/cmd/ssh/sshd/sshd.c#365 fixes the problem for me.
Sorry, but we can't copy (or even look at) OpenSolaris code because they have relicensed their OpenSSH derivative and we do not want to contaminate ours. If you can describe *in words* what the difference is, then perhaps we can implement something equivalent.
OK, understood. What we've done in openbsd-compat/port-solaris.c is:

1/. Added a ct_pr_tmpl_set_param() call to only kill the process group on fatal errors.
2/. Added a ct_tmpl_set_informative() call to make HWERR events informative.
3/. Changed the ct_pr_tmpl_set_fatal() call to make only HWERR events fatal (i.e. SIGNAL and CORE events aren't).
4/. Changed the ct_tmpl_set_critical() call so that no events are critical (i.e. HWERR events aren't).

These changes have solved our problem.
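The four changes listed above could look roughly like the following. This is a sketch written from the description only, not the contents of any actual patch. The function name configure_session_template is hypothetical, the call order simply follows the list, and the use of CT_PR_PGRPONLY for change 1 is an assumption based on the wording "only kill the process group". On non-Solaris systems the block substitutes stand-in stubs with placeholder constant values so that it still compiles.

```c
#include <assert.h>

#if defined(__sun) && defined(__SVR4)
#include <sys/types.h>
#include <sys/contract/process.h>
#include <libcontract.h>
#else
/*
 * Not Solaris: hypothetical stand-ins so the sketch compiles elsewhere.
 * The constant values are placeholders, and each stub only records its
 * argument instead of talking to the contract filesystem.
 */
enum { CT_PR_EV_HWERR = 0x1, CT_PR_PGRPONLY = 0x2 };
unsigned int rec_param, rec_informative, rec_fatal, rec_critical;
int ct_pr_tmpl_set_param(int fd, unsigned int v) { (void)fd; rec_param = v; return 0; }
int ct_tmpl_set_informative(int fd, unsigned int v) { (void)fd; rec_informative = v; return 0; }
int ct_pr_tmpl_set_fatal(int fd, unsigned int v) { (void)fd; rec_fatal = v; return 0; }
int ct_tmpl_set_critical(int fd, unsigned int v) { (void)fd; rec_critical = v; return 0; }
#endif

/*
 * tmpl_fd: an open descriptor for the process contract template.
 * Returns 0 on success, or the error value of the first failing call.
 */
int
configure_session_template(int tmpl_fd)
{
    int err;

    assert(tmpl_fd >= 0);

    /* 1. On a fatal event, kill only the affected process group. */
    if ((err = ct_pr_tmpl_set_param(tmpl_fd, CT_PR_PGRPONLY)) != 0)
        return err;
    /* 2. Report hardware errors as informative events. */
    if ((err = ct_tmpl_set_informative(tmpl_fd, CT_PR_EV_HWERR)) != 0)
        return err;
    /* 3. Only hardware errors are fatal; signal and core no longer are. */
    if ((err = ct_pr_tmpl_set_fatal(tmpl_fd, CT_PR_EV_HWERR)) != 0)
        return err;
    /* 4. No events are critical. */
    if ((err = ct_tmpl_set_critical(tmpl_fd, 0)) != 0)
        return err;
    return 0;
}
```

In a real sshd the descriptor would come from opening the process contract template, and the template would still need to be activated before the session child is forked; both steps are left out of this sketch.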
Created attachment 1204: Implement changes described in comment #5. I have replicated the problem you describe and implemented your suggested changes, which resolve the problem for me. Also, the output of ctstat -vi now looks the same as the one for the native sshd. Does this patch also work for you? Thanks.
Your patch works for me too.
Patch applied, thanks.
Close resolved bugs after release.