| Summary: | Solaris contract support kills processes when receiving signals | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Portable OpenSSH | Reporter: | Andrew Benham <andrew.benham> | ||||
| Component: | sshd | Assignee: | Assigned to nobody <unassigned-bugs> | ||||
| Status: | CLOSED FIXED | ||||||
| Severity: | normal | ||||||
| Priority: | P2 | ||||||
| Version: | 4.4p1 | ||||||
| Hardware: | All | ||||||
| OS: | Solaris | ||||||
| Bug Depends on: | |||||||
| Bug Blocks: | 1222 | ||||||
| Attachments: |
|
||||||
|
Description
Andrew Benham
2006-10-27 19:34:26 AEST
FWIW I can't reproduce here (5.10 Generic_118833-24 sun4u, sshd built --with-solaris-contracts only) in a fairly vanilla setup.
Did you build with any options other than --with-solaris-contracts? Any other configuration information about your system that might be unusual/relevant? Do you have any non-default options set in sshd_config?
How exactly did you kill the tail process? ("kill -TERM [pid]"?) Are you running sshd as a stand-alone daemon or via SMF?
Oh, and are you talking about 2 separate ssh connections, or two sessions within one connection (i.e. ControlMaster and friends)?
OK, thanks for the info. It must be our builds then.
Configure args are:
./configure --prefix=/opt/thus --bindir=/opt/thus/bin --sbindir=/opt/thus/sbin --libexecdir=/opt/thus/libexec/ssh --datadir=/opt/thus/share/ssh --sysconfdir=/etc/opt/THUSssh --sharedstatedir=/opt/thus/com/ssh --localstatedir=/var/opt/THUSssh --libdir=/opt/thus/lib --includedir=/opt/thus/include/ssh --oldincludedir=/opt/thus/include/ssh --infodir=/opt/thus/share/info --mandir=/opt/thus/share/man --disable-strip --with-tcp-wrappers --with-pid-dir=/var/run --with-ssl-dir=/usr/sfw --with-ssl-engine --with-pam --with-xauth=/usr/openwin/bin/xauth --with-audit=bsm --with-solaris-contracts
Using Sun Studio 11 as the compiler.
sshd_config is essentially standard.
The test I gave is using 2 completely separate ssh connections, and a straight "kill <PID>" command.
We're running sshd via SMF.
The listening daemon's contract is:
root@solaris-10-sparc:/# ctstat -vi 51
CTID ZONEID TYPE STATE HOLDER EVENTS QTIME NTIME
51 0 process owned 7 0 - -
cookie: 0x20
informative event set: none
critical event set: core signal hwerr empty
fatal event set: none
parameter set: inherit regent
member processes: 411
inherited contracts: none
A spawned user ssh process contract is:
root@solaris-10-sparc:/# ctstat -vi 1866
CTID ZONEID TYPE STATE HOLDER EVENTS QTIME NTIME
1866 0 process orphan - 0 - -
cookie: 0
informative event set: core signal
critical event set: hwerr
fatal event set: core signal hwerr
parameter set: none
member processes: 20871 20874 20880
inherited contracts: none
The presence of 'signal' and 'core' in the fatal event set for the spawned client's contract is interesting - as is the fact that the user's shell is in the same contract as the spawned sshd:
benhaman@solaris-10-sparc:~$ ps -f
UID PID PPID C STIME TTY TIME CMD
benhaman 5561 5549 0 11:35:30 pts/1 0:00 ps -f
benhaman 5549 5547 0 11:35:27 pts/1 0:00 -bash
benhaman@solaris-10-sparc:~$ ptree -c 5549
[process contract 1996]
5545 /opt/thus/sbin/sshd -u 0 -R
5547 /opt/thus/sbin/sshd -u 0 -R
5549 -bash
5562 ptree -c 5549
(There are two '/opt/thus/sbin/sshd -u 0 -R' processes because of privilege separation).
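The event sets shown in the `ctstat -vi` output above can also be checked mechanically. The following is a minimal sketch, assuming the output keeps the `name: values` layout seen in this report; `parse_ctstat` and `signal_is_fatal` are hypothetical helper names, not part of any Solaris tool.

```python
# Sample text copied from the `ctstat -vi 1866` output quoted in this report.
CTSTAT_OUTPUT = """\
CTID ZONEID TYPE STATE HOLDER EVENTS QTIME NTIME
1866 0 process orphan - 0 - -
cookie: 0
informative event set: core signal
critical event set: hwerr
fatal event set: core signal hwerr
parameter set: none
member processes: 20871 20874 20880
inherited contracts: none
"""


def parse_ctstat(text):
    """Return {field name: list of values} for every 'name: values' line."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # the column header and the data row contain no colon
        values = value.split()
        # ctstat prints the word "none" for an empty set
        fields[name.strip()] = [] if values == ["none"] else values
    return fields


def signal_is_fatal(fields):
    """True if a 'signal' event is in the contract's fatal event set."""
    return "signal" in fields.get("fatal event set", [])


fields = parse_ctstat(CTSTAT_OUTPUT)
print("fatal event set:", fields["fatal event set"])
print("signal is fatal:", signal_is_fatal(fields))
```

Run against the spawned session's contract, a check like this flags the condition discussed here: `signal` is listed as a fatal event and the parameter set is empty.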
Changing the code in openbsd-compat/port-solaris.c to be like: http://cvs.opensolaris.org/source/xref/on/usr/src/cmd/ssh/sshd/sshd.c#365 fixes the problem for me.
Sorry, but we can't copy (or even look at) OpenSolaris code because they have relicensed their OpenSSH derivative and we do not want to contaminate ours. If you can describe *in words* what the difference is, then perhaps we can implement something equivalent.
OK, understood. What we've done in openbsd-compat/port-solaris.c is:
1/. Added a ct_pr_tmpl_set_param() call to only kill the process group on fatal errors.
2/. Added a ct_tmpl_set_informative() call to make HWERR events informative.
3/. Changed the ct_pr_tmpl_set_fatal() call to make only HWERR events fatal (i.e. SIGNAL and CORE events aren't).
4/. Changed the ct_tmpl_set_critical() call so that no events are critical (i.e. HWERR events aren't).
These changes have solved our problem.
Created attachment 1204
Implement changes described in comment #5.
I have replicated the problem you describe and implemented your suggested changes, which resolve the problem for me. Also, the output of ctstat -vi now looks the same as one for the native sshd. Does this patch also work for you?
Thanks. Your patch works for me too.
Patch applied, thanks.
Close resolved bugs after release. |
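The effect of the four template changes described in comment #5 can be illustrated with a toy model. This is a sketch only: it does not call libcontract, the `Template` class and `killed_by_event` function are hypothetical, and the process group assignments and the `tail` PID are assumptions made for illustration. The "before" event sets are those shown in the `ctstat -vi 1866` output in this report; the "after" sets follow the numbered list. The model assumes that a fatal event kills every member of the contract, or only the members in the offending process's process group when the process-group-only parameter is set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """Toy stand-in for a process contract template (not the real API)."""
    informative: frozenset
    critical: frozenset
    fatal: frozenset
    pgrp_only: bool = False  # models "only kill the process group"


# Event sets as reported by `ctstat -vi 1866` before the fix.
BEFORE = Template(
    informative=frozenset({"core", "signal"}),
    critical=frozenset({"hwerr"}),
    fatal=frozenset({"core", "signal", "hwerr"}),
)

# Event sets after the four changes listed in comment #5.
AFTER = Template(
    informative=frozenset({"hwerr"}),
    critical=frozenset(),
    fatal=frozenset({"hwerr"}),
    pgrp_only=True,
)


def killed_by_event(template, event, source_pid, pgrp_of):
    """Return the contract members targeted when `event` occurs.

    pgrp_of maps each member PID to its process group ID.
    """
    if event not in template.fatal:
        return set()  # non-fatal events kill nothing
    if template.pgrp_only:
        group = pgrp_of[source_pid]
        return {pid for pid, pgrp in pgrp_of.items() if pgrp == group}
    return set(pgrp_of)  # the whole contract is killed


# PIDs 5545, 5547 and 5549 come from the ptree output in the report.
# PID 5600 (a `tail` process) and all process group IDs are assumptions.
PGRP_OF = {5545: 5545, 5547: 5545, 5549: 5549, 5600: 5600}

before_kill = killed_by_event(BEFORE, "signal", 5600, PGRP_OF)
after_kill = killed_by_event(AFTER, "signal", 5600, PGRP_OF)
after_hwerr = killed_by_event(AFTER, "hwerr", 5600, PGRP_OF)
print(sorted(before_kill), sorted(after_kill), sorted(after_hwerr))
```

In this model, a plain `kill <PID>` against one process takes down both sshd processes and the shell under the old template, while under the new template the signal event is not fatal at all, and a hardware error only affects the offending process group.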