Bug 3512 - net-misc/openssh-9.1_p1: stopped accepting connections after upgrade to sys-libs/glibc-2.36 (fatal: ssh_sandbox_violation: unexpected system call)
Status: NEW
Alias: None
Product: Portable OpenSSH
Classification: Unclassified
Component: sshd
Version: 9.1p1
Hardware: amd64 Linux
Importance: P5 major
Assignee: Assigned to nobody
URL:
Keywords:
Depends on:
Blocks: V_9_4
Reported: 2022-12-20 22:16 AEDT by jussi
Modified: 2023-03-17 13:33 AEDT

See Also:


Attachments
Allow writev in seccomp sandbox (445 bytes, patch)
2022-12-21 08:27 AEDT, Darren Tucker
djm: ok+
strace logs (58.90 KB, text/plain)
2022-12-21 17:21 AEDT, jussi

Description jussi 2022-12-20 22:16:43 AEDT
After updating to sys-libs/glibc-2.36-r5 sshd stopped accepting connections:

debug3: fd 6 is not O_NONBLOCK
debug1: Forked child 5799.
debug3: send_rexec_state: entering fd = 9 config len 3506
debug3: ssh_msg_send: type 0
debug3: send_rexec_state: done
debug3: oom_adjust_restore
debug1: Set /proc/self/oom_score_adj to 0
debug1: rexec start in 6 out 6 newsock 6 pipe 8 sock 9
debug1: inetd sockets after dupping: 4, 4
debug1: Local version string SSH-2.0-OpenSSH_9.1
debug1: Remote protocol version 2.0, remote software version OpenSSH_8.6
debug1: compat_banner: match: OpenSSH_8.6 pat OpenSSH* compat 0x04000000
debug2: fd 4 setting O_NONBLOCK
debug3: ssh_sandbox_init: preparing seccomp filter sandbox
debug2: Network child is on pid 5800
debug3: preauth child monitor started
debug3: privsep user:group 22:22 [preauth]
debug1: permanently_set_uid: 22/22 [preauth]
debug3: ssh_sandbox_child_debugging: installing SIGSYS handler [preauth]
debug3: ssh_sandbox_child: setting PR_SET_NO_NEW_PRIVS [preauth]
debug3: ssh_sandbox_child: attaching seccomp filter program [preauth]
debug3: append_hostkey_type: ssh-rsa key not permitted by HostkeyAlgorithms [preauth]
debug1: list_hostkey_types: rsa-sha2-512,rsa-sha2-256,ecdsa-sha2-nistp256,ssh-ed25519 [preauth]
fatal: ssh_sandbox_violation: unexpected system call (arch:0xc000003e,syscall:20 @ 0x7fa1f0041638) [preauth]
debug3: mm_request_receive: entering
debug1: do_cleanup
debug3: PAM: sshpam_thread_cleanup entering
debug1: Killing privsep child 5800
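
The fatal line in the log above packs three fields: the audit architecture, the syscall number, and the faulting address. A minimal sketch of decoding it follows. The helper name `decode_violation` and the one-entry syscall table are illustrative assumptions, not part of OpenSSH. The architecture constant is built as in linux/audit.h (EM_X86_64 plus the 64-bit and little-endian flags), and 20 is writev on x86_64.

```python
import re

# AUDIT_ARCH_X86_64 = EM_X86_64 (62) | 64-bit flag | little-endian flag
AUDIT_ARCH_X86_64 = 62 | 0x80000000 | 0x40000000

# Deliberately tiny table for illustration; a real one would come from
# the kernel header asm/unistd_64.h.
X86_64_SYSCALLS = {20: "writev"}

VIOLATION_RE = re.compile(
    r"unexpected system call "
    r"\(arch:(0x[0-9a-fA-F]+),syscall:(\d+) @ (0x[0-9a-fA-F]+)\)"
)

def decode_violation(line):
    """Hypothetical helper: return (arch, syscall, address) or None."""
    m = VIOLATION_RE.search(line)
    if m is None:
        return None
    arch = int(m.group(1), 16)
    nr = int(m.group(2))
    if arch == AUDIT_ARCH_X86_64:
        return "x86_64", X86_64_SYSCALLS.get(nr, "unknown(%d)" % nr), m.group(3)
    # Unknown architecture: syscall numbers cannot be interpreted safely.
    return hex(arch), "unknown(%d)" % nr, m.group(3)

line = ("fatal: ssh_sandbox_violation: unexpected system call "
        "(arch:0xc000003e,syscall:20 @ 0x7fa1f0041638) [preauth]")
print(decode_violation(line))
```

Syscall numbers differ between architectures, which is why the sketch refuses to name a syscall unless the arch field matches.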
Comment 1 Darren Tucker 2022-12-20 22:30:27 AEDT
1) what linux distro are you talking about?
2) This sounds like the glibc getentropy issue, which we think was probably fixed as described in bug#3487.
Comment 2 jussi 2022-12-20 22:33:05 AEDT
Gentoo, https://bugs.gentoo.org/887405
Comment 3 Darren Tucker 2022-12-20 22:54:10 AEDT
(In reply to jussi from comment #2)
> Gentoo, https://bugs.gentoo.org/887405

Please try adding this patch:
https://github.com/openssh/openssh-portable/commit/da6038bd5cd55eb212eb2aec1fc8ae79bbf76156
Comment 4 jussi 2022-12-21 04:14:16 AEDT
Still the same thing after applying the patch.
Comment 5 Darren Tucker 2022-12-21 08:27:59 AEDT
Created attachment 3645
Allow writev in seccomp sandbox

I'm curious about what it might be doing.  Could you please strace sshd and attach the output to this bug (it'll be noisy, so please use "add attachment" rather than pasting inline)?  Something like:

$ sudo strace -f /path/to/sshd -De -p 2022

and connect to it with "ssh -p 2022 localhost".

Once that's done, please try the attached patch (no need to strace that one).
Comment 6 jussi 2022-12-21 17:21:55 AEDT
Created attachment 3646
strace logs

After applying the patch:

debug3: oom_adjust_restore
debug1: Set /proc/self/oom_score_adj to 0
debug1: rexec start in 5 out 5 newsock 5 pipe 7 sock 8
debug1: inetd sockets after dupping: 4, 4
debug1: Local version string SSH-2.0-OpenSSH_9.1
debug1: Remote protocol version 2.0, remote software version OpenSSH_8.6
debug1: compat_banner: match: OpenSSH_8.6 pat OpenSSH* compat 0x04000000
debug2: fd 4 setting O_NONBLOCK
debug3: ssh_sandbox_init: preparing seccomp filter sandbox
debug2: Network child is on pid 1425
debug3: preauth child monitor started
debug3: privsep user:group 22:22 [preauth]
debug1: permanently_set_uid: 22/22 [preauth]
debug3: ssh_sandbox_child_debugging: installing SIGSYS handler [preauth]
debug3: ssh_sandbox_child: setting PR_SET_NO_NEW_PRIVS [preauth]
debug3: ssh_sandbox_child: attaching seccomp filter program [preauth]
debug3: append_hostkey_type: ssh-rsa key not permitted by HostkeyAlgorithms [preauth]
debug1: list_hostkey_types: rsa-sha2-512,rsa-sha2-256,ecdsa-sha2-nistp256,ssh-ed25519 [preauth]
fatal: ssh_sandbox_violation: unexpected system call (arch:0xc000003e,syscall:234 @ 0x7f510727228c) [preauth]
debug3: mm_request_receive: entering
debug1: do_cleanup
debug3: PAM: sshpam_thread_cleanup entering
debug1: Killing privsep child 1425
Comment 7 Damien Miller 2022-12-21 22:30:18 AEDT
/usr/include/x86_64-linux-gnu/asm/unistd_64.h:#define __NR_tgkill 234

idk what is calling tgkill(2) - sshd isn't threaded
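
The header lookup above can be scripted so that any syscall number from a sandbox violation is resolved the same way. This is a rough sketch: the function name `syscall_table` is made up for illustration, and the sample text stands in for the real header, whose path varies between distributions.

```python
import re

# Matches lines such as: #define __NR_tgkill 234
DEFINE_RE = re.compile(r"^\s*#\s*define\s+__NR_(\w+)\s+(\d+)\s*$")

def syscall_table(header_text):
    """Map syscall number -> name from '#define __NR_<name> <number>' lines."""
    table = {}
    for line in header_text.splitlines():
        m = DEFINE_RE.match(line)
        if m:
            table[int(m.group(2))] = m.group(1)
    return table

# Sample input; on a real system, read the contents of asm/unistd_64.h
# instead (for example the path quoted in the comment above).
sample = """\
#define __NR_writev 20
#define __NR_tgkill 234
"""
print(syscall_table(sample).get(234))  # prints: tgkill
```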
Comment 8 Sam James 2023-01-03 15:14:21 AEDT
I assume it is, but to check, given you're running a pretty old kernel and I don't have every single quirk memorised..

1. What does `grep -rsin "#define.*234" /usr/include/asm` return?
2. Could you possibly try to run the ssh client under gdb until it dies, then get a backtrace?
Comment 9 Darren Tucker 2023-01-03 15:28:29 AEDT
(In reply to Damien Miller from comment #7)
> idk what is calling tgkill(2) - sshd isn't threaded

Maybe OpenSSL?  You could test this theory by configuring OpenSSH --without-openssl and seeing if the problem persists.
Comment 10 Darren Tucker 2023-01-03 15:45:04 AEDT
(In reply to Sam James from comment #8)
> I assume it is, but to check, given you're running a pretty old
> kernel and I don't have every single quirk memorised..
> 
> 1. What does `grep -rsin "#define.*234" /usr/include/asm` return?

tgkill on amd64 as per comment#7

> 2. Could you possibly try to run the ssh client under gdb until it
> dies, then get a backtrace?

That should work, but since the violation happens in a subprocess of the main sshd you'll need to set follow-fork-mode to "child".  After removing write from the sandbox allowlist:

$ sudo gdb -q --args `pwd`/sshd -ddd -p 2222
Reading symbols from /home/dtucker/openssh/upstream/openssh/build/linux/sshd...
(gdb) set follow-fork child
(gdb) break ssh_sandbox_violation
Breakpoint 1 at 0xb834: file ../../sandbox-seccomp-filter.c, line 378.
(gdb) run
[... debug output elided ...]

Thread 2.1 "sshd" received signal SIGSYS, Bad system call.
[Switching to Thread 0x7ffff7f451c0 (LWP 1394237)]
0x00007ffff7701977 in write () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff7701977 in write () from /lib64/libc.so.6
#1  0x00005555555dcecf in atomicio6 (f=0x7ffff7701960 <write>, fd=7, [... backtrace elided ...]
Comment 11 Sam James 2023-01-03 15:47:27 AEDT
(In reply to Darren Tucker from comment #10)
> tgkill on amd64 as per comment#7

(sorry, I was asking OP in the unlikely event that their headers are something odd.)
Comment 12 alarig 2023-01-13 22:56:53 AEDT
(In reply to Darren Tucker from comment #9)
> (In reply to Damien Miller from comment #7)
> > idk what is calling tgkill(2) - sshd isn't threaded
> 
> Maybe OpenSSL?  You could test this theory by configuring OpenSSH
> --without-openssl and seeing if the problem persists.

Hello,

I ran into the same issue (also running Gentoo on an old kernel).
I disabled the ssl flag and applied the patch, but I still get the unexpected system call.

I also tried downgrading dev-libs/libgcrypt to 1.9.4-r2 and net-misc/openssh to 9.0 and 8.9.
Downgrading libgcrypt didn’t change anything.
Older OpenSSH versions don’t print anything on the server side, but the connection is refused with the same errors on the client side.

I don’t know if it’s relevant, but in the configure output I have
checking for getentropy... yes
which puzzles me, as I have a 3.16 kernel, which doesn’t support it.
Comment 13 Darren Tucker 2023-01-13 23:46:36 AEDT
(In reply to alarig from comment #12)
[...]
> I ran into the same issue (also running Gentoo on an old kernel).
> I disabled the ssl flag and applied the patch, but I still have the
> unexpected system call)
> 
> I also tried to downgrade dev-libs/libgcrypt to 1.9.4-r2 and

OpenSSH doesn't use gcrypt at all.  It does use libcrypto from openssl but that's something else entirely.

> checking for getentropy... yes
> Which questions me, as I have a 3.16 kernel, which doesn’t support
> it.

It would appear that your glibc has the function call, although it doesn't work (due to the lack of kernel support for it).

Anyway, I made one attempt at installing Gentoo to reproduce this but ran out of patience.  Since every Gentoo build is effectively globally unique, this is up to someone with an affected system to diagnose (or provide comprehensive repro steps).  I'd suggest starting with the backtrace suggested in comment#8 and comment#10.
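
The gap between "the function exists in libc" and "the kernel implements it" only shows up at run time. The sketch below probes getrandom(2), the syscall that glibc's getentropy is built on and that was added in Linux 3.17. It uses Python's `os.getrandom` wrapper as a stand-in for the C call. The function name `getrandom_status` and its return strings are invented for this example.

```python
import errno
import os

def getrandom_status():
    """Report whether getrandom(2) actually works on the running kernel."""
    if not hasattr(os, "getrandom"):
        return "missing"      # no wrapper on this platform/Python build
    try:
        # GRND_NONBLOCK avoids hanging if the entropy pool is not ready.
        os.getrandom(1, os.GRND_NONBLOCK)
    except OSError as exc:
        if exc.errno == errno.ENOSYS:
            return "enosys"   # the call exists in libc, but not in the kernel
        return "error:" + errno.errorcode.get(exc.errno, str(exc.errno))
    return "ok"

print(getrandom_status())
```

On a kernel older than 3.17 the expected result would be "enosys", whereas a link-time check of the kind configure performs would still report the function as available.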
Comment 14 alarig 2023-01-14 21:55:33 AEDT
(In reply to Darren Tucker from comment #13)
> It would appear that your glibc has the function call, although it
> doesn't work (due to the lack of kernel support for it).

So I tested with musl, and now with the patch sshd works again on my setup.
Should I open a bug on the glibc bugtracker or do you want me to send stack-traces with the glibc setup?
Comment 15 Darren Tucker 2023-01-14 21:59:31 AEDT
(In reply to alarig from comment #14)
[...]
> Should I open a bug on the glibc bugtracker or do you want me to
> send stack-traces with the glibc setup?

Stack traces from sshd w/glibc here please.  If it's something in a release glibc we'd like to handle it, and if it turns out to be something we can't work with we can look at filing a glibc bug.
Comment 16 Darren Tucker 2023-01-14 22:05:25 AEDT
BTW I committed the writev thing since it now seems clear it's needed in at least some configurations.
Comment 17 Darren Tucker 2023-02-04 18:22:04 AEDT
This might have been https://github.com/openssh/openssh-portable/commit/12da7823336434a403f25c7cc0c2c6aed0737a35, which is also in the 9.2 release.  Could you please try either the patch or 9.2p1?
Comment 18 jussi 2023-02-04 18:43:08 AEDT
With 9.2p1:

kernel: type=1326 audit(1675496422.310:1613): auid=1000 uid=22 gid=22 ses=2 pid=13652 comm="sshd" sig=31 syscall=234 compat=0 ip=0x7f65789a328c code=0x0
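
The kernel audit record above carries the same information as the earlier sshd fatal lines, in key=value form. This is a small sketch of pulling the fields apart; `parse_audit` is a hypothetical helper. Record type 1326 is AUDIT_SECCOMP and signal 31 is SIGSYS on x86_64 Linux. Reading code=0x0 as the seccomp "kill" action is an assumption based on the SECCOMP_RET_KILL value being 0.

```python
AUDIT_SECCOMP = 1326   # record type, from linux/audit.h
SIGSYS_X86_64 = 31     # signal number for SIGSYS on x86_64 Linux

def parse_audit(line):
    """Split a kernel audit line into a dict of key -> string value."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:                      # skip tokens without '=' (e.g. "kernel:")
            fields[key] = value.strip('"')
    return fields

line = ('kernel: type=1326 audit(1675496422.310:1613): auid=1000 uid=22 '
        'gid=22 ses=2 pid=13652 comm="sshd" sig=31 syscall=234 compat=0 '
        'ip=0x7f65789a328c code=0x0')
rec = parse_audit(line)

is_seccomp_kill = (int(rec["type"]) == AUDIT_SECCOMP
                   and int(rec["sig"]) == SIGSYS_X86_64)
print(rec["comm"], rec["syscall"], is_seccomp_kill)  # prints: sshd 234 True
```

The syscall field is still 234, the same number reported in comment 6 before the upgrade to 9.2p1.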