Bug 3286 - sshd dumps core after authentication timeout
Summary: sshd dumps core after authentication timeout
Status: CLOSED FIXED
Alias: None
Product: Portable OpenSSH
Classification: Unclassified
Component: sshd (show other bugs)
Version: 8.5p1
Hardware: All Linux
: P5 normal
Assignee: Damien Miller
URL:
Keywords:
Depends on:
Blocks: V_8_7
  Show dependency treegraph
 
Reported: 2021-03-23 02:18 AEDT by Dan Harnett
Modified: 2022-02-25 13:59 AEDT (History)
3 users (show)

See Also:


Attachments
systemd log (1.93 KB, text/plain)
2021-03-23 02:23 AEDT, Dan Harnett
no flags Details
gdb backtrace of sshd compiled with debugging info (3.17 KB, text/plain)
2021-03-23 02:23 AEDT, Dan Harnett
no flags Details
strace when the child logs successfully (452 bytes, text/plain)
2021-03-23 02:24 AEDT, Dan Harnett
no flags Details
strace when the child fails to log (638 bytes, text/plain)
2021-03-23 02:24 AEDT, Dan Harnett
no flags Details
strace of the parent, includes closing the logging socket (1.00 KB, text/plain)
2021-03-23 02:25 AEDT, Dan Harnett
no flags Details
avoid sigdie() in privsep child (762 bytes, patch)
2021-04-30 15:25 AEST, Damien Miller
dtucker: ok+
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Dan Harnett 2021-03-23 02:18:09 AEDT
The sshd daemon in the latest OpenSSH release (8.5p1) running under Arch Linux occasionally dumps core when an authentication attempt times out.  This doesn't appear to be specific to Arch.  It seems pretty easy to reproduce by just leaving a ssh client waiting at the password prompt for the duration of LoginGraceTime.

Mar 17 16:26:38 localhost systemd-coredump[4024]: [🡕] Process 4019 (sshd) of user 65534 dumped core.

                                                  Stack trace of thread 4019:
                                                  #0  0x00007fe3cf8f68fb __socket (libc.so.6 + 0x1008fb)
                                                  #1  0x00007fe3cf8ef013 openlog_internal (libc.so.6 + 0xf9013)
                                                  #2  0x00007fe3cf8ef4f7 __vsyslog_internal (libc.so.6 + 0xf94f7)
                                                  #3  0x00007fe3cf8ef803 __syslog_chk (libc.so.6 + 0xf9803)
                                                  #4  0x00005626488bcd4c n/a (sshd + 0x65d4c)
                                                  #5  0x00005626488ba1ec n/a (sshd + 0x631ec)
                                                  #6  0x000056264888ada9 n/a (sshd + 0x33da9)
                                                  #7  0x00005626488bccc5 n/a (sshd + 0x65cc5)
                                                  #8  0x00005626488bd0b1 n/a (sshd + 0x660b1)
                                                  #9  0x0000562648866376 n/a (sshd + 0xf376)
                                                  #10 0x00007fe3cf832f80 __restore_rt (libc.so.6 + 0x3cf80)
                                                  #11 0x00007fe3cf8eca67 __select (libc.so.6 + 0xf6a67)
                                                  #12 0x00005626488c3745 n/a (sshd + 0x6c745)
                                                  #13 0x00005626488c96b3 n/a (sshd + 0x726b3)
                                                  #14 0x00005626488c97b9 n/a (sshd + 0x727b9)
                                                  #15 0x00005626488638ca n/a (sshd + 0xc8ca)
                                                  #16 0x00007fe3cf81db25 __libc_start_main (libc.so.6 + 0x27b25)
                                                  #17 0x000056264886609e n/a (sshd + 0xf09e)



Compiled with debugging info, gdb gives the following backtrace:

  Using host libthread_db library "/usr/lib/libthread_db.so.1".
  Core was generated by `sshd: unknown [net]'.
  Program terminated with signal SIGSYS, Bad system call.
  #0  0x00007fb7ee8e28fb in socket () from /usr/lib/libc.so.6
  (gdb) bt
  #0  0x00007fb7ee8e28fb in socket () from /usr/lib/libc.so.6
  #1  0x00007fb7ee8db013 in openlog_internal () from /usr/lib/libc.so.6
  #2  0x00007fb7ee8db4f7 in __vsyslog_internal () from /usr/lib/libc.so.6
  #3  0x00007fb7ee8db803 in __syslog_chk () from /usr/lib/libc.so.6
  #4  0x000055d42c649d4c in syslog (__fmt=0x55d42c6922f0 "%.500s", __pri=2) at /usr/include/bits/syslog.h:31
  #5  do_log (args=0x7ffe1d1895e0, fmt=0x7ffe1d189100 "mm_log_handler: write: %s", suffix=0x0, force=, level=SYSLOG_LEVEL_FATAL, line=111,
      func=0x55d42c68a0a0 <__func__.27> "mm_log_handler", file=0x55d42c689940 "monitor_wrap.c") at log.c:416
  #6  sshlogv (file=0x55d42c689940 "monitor_wrap.c", func=0x55d42c68a0a0 <__func__.27> "mm_log_handler", line=111, showfunc=,
      level=SYSLOG_LEVEL_FATAL, suffix=0x0, fmt=0x55d42c686cc2 "write: %s", args=0x7ffe1d1895e0) at log.c:485
  #7  0x000055d42c6471ec in sshfatal (file=file@entry=0x55d42c689940 "monitor_wrap.c", func=func@entry=0x55d42c68a0a0 <__func__.27> "mm_log_handler",
      line=line@entry=111, showfunc=showfunc@entry=1, level=level@entry=SYSLOG_LEVEL_FATAL, suffix=suffix@entry=0x0, fmt=0x55d42c686cc2 "write: %s")
      at fatal.c:43
  #8  0x000055d42c617da9 in mm_log_handler (file=0x55d42c680004 "sshd.c", func=, line=, level=SYSLOG_LEVEL_FATAL,
      msg=0x7ffe1d189bd0 "Timeout before authentication for 208.105.190.83 port 30569", ctx=0x55d42e235e40) at monitor_wrap.c:111     #9  0x000055d42c649cc5 in do_log (args=0x7ffe1d18a4b0, fmt=0x7ffe1d189fd0 "Timeout before authentication for %s port %d", suffix=0x0, force=,                                                                                                                               level=SYSLOG_LEVEL_FATAL, line=370, func=0x55d42c6818f0 <__func__.26> "grace_alarm_handler", file=0x55d42c680004 "sshd.c") at log.c:403                                                                                                                             #10 sshlogv (file=0x55d42c680004 "sshd.c", func=0x55d42c6818f0 <__func__.26> "grace_alarm_handler", line=370, showfunc=,
      level=level@entry=SYSLOG_LEVEL_FATAL, suffix=0x0, fmt=0x55d42c680700 "Timeout before authentication for %s port %d", args=0x7ffe1d18a4b0) at log.c:485                                                                                                              #11 0x000055d42c64a0b1 in sshsigdie (file=file@entry=0x55d42c680004 "sshd.c", func=func@entry=0x55d42c6818f0 <__func__.26> "grace_alarm_handler",                                                                                                                           line=line@entry=370, showfunc=showfunc@entry=0, level=level@entry=SYSLOG_LEVEL_ERROR, suffix=suffix@entry=0x0,
      fmt=0x55d42c680700 "Timeout before authentication for %s port %d") at log.c:454
  #12 0x000055d42c5f3376 in grace_alarm_handler (sig=) at sshd.c:370
  #13
  #14 0x00007fb7ee8d8a67 in select () from /usr/lib/libc.so.6
  #15 0x000055d42c650745 in ssh_packet_read_seqnr (ssh=ssh@entry=0x55d42e2442e0, typep=typep@entry=0x7ffe1d18cc73 "", seqnr_p=seqnr_p@entry=0x7ffe1d18cc74)
      at packet.c:1364
  #16 0x000055d42c6566b3 in ssh_dispatch_run (ssh=ssh@entry=0x55d42e2442e0, mode=mode@entry=0, done=done@entry=0x55d42e245fc0) at dispatch.c:96
  #17 0x000055d42c6567b9 in ssh_dispatch_run_fatal (ssh=ssh@entry=0x55d42e2442e0, mode=mode@entry=0, done=done@entry=0x55d42e245fc0) at dispatch.c:133
  #18 0x000055d42c603411 in do_authentication2 (ssh=ssh@entry=0x55d42e2442e0) at auth2.c:176
  #19 0x000055d42c5f0903 in main (ac=, av=) at sshd.c:2218


It looks like the parent process closes the logging socket before the child process has written to syslog, and glibc then tries to re-open the socket, which isn't allowed in the seccomp sandbox.

strace of a child process when everything works (no core dump; managed to write to the socket before the parent closed it):

  3774  pselect6(5, [4<socket:[83561]>], NULL, NULL, NULL, NULL) = ? ERESTARTNOHAND (To be restarted if no handler)
  3774  --- SIGALRM {si_signo=SIGALRM, si_code=SI_USER, si_pid=3773, si_uid=0} ---
  3774  getpgid(0)                        = 3773
  3774  getpid()                          = 3774
  3774  write(8<pipe:[83581]>, "\0\0\0h\0\0\0\6sshd.c\0\0\0\23grace_alarm_ha"..., 108) = 108
  3774  exit_group(1)                     = ?
  3774  +++ exited with 1 +++

strace of a child process when this issue is triggered (core dump; the parent has already closed the socket):
 
  3799  pselect6(5, [4<socket:[83972]>], NULL, NULL, NULL, NULL) = ? ERESTARTNOHAND (To be restarted if no handler)
  3799  --- SIGALRM {si_signo=SIGALRM, si_code=SI_USER, si_pid=3798, si_uid=0} ---
  3799  getpgid(0)                        = 3798
  3799  getpid()                          = 3799
  3799  write(8<pipe:[83992]>, "\0\0\0h\0\0\0\6sshd.c\0\0\0\23grace_alarm_ha"..., 108) = -1 EPIPE (Broken pipe)
  3799  openat(AT_FDCWD, "/etc/localtime", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
  3799  getpid()                          = 3799
  3799  socket(AF_UNIX, SOCK_DGRAM|SOCK_CLOEXEC, 0) = ?
  3799  +++ killed by SIGSYS (core dumped) +++

strace of a parent process:

  3773  restart_syscall(<... resuming interrupted read ...>) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
  3773  --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
  3773  kill(3774, SIGALRM)               = 0
  3773  getpgid(0)                        = 3773
  3773  getpid()                          = 3773
  3773  rt_sigaction(SIGTERM, {sa_handler=SIG_IGN, sa_mask=~[RTMIN RT_1], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f859affef80}, {sa_handler=SIG_DFL, sa_mask=~[KILL STOP RTMIN RT_1], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f859affef80}, 8) = 0
  3773  kill(0, SIGTERM)                  = 0
  3773  getpid()                          = 3773
  3773  socket(AF_UNIX, SOCK_DGRAM|SOCK_CLOEXEC, 0) = 5<socket:[83963]>
  3773  connect(5<socket:[83963]>, {sa_family=AF_UNIX, sun_path="/dev/log"}, 110) = 0
  3773  sendto(5<socket:[83963]>, "<34>Mar 17 15:26:59 sshd[3773]: "..., 98, MSG_NOSIGNAL, NULL, 0) = 98
  3773  close(5<socket:[83963]>)          = 0
  3773  exit_group(1)                     = ?
  3773  +++ exited with 1 +++
Comment 1 Dan Harnett 2021-03-23 02:23:11 AEDT
Created attachment 3482 [details]
systemd log
Comment 2 Dan Harnett 2021-03-23 02:23:44 AEDT
Created attachment 3483 [details]
gdb backtrace of sshd compiled with debugging info
Comment 3 Dan Harnett 2021-03-23 02:24:19 AEDT
Created attachment 3484 [details]
strace when the child logs successfully
Comment 4 Dan Harnett 2021-03-23 02:24:48 AEDT
Created attachment 3485 [details]
strace when the child fails to log
Comment 5 Dan Harnett 2021-03-23 02:25:16 AEDT
Created attachment 3486 [details]
strace of the parent, includes closing the logging socket
Comment 6 Dan Harnett 2021-03-23 02:26:03 AEDT
My apologies for the inline traces.  I've added them as attachments.
Comment 7 Damien Miller 2021-03-23 14:39:12 AEDT
It looks like we don't reset the LoginGraceTime SIGALRM handler in the privsep child. Perhaps this handler is racing the monitor process to terminate the session on timer expiry, but running afoul of the seccomp-bpf sandbox when it tries to create a socket to talk to syslogd.

That doesn't explain why the logging isn't going via the monitor as it should be. A possibility here is the recursion protection in log.c:do_log() - if the signal arrived while a logging call was active, then the logging code falls back to attempting to syslog the message directly. AFAIK this hasn't changed recently though.

In any case, we should skip the sigdie() in the privsep child, but I'd still like to figure out why this is triggering when it wasn't before.
Comment 8 Damien Miller 2021-03-23 14:39:48 AEDT
oh, thanks for the detailed diagnostic information. It's a huge help when someone takes the time to get a backtrace with symbols.
Comment 9 Damien Miller 2021-04-30 15:25:50 AEST
Created attachment 3504 [details]
avoid sigdie() in privsep child

This patch should avoid the problem, but I still don't understand why this condition did not exist previously. Darren, is there any chance the ssh_signal change could have triggered this?
Comment 10 Darren Tucker 2021-04-30 16:08:19 AEST
Comment on attachment 3504 [details]
avoid sigdie() in privsep child

I could imagine it happening if signal(SIGALRM..) was reset during fork but not for sigaction but I don't known that this is the case.  The man page for alarm(1) only says:

       Alarms  created  by  alarm() are preserved across execve(2) and are not
       inherited by children created via fork(2).
Comment 11 Damien Miller 2021-05-07 13:14:41 AEST
committed as 8c396f42c19 and will be in the OpenSSH 8.7 release
Comment 12 Damien Miller 2022-02-25 13:59:10 AEDT
closing bugs resolved before openssh-8.9