Created attachment 2368 [details] A patch which seems to solve this problem. I got "sshd[$pid]: fatal: fork of unprivileged child failed" in /var/log/secure but the connection with ssh client remained. I examined the cause and found that this problem happens when fork() in privsep_preauth()/privsep_postauth() fails. You can easily reproduce this problem by replacing fork() in privsep_preauth()/privsep_postauth() with -1. I don't know what is the right fix, but at least forcibly closing all sockets before exit() seems to solve this problem. I'm using RHEL 6.4's openssh-5.3p1-84.1.el6.src.rpm , but I think this problem exists in any versions which have privsep_preauth()/privsep_postauth() . Regards.
I can't see how this patch is needed. fatal() calls exit(), which closes all process file descriptors. If exit() isn't closing file descriptors then that is a kernel bug AFAIK.
This is not a kernel bug. fatal() is designed to eventually call exit(), but this bug is preventing sshd process from calling exit(). The child process calls fatal() when fork() failed at privsep_postauth(). (Please note that fork() is replaced with -1 for explanation.) ---------- privsep_postauth() in sshd.c ---------- static void privsep_postauth(Authctxt *authctxt) { u_int32_t rnd[256]; #ifdef DISABLE_FD_PASSING if (1) { #else if (authctxt->pw->pw_uid == 0 || options.use_login) { #endif /* File descriptor passing is broken or root login */ use_privsep = 0; goto skip; } /* New socket pair */ monitor_reinit(pmonitor); pmonitor->m_pid = -1; // Emulate the fork() failure. if (pmonitor->m_pid == -1) fatal("fork of unprivileged child failed"); (...snipped...) ---------- privsep_postauth() in sshd.c ---------- fatal() calls cleanup_exit(255). (Please note that dummy write(-1, "", $step) lines are inserted for comparing with strace command.) ---------- fatal() in fatal.c ---------- void fatal(const char *fmt,...) { va_list args; va_start(args, fmt); do_log(SYSLOG_LEVEL_FATAL, fmt, args); va_end(args); write(-1, "", 0); cleanup_exit(255); } ---------- fatal() in fatal.c ---------- cleanup_exit(255) will eventually call _exit(255). ---------- cleanup_exit() in sshd.c ---------- /* server specific fatal cleanup */ void cleanup_exit(int i) { static int in_cleanup; int is_privsep_child; write(-1, "", 1); /* cleanup_exit can be called at the very least from the privsep wrappers used for auditing. Make sure we don't recurse indefinitely. */ if (in_cleanup) _exit(i); write(-1, "", 2); in_cleanup = 1; if (the_authctxt) do_cleanup(the_authctxt); write(-1, "", 3); is_privsep_child = use_privsep && pmonitor != NULL && !mm_is_monitor(); write(-1, "", 4); if (sensitive_data.host_keys != NULL) destroy_sensitive_data(is_privsep_child); write(-1, "", 5); packet_destroy_all(1, is_privsep_child); #ifdef SSH_AUDIT_EVENTS /* done after do_cleanup so it can cancel the PAM auth 'thread' */ write(-1, "", 6); if ((the_authctxt == NULL || !the_authctxt->authenticated) && (!use_privsep || mm_is_monitor())) audit_event(SSH_CONNECTION_ABANDON); #endif write(-1, "", 7); _exit(i); } ---------- cleanup_exit() in sshd.c ---------- Did we reach the _exit(i) line? Let's check the strace command. ---------- strace log start ---------- [pid 17153] socketpair(PF_FILE, SOCK_STREAM, 0, [5, 6]) = 0 [pid 17153] fcntl(5, F_SETFD, FD_CLOEXEC) = 0 [pid 17153] fcntl(6, F_SETFD, FD_CLOEXEC) = 0 [pid 17153] sendto(4, "<82>Nov 1 11:11:08 sshd[17153]:"..., 73, MSG_NOSIGNAL, NULL, 0) = 73 [pid 17153] close(4) = 0 [pid 17153] write(4294967295, "", 0) = -1 EBADF (Bad file descriptor) [pid 17153] write(4294967295, "\0", 1) = -1 EBADF (Bad file descriptor) [pid 17153] write(4294967295, "\0G", 2) = -1 EBADF (Bad file descriptor) [pid 17153] write(4294967295, "\0Go", 3) = -1 EBADF (Bad file descriptor) [pid 17153] write(4294967295, "\0Got", 4) = -1 EBADF (Bad file descriptor) [pid 17153] getuid() = 0 [pid 17153] write(5, "\0\0\0DR", 5) = 5 [pid 17153] write(5, "\0\0\0/c1:bd:33:a5:66:d5:83:2d:0c:9"..., 67) = 67 [pid 17153] read(5, ^C <unfinished ...> Process 17147 detached Process 17153 detached ---------- strace log end ---------- We can see that the child process reached the write(-1, "", 4) line but did not reach the write(-1, "", 5) line. This means that the child process is sleeping at if (sensitive_data.host_keys != NULL) destroy_sensitive_data(is_privsep_child); trying to read data from fd == 5. What is fd == 5 connected with? According to strace command, fd == 5 and fd == 6 are a socket pair created by monitor_reinit() call in privsep_postauth(). ---------- monitor_reinit() in monitor.c ---------- void monitor_reinit(struct monitor *mon) { int pair[2]; monitor_socketpair(pair); mon->m_recvfd = pair[0]; mon->m_sendfd = pair[1]; } ---------- monitor_reinit() in monitor.c ---------- destroy_sensitive_data() tried to write to fd == 5 and trying to read from fd == 5, but there is no writers writing to fd == 6. Dead lock caused by trying to I/O against wrong file descriptor. This was why calling shutdown() seems to solve the problem.
Created attachment 2369 [details] A patch which fixes this problem.
OK. I found the exact location. fatal("fork of unprivileged child failed") calls cleanup_exit(255). cleanup_exit(255) calls destroy_sensitive_data(1). destroy_sensitive_data(1) calls mm_audit_destroy_sensitive_data(fp, pid, uid). mm_audit_destroy_sensitive_data() uses global pmonitor->m_recvfd which now references a socket pair. ---------- mm_audit_destroy_sensitive_data() in monitor_wrap.c ---------- void mm_audit_destroy_sensitive_data(const char *fp, pid_t pid, uid_t uid) { Buffer m; buffer_init(&m); buffer_put_cstring(&m, fp); buffer_put_int64(&m, pid); buffer_put_int64(&m, uid); mm_request_send(pmonitor->m_recvfd, MONITOR_REQ_AUDIT_SERVER_KEY_FREE, &m); mm_request_receive_expect(pmonitor->m_recvfd, MONITOR_ANS_AUDIT_SERVER_KEY_FREE, &m); buffer_free(&m); } ---------- mm_audit_destroy_sensitive_data() in monitor_wrap.c ---------- Regarding privsep_preauth(), pmonitor = monitor_init() causes this problem when fork() fails. Fortunately, timeout was previously configured via alarm() which will eventually terminate the process even if fork() failed. ---------- privsep_preauth() in sshd.c ---------- static int privsep_preauth(Authctxt *authctxt) { int status; pid_t pid; /* Set up unprivileged child process to deal with network data */ pmonitor = monitor_init(); /* Store a pointer to the kex for later rekeying */ pmonitor->m_pkex = &xxx_kex; pid = fork(); if (pid == -1) { fatal("fork of unprivileged child failed"); (...snipped...) ---------- privsep_preauth() in sshd.c ---------- Regarding privsep_postauth(), monitor_reinit(pmonitor) causes this problem when fork() fails. Unfortunately, timeout previously configured via alarm() was disabled, which results in forever wait when fork() failed. ---------- privsep_postauth() in sshd.c ---------- static void privsep_postauth(Authctxt *authctxt) { u_int32_t rnd[256]; #ifdef DISABLE_FD_PASSING if (1) { #else if (authctxt->pw->pw_uid == 0 || options.use_login) { #endif /* File descriptor passing is broken or root login */ use_privsep = 0; goto skip; } /* New socket pair */ monitor_reinit(pmonitor); pmonitor->m_pid = fork(); if (pmonitor->m_pid == -1) fatal("fork of unprivileged child failed"); (...snipped...) ---------- privsep_postauth() in sshd.c ---------- I confirmed that the patch in Comment 3 can fix this problem.
As I get no response, I set importance to "major" due to following impact. This bug actually blocked an unattended ssh session (execution of batched job) of an enterprise server. The worst case might be "sleeping forever while trying to kdump over ssh" which means that "the kdump procedure is unable to reboot after the kernel panic", severely affecting availability (down time) of the enterprise servers.
look at this for next release
It's an issue of RHEL's downstream audit patches - https://bugzilla.redhat.com/show_bug.cgi?id=1028643 : openssh-5.3p1-audit.patch: + is_privsep_child = use_privsep && pmonitor != NULL && !mm_is_monitor(); This hunk needs a change.
Set all RESOLVED bugs to CLOSED with release of OpenSSH 7.1