ssh stops working because ssh-keygen cannot get entropy. Apparently, even if configured with the latest OpenSSL and the latest prngd, openssh still uses the commands that are listed in /usr/local/etc/ssh_prng_cmds. If one of those commands hangs, then the timeout used in the build does not work during operation, and ssh-keygen hangs... so ssh hangs. The 'df' command is the offending command here. The timeout used during the build to test the commands also does not appear to take effect when a command that tested successfully at build time fails during operation.

We have over 100 disks, and any one of those disks can stop ssh, which we use for a production batch job. This is an unacceptable series failure mode. Fortunately, commenting out the lines in /usr/local/etc/ssh_prng_cmds that contain the offending command provided a quick solution to our problem. We now leave df commented out, as documented in our install notes below.

sshd installation documented at http://www.mcg-ct.com/openssh_privsep/

Given my understanding, there may be two bugs.
1.) If using prngd, openssh should not use /usr/local/etc/ssh_prng_cmds.
2.) If using /usr/local/etc/ssh_prng_cmds, the 200 msec default timeout should work during operation.
Created attachment 156
Send SIGINT to ssh-rand-helper child in case of timeout.

You can reproduce this easily on Linux and Solaris (and probably others too) by adding this to the top of ssh_prng_cmds:

"sleep 1000" /bin/sleep 0.02

then running ssh-rand-helper -vvv.

It appears to happen because closing the descriptor either doesn't produce a SIGPIPE in the command or the command ignores it. The patch sends a SIGINT to the child if the command times out. This should be safe even if the command has already exited, because we haven't yet wait()ed for it.
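
For readers who want to see the idea outside the OpenSSH source, here is a minimal sketch in Python of the same approach: read the command's output with a deadline, and on timeout signal the child explicitly instead of relying on the closed pipe. This is not the attached patch. The function name run_with_timeout and the 1 second grace period followed by SIGKILL are assumptions added for illustration; the patch as described only sends SIGINT.

```python
import os
import select
import signal
import subprocess
import time


def run_with_timeout(argv, timeout_s):
    """Run argv, collect its stdout, and interrupt it if it exceeds timeout_s.

    Hypothetical helper for illustration; returns (output_bytes, timed_out).
    """
    child = subprocess.Popen(argv, stdout=subprocess.PIPE,
                             stderr=subprocess.DEVNULL)
    fd = child.stdout.fileno()
    deadline = time.monotonic() + timeout_s
    chunks = []
    timed_out = False
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            timed_out = True
            break
        ready, _, _ = select.select([fd], [], [], remaining)
        if not ready:
            timed_out = True
            break
        data = os.read(fd, 4096)
        if not data:  # EOF: the child closed its stdout
            break
        chunks.append(data)
    child.stdout.close()
    if timed_out:
        # Closing the pipe is not enough: a command that never writes
        # (such as sleep) never gets SIGPIPE. Signal it explicitly.
        # The child has not been reaped yet, so the PID still refers to it.
        child.send_signal(signal.SIGINT)
        try:
            child.wait(timeout=1.0)
        except subprocess.TimeoutExpired:
            # Assumption beyond the patch: escalate if SIGINT is ignored.
            child.kill()
    child.wait()
    return b"".join(chunks), timed_out
```

With the reproduction command from above, run_with_timeout(["sleep", "1000"], 0.2) should return shortly after the timeout with timed_out set, rather than blocking until sleep finishes.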
Applied - thanks.
Mass change of RESOLVED bugs to CLOSED