Bug 400 - ssh-keygen hangs
Summary: ssh-keygen hangs
Status: CLOSED FIXED
Alias: None
Product: Portable OpenSSH
Classification: Unclassified
Component: ssh-keygen
Version: -current
Hardware: All AIX
Importance: P2 normal
Assignee: OpenSSH Bugzilla mailing list
URL: http://www.mgi-networks.com/
Keywords:
Depends on:
Blocks:
 
Reported: 2002-09-23 21:26 AEST by Mike Grierson
Modified: 2004-04-14 12:24 AEST
CC List: 1 user

See Also:


Attachments
Send SIGINT to ssh-rand-helper child in case of timeout. (442 bytes, patch)
2002-10-20 20:42 AEST, Darren Tucker

Description Mike Grierson 2002-09-23 21:26:12 AEST
ssh stops working because ssh-keygen cannot get entropy.  Apparently, even when
built against the latest OpenSSL and the latest prngd, OpenSSH still runs the
commands listed in /usr/local/etc/ssh_prng_cmds.  If one of those commands
hangs, the timeout used during the build does not take effect during operation,
so ssh-keygen hangs, and therefore ssh hangs.  Here the offending command is 'df'.

The timeout used during the build to test the commands also does not appear to
work when a command that tested successfully at build time fails during operation.

We have over 100 disks, and any one of them can make df hang and thereby stop
ssh, which we use for a production batch job.  This is an unacceptable series
failure mode.  Fortunately, commenting out the lines in
/usr/local/etc/ssh_prng_cmds that contain the offending command provided a
quick solution to our problem.  We now leave df commented out, as documented in
our install notes below.

Our sshd installation is documented at http://www.mcg-ct.com/openssh_privsep/

As I understand it, there may be two bugs:
1.)  When prngd is used, OpenSSH should not use /usr/local/etc/ssh_prng_cmds.
2.)  When /usr/local/etc/ssh_prng_cmds is used, the 200 msec default timeout
     should also work during operation.
Comment 1 Darren Tucker 2002-10-20 20:42:09 AEST
Created attachment 156 [details]
Send SIGINT to ssh-rand-helper child in case of timeout.

You can reproduce this easily on Linux and Solaris (and probably others too) by
adding this line to the top of ssh_prng_cmds:

"sleep 1000" /bin/sleep 0.02

then running ssh-rand-helper -vvv.

This appears to happen because closing the descriptor does not stop the
command: either the command never triggers a SIGPIPE (for example, because it
does not write to the pipe) or it ignores the signal.
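A minimal C sketch of the cause described above, assuming a POSIX system with /bin/sleep: the parent starts a command that never writes (like the "sleep 1000" entry), closes its end of the pipe as if it had given up, and then checks whether the child is still running. The function name idle_child_survives_pipe_close is hypothetical; this is an illustration, not code from ssh-rand-helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical demo (not ssh-rand-helper code): returns 1 if a child that
 * never writes is still running after the parent closes the read end of
 * its stdout pipe, 0 if it had already exited, and -1 on setup failure.
 */
static int
idle_child_survives_pipe_close(void)
{
	int p[2], status, alive;
	pid_t pid;
	struct timespec delay = { 0, 100L * 1000 * 1000 };	/* 100 ms */

	if (pipe(p) == -1)
		return -1;
	if ((pid = fork()) == -1) {
		close(p[0]);
		close(p[1]);
		return -1;
	}
	if (pid == 0) {
		/* Child: stdout goes into the pipe, like a prng command. */
		close(p[0]);
		if (dup2(p[1], STDOUT_FILENO) == -1)
			_exit(127);
		close(p[1]);
		/* Same as the "sleep 1000" entry: never writes anything. */
		execl("/bin/sleep", "sleep", "1000", (char *)NULL);
		_exit(127);
	}

	/* Parent gives up on the command and only closes the pipe. */
	close(p[1]);
	close(p[0]);
	nanosleep(&delay, NULL);

	/*
	 * SIGPIPE is raised only when a process writes to a pipe that has
	 * no readers; sleep never writes, so it should still be running.
	 */
	alive = (waitpid(pid, &status, WNOHANG) == 0);
	if (alive) {
		/* Clean up; SIGKILL cannot be caught or ignored. */
		kill(pid, SIGKILL);
		waitpid(pid, &status, 0);
	}
	return alive;
}
```

On such a system the function is expected to return 1. A helper that stops at closing the pipe and then calls waitpid() without signalling the child would block for the full 1000 seconds.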

The patch sends a SIGINT to the child if the command times out. This should be
safe even if the command has already exited, because we haven't yet wait()ed
for it: an unreaped child keeps its PID, so the signal cannot reach an unrelated
process.
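The approach in the patch (signal the child on timeout, then reap it) can be sketched in C as below. This is a hypothetical illustration, not the attached patch: read_cmd_output, its parameters, and the per-read inactivity timeout (rather than a total deadline) are assumptions. The child also resets SIGINT to its default disposition, because signals set to SIG_IGN stay ignored across exec().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the ssh-rand-helper source): run argv[0] with
 * its stdout on a pipe and collect up to len bytes into buf. If no data
 * arrives for timeout_ms, the command is treated as hung and *timed_out
 * is set. Returns the number of bytes read, or -1 on setup failure.
 */
static ssize_t
read_cmd_output(char *const argv[], char *buf, size_t len, int timeout_ms,
    int *timed_out)
{
	int p[2], status, eof = 0;
	size_t got = 0;
	pid_t pid;

	*timed_out = 0;
	if (pipe(p) == -1)
		return -1;
	if ((pid = fork()) == -1) {
		close(p[0]);
		close(p[1]);
		return -1;
	}
	if (pid == 0) {
		/* Child: make sure SIGINT can terminate it, then exec. */
		signal(SIGINT, SIG_DFL);
		close(p[0]);
		if (dup2(p[1], STDOUT_FILENO) == -1)
			_exit(127);
		close(p[1]);
		execv(argv[0], argv);
		_exit(127);
	}
	close(p[1]);
	while (got < len) {
		struct pollfd pfd = { .fd = p[0], .events = POLLIN };
		int r = poll(&pfd, 1, timeout_ms);

		if (r == -1 && errno == EINTR)
			continue;
		if (r == 0)
			*timed_out = 1;	/* no output in time: assume hung */
		if (r <= 0)
			break;
		ssize_t n = read(p[0], buf + got, len - got);
		if (n == -1 && errno == EINTR)
			continue;
		if (n == 0)
			eof = 1;	/* child closed its stdout */
		if (n <= 0)
			break;
		got += (size_t)n;
	}
	close(p[0]);
	/*
	 * Closing the pipe alone does not stop a command that never writes,
	 * so signal it explicitly. Its PID is still reserved because it has
	 * not been reaped, even if it has already exited.
	 */
	if (!eof)
		kill(pid, SIGINT);
	while (waitpid(pid, &status, 0) == -1 && errno == EINTR)
		;
	return (ssize_t)got;
}
```

For example, with argv = { "/bin/sleep", "5", NULL } and a 200 ms timeout, the call is expected to return 0 with *timed_out set after roughly 200 ms instead of blocking for 5 seconds. A command that ignores SIGINT would still block waitpid(); a stricter variant could follow up with SIGKILL.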
Comment 2 Damien Miller 2002-10-21 10:13:34 AEST
Applied - thanks.
Comment 3 Damien Miller 2004-04-14 12:24:18 AEST
Mass change of RESOLVED bugs to CLOSED