It occurs once or twice per day. Any suggestions as to whether this is an OpenSSH random number incompatibility issue?

Server: Sun Enterprise E4500/E5500 System
SunOS 5.7 Generic_106541-32 sun4u sparc SUNW,Ultra-Enterprise

Part of /var/adm/messages:

Nov 9 09:51:19 ashipts prngd[671]: write() in socket_write() failed: Broken pipe
Nov 9 09:51:19 ashipts last message repeated 30 times
Nov 9 09:51:19 ashipts prngd[671]: closing service fd 15 for error.
This looks very much like a prngd issue and not an OpenSSH problem. Are there any error messages from OpenSSH?
Someone on comp.security.ssh suggested that there is a bug in prngd-0.9.26 that might cause this problem.
The following script freezes on ashipt after running for a random period of time, usually within 15 minutes. I tracked it down to an error in the read() from the remote server.

#!/bin/ksh
while (( 1 ))
do
    ssh -vvv -l ipts ash mv /tmp/ssh_test /tmp/ssh_test.tmp
    ssh -vvv -l ipts ashipts0 chmod 600 /tmp/ssh_test.tmp
    ssh -vvv -l ipts ashipts0 mv /tmp/ssh_test.tmp /tmp/ssh_test
    sleep 2
done

debug2: callback start
debug2: ssh_session2_setup: id 0
debug1: Sending command: mv /tmp/ssh_test /tmp/ssh_test.tmp
debug2: channel 0: request exec
debug2: callback done
debug2: channel 0: open confirm rwindow 0 rmax 16384
debug2: channel 0: rcvd adjust 32768
debug2: channel 0: rcvd eof
debug2: channel 0: output open -> drain
debug2: channel 0: obuf empty
debug2: channel 0: close_write
debug2: channel 0: output drain -> closed
<freezes here. The remote command never executes either, although the debug statements say it did. At this point the ssh connection hangs: the ssh program is stuck on the local machine and the connected sshd process is hung on the remote machine. netstat shows that the connection is ESTABLISHED.>
What version of prngd are you using? Is it the buggy one that Darren mentioned? Do you get hangs with multiple "openssl rand -base64 20480" runs?
prngd 0.9.25 (30 May 2002)
Does the problem occur if you upgrade prngd? You also didn't answer Damien's question: if you stick "openssl rand -base64 20480" in a loop does it hang or abort?
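One way to run the check Damien asked for, with a time limit so that a hang gets reported instead of freezing the test loop, is sketched below. It is a minimal sketch assuming a POSIX sh or ksh and without relying on any `timeout` utility; the helper name `run_once`, the 60-second limit, the 1000 iterations and the `LOOP` switch are hypothetical choices, not part of the original report.

```sh
#!/bin/sh
# Sketch only: run_once, the 60-second limit, the 1000 iterations and the
# LOOP switch are hypothetical choices, not part of the original report.

# run_once LIMIT CMD [ARGS...]
# Runs CMD in the background with a watchdog that sends SIGTERM after LIMIT
# seconds, then prints "ok", "hang", "signal N" or "exit N".
run_once() {
    limit=$1
    shift
    "$@" >/dev/null 2>&1 &
    pid=$!
    # Watchdog output goes to /dev/null so a leftover sleep cannot keep a
    # caller's pipe (e.g. a $(...) substitution) open.
    ( sleep "$limit"; kill "$pid" ) >/dev/null 2>&1 &
    watchdog=$!
    status=0
    wait "$pid" || status=$?
    kill "$watchdog" 2>/dev/null || true

    # Any status above 128 is treated as "killed by a signal".
    if [ "$status" -gt 256 ]; then
        sig=$((status - 256))   # ksh93 reports 256 + signal number
    elif [ "$status" -gt 128 ]; then
        sig=$((status - 128))   # sh, bash and ksh88 report 128 + signal number
    else
        sig=0
    fi

    if [ "$status" -eq 0 ]; then
        echo ok
    elif [ "$sig" -eq 15 ]; then
        echo hang               # killed by the watchdog's SIGTERM
    elif [ "$sig" -ne 0 ]; then
        echo "signal $sig"      # e.g. "signal 6" if the command aborted
    else
        echo "exit $status"
    fi
}

# The loop only runs when LOOP=1, so the function can also be sourced.
if [ "${LOOP:-0}" = 1 ]; then
    n=0
    while [ "$n" -lt 1000 ]; do
        result=$(run_once 60 openssl rand -base64 20480)
        [ "$result" = ok ] || echo "run $n: $result"
        n=$((n + 1))
    done
fi
```

Saved as, say, check_rand.sh and started with `LOOP=1 sh check_rand.sh`, it prints nothing for good runs and one line for each run that hung, was killed by a signal (an abort would show up as `signal 6`), or exited non-zero.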
Actually, the bug I was talking about in comp.security.ssh was introduced in prngd 0.9.25, so upgrading to the current version is definitely worth a try.
I ran "openssl rand -base64 20480" in a loop for only 10 minutes; it did not hang or abort. Our security group said no to upgrading prngd from 0.9.25. Could you please tell me more about the prngd bug? What are the symptoms?
Why don't you ask the prngd developers?
From the ChangeLog:

-- snip --
When lots of processes query entropy at the same time, the "fairness" change introduced in 0.9.25 could lead to clients being served only with a delay.
Reason: in serverloop.c the next client to serve is "i1", as determined from
  i1 = (prev_location + i) % max_query_old;
The client that was actually served, however, was "i" instead of "i1". If the connection of "i" was not yet ready for "write" state set after getting the entropy, it might block.
This problem has not been reported by any other user, though it might also have occurred at other sites. Depending on the internal sorting of sockets by fd/slot (number increasing in the sequence of accepted connections, closed connections are removed from the list), connections might appear locked.
The entropy served was not provided in the sequence intended. The entropy bytes returned via the internal buffer, however, were consistent with the connection served (buffer[i] was filled correctly for connection[i]). The problem therefore has no impact on the quality of seeding.
-- snap --
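The ChangeLog's explanation can be illustrated with a small shell simulation. Only the formula `i1 = (prev_location + i) % max_query_old` comes from the ChangeLog; the function names `is_ready` and `pick_client`, passing the ready slots as a list, and reading `prev_location` as the slot where the scan starts are assumptions for illustration, not prngd's actual serverloop.c.

```sh
#!/bin/sh
# Sketch only: simulates the slot selection described in the prngd ChangeLog.
# The formula for i1 is quoted from the ChangeLog; the function names, the
# list of ready slots and the reading of prev_location as the slot where the
# scan starts are assumptions, not prngd's serverloop.c.

# is_ready SLOT "LIST": succeeds if SLOT appears in the space-separated LIST.
is_ready() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# pick_client PREV_LOCATION MAX_QUERY READY_LIST BUGGY
# Scans the slots round-robin and prints "<slot> ready", "<slot> not-ready"
# or "none". BUGGY=1 serves i instead of i1, as the ChangeLog says 0.9.25 did.
pick_client() {
    prev=$1 max=$2 ready=$3 buggy=$4
    i=0
    while [ "$i" -lt "$max" ]; do
        i1=$(( (prev + i) % max ))
        if is_ready "$i1" "$ready"; then
            if [ "$buggy" = 1 ]; then
                served=$i       # 0.9.25: uses the loop counter
            else
                served=$i1      # fixed: serves the slot that was checked
            fi
            if is_ready "$served" "$ready"; then
                echo "$served ready"
            else
                echo "$served not-ready"   # a write here could block
            fi
            return 0
        fi
        i=$((i + 1))
    done
    echo none
}
```

For example, with four slots, a scan starting at slot 2 and only slot 3 ready, `pick_client 2 4 "3" 0` prints `3 ready`, while the buggy variant `pick_client 2 4 "3" 1` prints `1 not-ready`: it serves a connection whose readiness was never checked, which is where a blocking write could occur.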
Going back to comment #3, when the connection freezes, does netstat show anything in the send or receive queues for the frozen connection? (on either client or server?)
I've tried to reproduce this with OpenSSH 3.7.1p2 on Solaris 2.5.1 (the only system I have available that doesn't have /dev/random). I ran a variant of your script on my system for over an hour without a single hang.

Some more questions:
- Can you reproduce it with the current version of OpenSSH (3.9p1)?
- Are the client and server both running Solaris 7 and OpenSSH 3.7.1p2?
- If you attach truss to the hung process (truss -p [pid]), what does it say? (client and/or server)
I will set up the test in the next couple of days.
4 months, no reply == closed bug
Change all RESOLVED bugs to CLOSED, with the exception of the ones fixed post-4.4.