| Summary: | openssh session hanging - prngd[671]: write() in socket_write() failed: Broken pipe | ||
|---|---|---|---|
| Product: | Portable OpenSSH | Reporter: | Stan Walczak <stanislaw.walczak> |
| Component: | ssh | Assignee: | OpenSSH Bugzilla mailing list <openssh-bugs> |
| Status: | CLOSED WORKSFORME | ||
| Severity: | normal | ||
| Priority: | P2 | ||
| Version: | 3.7.1p2 | ||
| Hardware: | SPARC | ||
| OS: | Solaris | ||
|
Description
Stan Walczak
2004-11-17 03:42:30 AEDT
This looks very much like a prngd issue and not an openssh problem. Are there any error messages from openssh?

Someone on comp.security.ssh suggested that there is a bug in prngd-0.9.26 that might cause this problem.

The following freezes on ashipt after running for a random period of time. It usually happens within 15 minutes. I tracked it down to an error with the read() from the remote server.

#!/bin/ksh
while (( 1 ))
do
  ssh -vvv -l ipts ash mv /tmp/ssh_test /tmp/ssh_test.tmp
  ssh -vvv -l ipts ashipts0 chmod 600 /tmp/ssh_test.tmp
  ssh -vvv -l ipts ashipts0 mv /tmp/ssh_test.tmp /tmp/ssh_test
  sleep 2
done

debug2: callback start
debug2: ssh_session2_setup: id 0
debug1: Sending command: mv /tmp/ssh_test /tmp/ssh_test.tmp
debug2: channel 0: request exec
debug2: callback done
debug2: channel 0: open confirm rwindow 0 rmax 16384
debug2: channel 0: rcvd adjust 32768
debug2: channel 0: rcvd eof
debug2: channel 0: output open -> drain
debug2: channel 0: obuf empty
debug2: channel 0: close_write
debug2: channel 0: output drain -> closed
<freezes here; the remote command never executes either, although the debug statements say it did. At this point the ssh connection hangs: the ssh program is stuck on the local machine and the connected sshd process is hung on the remote machine. netstat shows that the connection is ESTABLISHED>

What version of prngd are you using? Is it the buggy one that Darren mentioned? Do you get hangs with multiple "openssl rand -base64 20480" runs?

prngd 0.9.25 (30 May 2002)

Does the problem occur if you upgrade prngd? You also didn't answer Damien's question: if you stick "openssl rand -base64 20480" in a loop does it hang or abort?

Actually, the bug I was talking about in comp.security.ssh was introduced in prngd 0.9.25, so upgrading to the current version is definitely worth a try.

I ran "openssl rand -base64 20480" in a loop for only 10 minutes; it did not hang or abort. Security group said no for prngd 0.9.25 upgrade.
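For reference, the "openssl rand -base64 20480" loop test requested in the comments above could be sketched roughly as follows. This is only an illustration: the helper name stress_loop and the run count are made up here, and a hang would show up simply as the loop never printing its final line.

```sh
# Hypothetical helper (not part of OpenSSH or prngd): run a command
# repeatedly and report the first run that exits with a non-zero status.
# Usage: stress_loop COUNT COMMAND [ARGS...]
stress_loop() {
    count=$1
    shift
    n=1
    while [ "$n" -le "$count" ]; do
        # Discard the command's output; only the exit status matters here.
        "$@" >/dev/null 2>&1 || { echo "run $n failed"; return 1; }
        n=$((n + 1))
    done
    echo "all $count runs completed"
}

# Example invocation (assumes openssl is installed and seeded via prngd;
# the count of 500 is arbitrary):
# stress_loop 500 openssl rand -base64 20480
```

Running several copies of the loop at the same time would be closer to the "lots of processes query entropy at the same time" condition, although whether that is enough to trigger the hang is not established in this report.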
Could you please tell me more about the prngd bug? What are the symptoms?

Why don't you ask the prngd developers? From the ChangeLog:
-- snip --
When lots of processes query entropy at the same time, the "fairness"
change introduced in 0.9.25 could lead to clients being only served with
a delay.
Reason: in serverloop.c the next client to serve is "i1" as determined from
i1 = (prev_location + i) % max_query_old;
The client that actually was served however was "i" instead of "i1".
If the connection of "i" was not yet ready for "write" state set after
getting the entropy, it might block.
This problem has not been reported by any other user, though it might also
have occurred at other sites.
Depending on the internal sorting of sockets by fd/slot (number increasing
in the sequence of accepted connections, closed connections are
removed from the list), connections might appear locked.
The entropy served was not provided in the sequence intended. The
entropy bytes returned via internal buffer however were consistent
with the connection served (buffer[i]) was filled correctly for
connection[i]. The problem therefore has no impact on the quality
of seeding.
-- snap --
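The index mix-up described in the ChangeLog can be shown with a toy model. This is not prngd's code: only the names prev_location, max_query_old, i and i1 come from the ChangeLog, while the function show_slots and the sample values are assumptions for illustration. For each loop step it prints the slot that was meant to be served (i1) next to the slot the buggy code would have served (i).

```sh
# Toy model of the slot selection described in the ChangeLog (not prngd's code).
# prev_location and max_query_old are the ChangeLog's names; the values passed
# in below are made up for illustration.
show_slots() {
    prev_location=$1
    max_query_old=$2
    i=0
    while [ "$i" -lt "$max_query_old" ]; do
        # Slot that the fairness rotation intends to serve next.
        i1=$(( (prev_location + i) % max_query_old ))
        # Per the ChangeLog, the buggy version served slot i instead of i1.
        echo "i=$i intended=$i1 served_by_bug=$i"
        i=$((i + 1))
    done
}

show_slots 2 4
```

With these sample values the intended and served slots differ on every step, which matches the ChangeLog's remark that entropy was not served in the intended sequence.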
Going back to comment #3, when the connection freezes, does netstat show anything in the send or receive queues for the frozen connection (on either client or server)?

I've tried to reproduce this with OpenSSH 3.7.1p2 on Solaris 2.5.1 (the only system I have available that doesn't have /dev/random). I ran a variant of your script on my system for over an hour without a single hang.

Some more questions:
- can you reproduce it with the current version of OpenSSH (3.9p1)?
- are the client and server both running Solaris 7 and OpenSSH 3.7.1p2?
- if you attach a truss to the hung process (truss -p [pid]), what does it say (client and/or server)?

Will set up the test in the next couple of days.

4 months, no reply == closed bug

Change all RESOLVED bugs to CLOSED with the exception of the ones fixed post-4.4. |