Bug 953 - openssh session hanging - prngd[671]: write() in socket_write() failed: Broken pipe
Status: CLOSED WORKSFORME
Alias: None
Product: Portable OpenSSH
Classification: Unclassified
Component: ssh
Version: 3.7.1p2
Hardware: SPARC Solaris
Importance: P2 normal
Assignee: OpenSSH Bugzilla mailing list
Reported: 2004-11-17 03:42 AEDT by Stan Walczak
Modified: 2006-10-07 11:37 AEST
Description Stan Walczak 2004-11-17 03:42:30 AEDT
This occurs once or twice per day. Any suggestion as to whether this is an
OpenSSH random number incompatibility issue?

Server: Sun Enterprise E4500/E5500 System
SunOS 5.7 Generic_106541-32 sun4u sparc SUNW,Ultra-Enterprise 

Excerpt from /var/adm/messages:

Nov  9 09:51:19 ashipts prngd[671]: write() in socket_write() failed: Broken pipe
Nov  9 09:51:19 ashipts last message repeated 30 times
Nov  9 09:51:19 ashipts prngd[671]: closing service fd 15 for error.
Comment 1 Damien Miller 2004-11-17 07:29:02 AEDT
This looks very much like a prngd issue and not an openssh problem. Are there
any error messages from openssh?
Comment 2 Darren Tucker 2004-11-17 09:08:20 AEDT
Someone on comp.security.ssh suggested that there is a bug in prngd-0.9.26 that
might cause this problem.
Comment 3 Stan Walczak 2004-11-18 01:23:30 AEDT
The following script freezes on ashipt after running for a random period of
time. It usually happens within 15 minutes. I tracked it down to an error with
the read() from the remote server.
#!/bin/ksh 
while (( 1 ))
do
ssh -vvv -l ipts ash mv /tmp/ssh_test /tmp/ssh_test.tmp
ssh -vvv -l ipts ashipts0 chmod 600 /tmp/ssh_test.tmp
ssh -vvv -l ipts ashipts0 mv /tmp/ssh_test.tmp /tmp/ssh_test
sleep 2
done
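
A hung ssh in the loop above blocks the whole script, so the time of the hang is easy to miss. The following is only a sketch of a watchdog wrapper, assuming a POSIX-style shell; `run_with_timeout` is a hypothetical helper name and not part of OpenSSH or of the original script. It runs a command in the background, polls it once per second, and kills it if it outlives the limit.

```shell
# Hypothetical helper, not from the original report.
# run_with_timeout SECONDS COMMAND [ARGS...]
# Returns the command's exit status, or 124 if it had to be killed.
run_with_timeout() {
    limit=$1
    shift
    "$@" &                      # start the command in the background
    pid=$!
    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            # Still running after the limit: treat it as hung.
            kill "$pid" 2>/dev/null || :
            wait "$pid" 2>/dev/null || :
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    wait "$pid"                 # collect the real exit status
}
```

Each ssh line in the loop could then be wrapped, for example `run_with_timeout 60 ssh -vvv -l ipts ashipts0 chmod 600 /tmp/ssh_test.tmp || date`, so that a timestamp is printed when a call hangs or fails. The 60 second limit is an arbitrary choice for illustration.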



debug2: callback start
debug2: ssh_session2_setup: id 0
debug1: Sending command: mv /tmp/ssh_test /tmp/ssh_test.tmp
debug2: channel 0: request exec
debug2: callback done
debug2: channel 0: open confirm rwindow 0 rmax 16384
debug2: channel 0: rcvd adjust 32768
debug2: channel 0: rcvd eof
debug2: channel 0: output open -> drain
debug2: channel 0: obuf empty
debug2: channel 0: close_write
debug2: channel 0: output drain -> closed
<Freezes here. The remote command never executes either, although the debug
statements say it did. At this point the ssh connection hangs: the ssh
program is stuck on the local machine and the connected sshd process is hung
on the remote machine. netstat shows that the connection is ESTABLISHED.>
Comment 4 Damien Miller 2004-12-06 16:48:42 AEDT
What version of prngd are you using? Is it the buggy one that Darren mentioned? 

Do you get hangs with multiple "openssl rand -base64 20480" runs?
Comment 5 Stan Walczak 2004-12-16 04:24:42 AEDT
prngd 0.9.25 (30 May 2002)
Comment 6 Darren Tucker 2004-12-16 11:41:56 AEDT
Does the problem occur if you upgrade prngd?

You also didn't answer Damien's question: if you stick "openssl rand -base64
20480" in a loop does it hang or abort?
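
One way to run that test is sketched below, assuming a POSIX-style shell. `run_rand_loop` and the `RAND_CMD` override are hypothetical names introduced here for illustration; the default command is the one quoted above. An abort shows up as a non-zero return with the iteration number, and a hang shows up as the loop stalling.

```shell
# Hypothetical helper: run the entropy-consuming command COUNT times and
# stop at the first failure. RAND_CMD may be set to substitute another command.
run_rand_loop() {
    count=$1
    cmd=${RAND_CMD:-openssl rand -base64 20480}
    i=0
    while [ "$i" -lt "$count" ]; do
        # $cmd is left unquoted on purpose so it splits into command + args.
        $cmd >/dev/null 2>&1 || {
            echo "command failed on iteration $i"
            return 1
        }
        i=$((i + 1))
    done
    echo "completed $count iterations"
}
```

For example, `run_rand_loop 1000` would make 1000 consecutive requests; the count is arbitrary.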
Comment 7 Thomas Binder 2004-12-16 23:27:36 AEDT
Actually, the bug I was talking about in comp.security.ssh was introduced in
prngd 0.9.25, so upgrading to the current version is definitely worth a try.
Comment 8 Stan Walczak 2004-12-17 09:16:59 AEDT
I ran "openssl rand -base64 20480" for only 10 minutes; it did not hang or
abort.
The security group said no to upgrading prngd from 0.9.25.
Could you please tell me more about the prngd bug? What are the symptoms?
Comment 9 Damien Miller 2004-12-17 11:04:43 AEDT
Why don't you ask the prngd developers?
Comment 10 Thomas Binder 2004-12-17 23:58:01 AEDT
From the ChangeLog:

-- snip --
  When lots of processes query entropy at the same time, the "fairness"
  change introduced in 0.9.25 could lead to clients being only served with
  a delay.
  Reason: in serverloop.c the next client to serve is "i1" as determined from
    i1 = (prev_location + i) % max_query_old;
  The client that actually was served however was "i" instead of "i1".
  If the connection of "i" was not yet ready for "write" state set after
  getting the entropy, it might block.
  This problem has not been reported by any other user, though it might also
  have occurred at other sites.
  Depending on the internal sorting of sockets by fd/slot (number increasing
  in the sequence of accepted connections, closed connections are
  removed from the list), connections might appear locked.
  The entropy served was not provided in the sequence intended. The
  entropy bytes returned via internal buffer however were consistent
  with the connection served (buffer[i]) was filled correctly for
  connection[i]. The problem therefore has no impact on the quality
  of seeding.
-- snap --
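
To make the index mix-up concrete, here is a small sketch in shell, not prngd's actual code. Only the formula for `i1` and the names `prev_location` and `max_query_old` come from the ChangeLog; the function `service_order` and its `fixed`/`buggy` modes are assumptions made for illustration. It prints the order in which slots would be visited when the rotated index `i1` is used, and when the raw loop index `i` is used by mistake.

```shell
# Illustrative only: service_order PREV_LOCATION MAX_QUERY_OLD MODE
# MODE "fixed" serves slot i1; MODE "buggy" serves slot i instead.
service_order() {
    prev_location=$1
    max_query_old=$2
    mode=$3
    order=""
    i=0
    while [ "$i" -lt "$max_query_old" ]; do
        # Rotated index from the ChangeLog: start after the previous client.
        i1=$(( (prev_location + i) % max_query_old ))
        if [ "$mode" = "buggy" ]; then
            served=$i       # wrong slot: ignores the rotation
        else
            served=$i1      # intended slot
        fi
        order="$order $served"
        i=$((i + 1))
    done
    echo "${order# }"
}
```

With `prev_location` 2 and four slots, `service_order 2 4 fixed` visits 2 3 0 1, while `service_order 2 4 buggy` visits 0 1 2 3. The slot chosen by the fairness rotation and the slot actually written to then differ, which is the mismatch the ChangeLog describes.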
Comment 11 Darren Tucker 2005-01-06 10:31:53 AEDT
Going back to comment #3, when the connection freezes, does netstat show
anything in the send or receive queues for the frozen connection?  (on either
client or server?)
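
A sketch of how the queue sizes could be pulled out of the netstat output follows. It assumes the Solaris TCP column layout (Local Address, Remote Address, Swind, Send-Q, Rwind, Recv-Q, State) and an awk that accepts `-v` (nawk or a POSIX awk); both are assumptions that should be checked on the affected hosts. `show_queues` is a hypothetical helper name.

```shell
# Hypothetical helper: read netstat output on stdin and print the send and
# receive queue sizes of ESTABLISHED connections that use the given port.
# Assumes 7 columns: local remote swind send-q rwind recv-q state.
show_queues() {
    awk -v port="$1" '
        NF == 7 && $NF == "ESTABLISHED" &&
        ($1 ~ ("\\." port "$") || $2 ~ ("\\." port "$")) {
            printf "%s -> %s send_q=%s recv_q=%s\n", $1, $2, $4, $6
        }'
}
```

Usage would be along the lines of `netstat -an | show_queues 22` on both client and server while the connection is frozen. Non-zero values that do not drain would suggest data stuck in the kernel rather than in ssh or sshd.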
Comment 12 Darren Tucker 2005-01-27 17:16:25 AEDT
I've tried to reproduce this with OpenSSH 3.7.1p2 on Solaris 2.5.1 (the only
system I have available that doesn't have /dev/random).  It ran a variant of
your script on my system for over an hour without a single hang.

Some more questions:
- can you reproduce it with the current version of OpenSSH (3.9p1)?
- are the client and server both running Solaris 7 and OpenSSH 3.7.1p2?
- if you attach a truss to the hung process (truss -p [pid]) what does it say? 
(client and/or server).
Comment 13 Stan Walczak 2005-01-28 09:21:04 AEDT
Will setup the test in the next couple of days.
Comment 14 Damien Miller 2005-05-24 16:02:00 AEST
4 months, no reply == closed bug
Comment 15 Darren Tucker 2006-10-07 11:37:55 AEST
Change all RESOLVED bugs to CLOSED with the exception of the ones fixed post-4.4.