This issue was first noted on SFTP, but the cause was down in the spawned ssh process. On some systems (e.g. HP NonStop), a read or write on a nonblocking socket fails with EWOULDBLOCK instead of EAGAIN. The code in channels.c does not handle EWOULDBLOCK, so the socket is closed, but the parent process does not receive notification, leading to a stall.

In channel_handle_[erw]fd, the calls to read/write should check for EWOULDBLOCK as well as EAGAIN, e.g. in channel_handle_wfd:

    len = write(c->wfd, buf, dlen);
    if (len < 0 && (errno == EINTR ||
    #ifdef EWOULDBLOCK
        errno == EWOULDBLOCK ||
    #endif
        errno == EAGAIN))

This appears to be pervasive throughout the code, not just in channels.c.
Created attachment 1506: test for EWOULDBLOCK everywhere we currently check for EAGAIN

I guess we could do something like this patch. Does NonStop define EAGAIN? If so, what is the difference between EAGAIN and EWOULDBLOCK?
Yes, NonStop defines EAGAIN, but doesn't use it in the case of a nonblocking socket. The proposed fix is essentially what I've done in our internal port.

From the NonStop man page for write(2) (read is similar):

[EAGAIN] One of these conditions exists:
  o An attempt was made to write to a file descriptor that cannot accept data, and the O_NONBLOCK flag is set.
  o A write to a pipe (FIFO file) of PIPE_BUF bytes or less is requested, O_NONBLOCK is set, and fewer than nbytes of free space are available.
  o The O_NONBLOCK flag is set on this file, and the process would be delayed in the write operation.

[EWOULDBLOCK] The process attempted an operation on a socket for which O_NONBLOCK is set, there is no space available, and no error has occurred.

My understanding is that EWOULDBLOCK is an older errno condition and newer systems don't use it. The NonStop sockets implementation is apparently based on the older code.
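To make the distinction above concrete, here is a minimal sketch of how the check could be centralized. This is not the attached patch: the helper names is_retryable_errno and errno_from_full_socket are hypothetical and exist only for illustration. The second function fills a nonblocking socket until write() fails, so you can see which errno a given platform reports.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the OpenSSH tree): returns nonzero when a
 * failed read/write should simply be retried later.
 */
static int
is_retryable_errno(int e)
{
	if (e == EINTR || e == EAGAIN)
		return 1;
#ifdef EWOULDBLOCK
	if (e == EWOULDBLOCK)
		return 1;
#endif
	return 0;
}

/*
 * Hypothetical demo: write to a nonblocking socket until it is full and
 * return the errno that write() reported, or -1 if setup failed.
 */
static int
errno_from_full_socket(void)
{
	int sv[2], flags, saved;
	char buf[4096] = { 0 };
	ssize_t len;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return -1;
	flags = fcntl(sv[0], F_GETFL, 0);
	if (flags == -1 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) == -1) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	for (;;) {
		len = write(sv[0], buf, sizeof(buf));
		if (len >= 0)
			continue;	/* still room, keep filling */
		if (errno == EINTR)
			continue;	/* interrupted, try again */
		break;			/* buffer full or real error */
	}
	saved = errno;			/* save before close() can clobber it */
	close(sv[0]);
	close(sv[1]);
	return saved;
}
```

Separate if statements are used rather than a switch because on many systems EAGAIN and EWOULDBLOCK have the same value, and duplicate case labels would not compile there.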
I'm relatively new here, and am not sure of the proper etiquette. I realize I gave a bad summary line (the effects rather than the cause). The summary line to this bug should probably be changed to something more descriptive (e.g. "No check for EWOULDBLOCK").
Thanks, djm. As an additional comment on this bug, the code in atomicio.c handled EWOULDBLOCK properly.
Created attachment 1541: revised patch

updated to -current
Patch applied. This will be in openssh-5.1. Thanks!
Mass update RESOLVED->CLOSED after release of openssh-5.1