A simple write() system call from a C program with zero as the data length hangs forever. It is easily reproducible with a one-line program containing the statement write(1,"anything",0); It appears to be an AIX-only problem: both 5.2 and 5.3 fail, while SCO Unix works fine. The attached nullout.c will save a little bit of typing. Our client has since tried it with an "OpenSSH_4.2p1 and OpenSSL 0.9.8 5-Jul-05" build and reports that it still fails.
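The attached nullout.c is not reproduced here. The following is a minimal sketch of the reproduction, based only on the one-line statement quoted above. The helper name `zero_length_write` is hypothetical.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper wrapping the single statement from the report.
 * POSIX specifies that a zero-length write() to a regular file returns 0
 * and has no other effect. The report is that on AIX 5.2/5.3 the session
 * appears to hang when fd is the pty allocated by sshd. */
ssize_t zero_length_write(int fd)
{
    return write(fd, "anything", 0);
}
```

Calling `zero_length_write(STDOUT_FILENO)` from a program run inside an ssh session is equivalent to the `write(1,"anything",0);` statement above.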
Created attachment 987: sample program which fails
Created attachment 988: sample program which fails
Created attachment 990: text version of previous attachment

This looks like a kernel bug on your OS; I can't see what it has to do with OpenSSH.
Created attachment 991: Additional comments

That was my initial thought. However, the sample program works (without recompiling or relinking) when logged in locally and via rlogin or telnet, which to me points to the ssh I/O interface and/or driver on the host system. Additionally, we have tried both the ETerm and PuTTY clients, and they fail identically on AIX only.
Which AIX Maintenance Levels do your systems have? Does the problem occur with other pty-using programs such as telnetd?

(In reply to comment #3)
> this looks like a kernel bug on your OS - I can't see what it has to do with
> OpenSSH.

I agree. Now some history: way back when dinosaurs roamed the earth (around AIX 4.3.3 ML 3 or so), the pty layer on AIX started returning zero from read() syscalls after zero-length writes to the pty. This was a problem for sshd, since POSIX says that a return code of zero from read() means EOF. In practice, a program performing zero-length writes such as yours would cause sshd to close the session. Since this remained broken for quite a while, sshd was changed to ignore such zero-length reads to work around it (see bug #124 for the gory details). I'm wondering if IBM has attempted to fix this and gone to the other extreme. AFAICT the zero-length write should be a no-op. It's also possible that the workaround now has a side effect.
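Under the POSIX reading of read(), a return of 0 means end-of-file. A pty master that reports a zero-length write on the slave as a zero-length read is therefore indistinguishable from a closed descriptor. The sketch below shows that classification. The enum and function names are hypothetical, and this is not sshd's actual code.

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical classification of a read() result under POSIX semantics. */
enum read_outcome { READ_DATA, READ_EOF, READ_RETRY, READ_ERROR };

enum read_outcome classify_read(ssize_t n, int saved_errno)
{
    if (n > 0)
        return READ_DATA;      /* n bytes were transferred */
    if (n == 0)
        return READ_EOF;       /* POSIX: 0 means end-of-file */
    if (saved_errno == EINTR || saved_errno == EAGAIN)
        return READ_RETRY;     /* transient condition, try again */
    return READ_ERROR;         /* genuine error, e.g. EIO */
}
```

Under this rule, the AIX pty behaviour described above falls into READ_EOF. That is why sshd originally closed the session.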
Could you please attach (as an attachment, not in the comment field) the server debugging output from a run of your program? That is, run "/path/to/sshd -ddde -p 2022", then connect to the server on port 2022 and run your program. BTW, I looked for the changes mentioned in bug #124 but didn't find the zero-length fix where I expected. I'll need to look more closely at that when I get a chance.
Created attachment 992: Debug log files (zipped)

These are the debug files you asked for. I ran the test with both E-Term32 and PuTTY. I have included the debug output from the AIX server (aix-*.log files) and from the PC client (pc-*.log files).
Did these lines appear in the sshd debug output immediately after you ran your program?

    debug2: channel 0: rcvd adjust 2
    debug2: channel 0: read<=0 rfd 10 len 0
    debug2: channel 0: read failed

BTW, you didn't mention which AIX Maintenance Level and/or PTF you have on your systems.
Created attachment 993: Log file extract

The simple answer to your question is yes. For completeness, I have extracted the part of the log file covering the duration of the test program. I'm still waiting for the maintenance/patch level info from our client.
Created attachment 994: only close connection for zero-length stdin reads when errno set

I don't think your program is really hanging, although it looks that way. Here is what I think is happening. Your zero-length write results in a zero-length read in sshd, which causes the channel to be shut down. sshd then waits for all of the file descriptors to close, while your program (or the shell) waits for its stdin to be read. With the two deadlocked, the sshd session appears to hang.

I just read the SUSv3 specification for read(2) (http://www.opengroup.org/onlinepubs/000095399/functions/read.html). It's not clear, but it appears that returning a zero-length read is permitted for STREAMS files (although I didn't think AIX's pty layer was STREAMS-based). So AIX's behaviour might be compliant, although quite unusual. Anyway, please try the attached patch (against -current, but it should apply to 4.1p1 or 4.2p1). It's a bit ugly, but it seems to be the only way to handle the zero-length case, assuming the above is correct.
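The idea behind attachment 994 is to treat a zero-length read as a close only when errno was set. The sketch below illustrates that idea, assuming errno is cleared immediately before read(). Both function names are hypothetical, and this is not the patch itself.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: no POSIX function sets errno to 0, so clear it
 * first and record its value right after read() returns. */
ssize_t read_noting_errno(int fd, void *buf, size_t len, int *saved_errno)
{
    ssize_t n;

    errno = 0;
    n = read(fd, buf, len);
    *saved_errno = errno;
    return n;
}

/* Returns nonzero if the channel should be shut down: any error, or a
 * zero-length read that was accompanied by an errno value. */
int zero_read_means_close(ssize_t n, int saved_errno)
{
    return n < 0 || (n == 0 && saved_errno != 0);
}
```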
Created attachment 1002: Handle zero-length reads on AIX ptys

I don't think the change to the control socket code is necessary, so I've removed it. Hopefully this will still resolve the problem.
Created attachment 1147: Handle zero-length reads on AIX only

I was wondering if there are any platforms out there that don't set errno, so this ought to be safer (although admittedly uglier). Unless there are objections, I'd like to commit this one.
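The errno concern can be checked directly. On a pipe, a genuine end-of-file is a successful read() that returns 0 and leaves errno untouched. The same is assumed here to hold on typical non-AIX platforms. An unconditional errno-gated test would therefore never recognise EOF. The function name below is hypothetical.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical demonstration: returns 1 if a real EOF on a pipe is
 * reported as read() == 0 with errno still 0, 0 otherwise, and -1 if
 * the pipe cannot be created. */
int eof_leaves_errno_clear(void)
{
    int p[2];
    char c;
    ssize_t n;
    int saved;

    if (pipe(p) != 0)
        return -1;
    close(p[1]);               /* no writers left, so the next read is EOF */
    errno = 0;
    n = read(p[0], &c, 1);
    saved = errno;
    close(p[0]);
    return n == 0 && saved == 0;
}
```

This is the case that restricting the workaround to AIX guards against: elsewhere, the plain `len <= 0` test still catches EOF.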
Comment on attachment 1147 (Handle zero-length reads on AIX only):

Looks OK to me.
Thanks all, this patch has been applied and will be in v4.4.
With the release of 4.4, we believe that this bug is now closed. For information about the release please see http://www.openssh.com/txt/release-4.4 .
(In reply to comment #12)
> Created an attachment (id=1147)
> Handle zero-length reads on AIX only
> I was wondering if there's any platforms out there that don't set
> errno... so this ought to be safer (although admittedly uglier).
> Unless there are objections I'd like to commit this one

Shouldn't the fix in channels.c

    +#ifndef PTY_ZEROREAD
     if (len <= 0) {
    +#else
    +if (len < 0 || (len == 0 && errno != 0)) {
    +#endif

actually be

    +#ifdef PTY_ZEROREAD
     if (len <= 0) {
    +#else
    +if (len < 0 || (len == 0 && errno != 0)) {
    +#endif

After I applied the modified fix on AIX (changing ifndef to ifdef in channels.c), the ssh session hang was resolved. But now I face another problem on AIX.

Task: log in with ssh using the -X or -Y option, start and exit wish, then try to log out.
Result: the DISPLAY variable is correctly set. After exiting wish and trying to log out from the ssh shell, the shell displays "logout" and then hangs there. The hung ssh session must be ended with Ctrl-C.
Steps to reproduce:

    client prompt$ ssh -X server
    server prompt$ wish
    wish prompt: exit
    server prompt$ exit
    logout
(In reply to comment #16)

Forgot to mention that I applied the patch to openssh-4.1.
Created attachment 1208: Fixes for bug #1102 as applied to the tree.

> Shouldn't the fix in channels.c
> +#ifndef PTY_ZEROREAD

It's the other way around, but yes, the patch here is wrong. It was fixed in the tree, and the fixes are in the 4.4p1 and 4.5p1 releases. The changes were:

 - (dtucker) [channels.c configure.ac serverloop.c] Bug #1102: Around AIX 4.3.3 ML3 or so, the AIX pty layer started passing zero-length writes on the pty slave as zero-length reads on the pty master, which sshd interprets as the descriptor closing. Since most things don't do zero length writes this rarely matters, but occasionally it happens, and when it does the SSH pty session appears to hang, so we add a special case for this condition. ok djm@
 - (dtucker) [serverloop.c] Get ifdef/ifndef the right way around for the bug #1102 workaround.
 - (dtucker) [channels.c serverloop.c] Apply the bug #1102 workaround to ptys only, otherwise sshd can hang exiting non-interactive sessions.

These are included in the attached patch.

As for the problem with wish: if you are having a different problem with a current release, please open a new bug for it (mention this bug number if you want). I am going to re-close this bug.
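The three ChangeLog entries can be combined into a single decision function, sketched below. This is a hypothetical illustration, not the code in channels.c or serverloop.c. The parameter `is_tty` stands in for however sshd knows a descriptor is a pty, and callers are assumed to have already handled EINTR/EAGAIN.

```c
#include <sys/types.h>

/* Hypothetical sketch: the errno-gated test applies only where
 * PTY_ZEROREAD is defined (AIX) and only to tty descriptors; every
 * other descriptor keeps the POSIX "0 means EOF" rule, so
 * non-interactive sessions still see EOF and can exit. */
int read_result_means_closed(ssize_t len, int saved_errno, int is_tty)
{
#ifdef PTY_ZEROREAD
    if (is_tty)
        return len < 0 || (len == 0 && saved_errno != 0);
#else
    (void)saved_errno;         /* unused without the workaround */
    (void)is_tty;
#endif
    return len <= 0;
}
```

Restricting the special case to ptys is what keeps non-interactive sessions (pipes rather than ptys) from hanging on exit, per the third ChangeLog entry.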