comp.security.ssh posting and reply from Richard Silverman below:

While trying to troubleshoot a separate problem, I came across a strange, repeatable behavior that I haven't been able to find any further information on. In short, I'm establishing a simple port forward from 'terrapin' to 192.168.1.120 via 'osgiliath,' another host on 192.168.1.0/24. An Nmap of the port I'm forwarding on the local machine causes the SSH session to end.

To establish the connection, I issue the following command:

--
terrapin:~ irish$ ssh -vvv -L 3389:192.168.1.120:3389 irish@osgiliath.foo.com
--

Since I maxed verbosity, there's quite a lot of logging but at some point it logs in. The weirdness comes in when I then nmap localhost (originally a full sweep but just the single port is shown below).

--
terrapin:~ irish$ nmap -sT -p 3389 localhost

Starting nmap V. 3.00 ( www.insecure.org/nmap/ )
Interesting ports on localhost (127.0.0.1):
Port       State       Service
3389/tcp   open        ms-term-serv

Nmap run completed -- 1 IP address (1 host up) scanned in 0 seconds
--

In the process of that, the following happens in the original SSH session:

--
irish@osgiliath:~$ exitdebug1: Connection to port 3389 forwarding to 192.168.1.120 port 3389 requested.
debug2: fd 9 setting TCP_NODELAY
debug2: fd 9 is O_NONBLOCK
debug2: fd 9 is O_NONBLOCK
debug1: channel 3: new [direct-tcpip]
debug2: channel 3: open confirm rwindow 131072 rmax 32768
debug2: channel 3: read<=0 rfd 9 len -1
debug2: channel 3: read failed
debug2: channel 3: close_read
debug2: channel 3: input open -> drain
debug2: channel 3: ibuf empty
debug2: channel 3: send eof
debug2: channel 3: input drain -> closed
debug2: channel 3: rcvd eof
debug2: channel 3: output open -> drain
debug2: channel 3: obuf empty
debug2: channel 3: close_write
debug2: channel 3: chan_shutdown_write: shutdown() failed for fd9: Invalid argument
debug2: channel 3: output drain -> closed
debug2: channel 3: rcvd close
debug3: channel 3: will not send data after close
debug2: channel 3: send close
debug2: channel 3: is dead
debug2: channel 3: garbage collecting
debug1: channel 3: free: direct-tcpip: listening port 3389 for 192.168.1.120 port 3389, connect from 127.0.0.1 port 65397, nchannels 4
debug3: channel 3: status: The following connections are open:
  #2 client-session (t4 r0 i0/0 o0/0 fd 6/7)
  #3 direct-tcpip: listening port 3389 for 192.168.1.120 port 3389, connect from 127.0.0.1 port 65397 (t4 r1 i3/0 o3/0 fd 9/9)
debug3: channel 3: close_fds r 9 w 9 e -1
debug1: client_input_channel_req: channel 2 rtype exit-status reply 0
debug2: channel 2: rcvd eof
debug2: channel 2: output open -> drain
debug2: channel 2: rcvd close
debug2: channel 2: close_read
debug2: channel 2: input open -> closed
debug3: channel 2: will not send data after close
logout
debug3: channel 2: will not send data after close
debug2: channel 2: obuf empty
debug2: channel 2: close_write
debug2: channel 2: output drain -> closed
debug2: channel 2: almost dead
debug2: channel 2: gc: notify user
debug2: channel 2: gc: user detached
debug2: channel 2: send close
debug2: channel 2: is dead
debug2: channel 2: garbage collecting
debug1: channel 2: free: client-session, nchannels 3
debug3: channel 2: status: The following connections are open:
  #2 client-session (t4 r0 i3/0 o3/0 fd -1/-1)
debug3: channel 2: close_fds r -1 w -1 e 8
debug1: channel 0: free: port listener, nchannels 2
debug3: channel 0: status: The following connections are open:
debug3: channel 0: close_fds r 4 w 4 e -1
debug1: channel 1: free: port listener, nchannels 1
debug3: channel 1: status: The following connections are open:
debug3: channel 1: close_fds r 5 w 5 e -1
Connection to osgiliath.foo.com closed.
debug1: Transferred: stdin 0, stdout 0, stderr 54 bytes in 16.6 seconds
debug1: Bytes per second: stdin 0.0, stdout 0.0, stderr 3.3
debug1: Exit status 0
terrapin:~ irish$
--

Is this expected behavior? Is SSH supposed to exit as it did?

Relevant version information is as follows:

terrapin:~ irish$ uname -a
Darwin terrapin 8.0.0 Darwin Kernel Version 8.0.0: Sat Mar 26 14:15:22 PST 2005; root:xnu-792.obj~1/RELEASE_PPC Power Macintosh powerpc
terrapin:~ irish$ ssh -V
OpenSSH_3.8.1p1, OpenSSL 0.9.7b 10 Apr 2003

irish@osgiliath:~$ uname -a
Linux osgiliath 2.4.23 #2 SMP Sat Jan 3 13:09:12 EST 2004 i686 GNU/Linux
irish@osgiliath:~$ dpkg -s ssh | grep Version
Version: 1:3.8.1p1-8.sarge.4
irish@osgiliath:~$ dpkg -s openssl | grep Version
Version: 0.9.7e-3

--

Richard's reply:

This appears to be a bug in OpenSSH, which only shows up when a TCP connection to a forwarded port is closed extremely quickly after being opened. The problem is here:

[channels.c]

static void
port_open_helper(Channel *c, char *rtype)
{
        int direct;
        char buf[1024];
        char *remote_ipaddr = get_peer_ipaddr(c->sock);
>>>>>>  u_short remote_port = get_peer_port(c->sock);

This is called very shortly after processing a connection opened on a forwarded port, channel_post_port_listener().
I couldn't replicate this by telnetting to the port, or even with a simple Perl program to open and immediately close a connection:

----------------------------------------------------------------------
#!/usr/bin/perl

use IO::Socket;
use Carp;

($server,$port) = @ARGV;

$socket = IO::Socket::INET->new(PeerAddr => $server,
                                PeerPort => $port)
    || croak(qq*cannot connect to "$server"*);

$socket->close();
----------------------------------------------------------------------

However, with nmap -sT, I get this:

debug1: Connection to port 2001 forwarding to localhost port 22 requested.
debug1: channel 2: new [direct-tcpip]
debug1: getpeername failed: Transport endpoint is not connected

Nmap is written in C, so it is faster. Also, if you look at the network traffic, it simply sends a RST after the TCP handshake, whereas these other tests do the more graceful FIN/ACK/FIN/ACK sequence. The upshot is that the close happens extremely quickly. Now, ssh exits at this point because get_peer_port() does this:

[canohost.c]

        if (getpeername(sock, (struct sockaddr *)&from, &fromlen) < 0) {
                debug("getpeername failed: %.100s", strerror(errno));
                cleanup_exit(255);
        }

So ssh immediately exits if getpeername() fails. This is a bad choice, since here is a non-catastrophic (if uncommon) failure mode: the connection may already be closed by the time execution reaches this point. The code should be changed so that OpenSSH handles this case and continues.

-- 
Richard Silverman
res@qoxp.net
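A note on reproducing the quick close without nmap: the sketch below is one way to get the RST-after-handshake behavior described above, by setting SO_LINGER with a zero timeout before close(). The helper name connect_and_reset is hypothetical, and whether nmap itself uses this option is an assumption; the report only says that a RST shows up on the wire.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Connect to ip:port over TCP, then close the socket abortively.
 * Returns 0 if the connection was established and then closed,
 * -1 on any failure.
 * Hypothetical helper for illustration; not part of nmap or OpenSSH.
 */
int connect_and_reset(const char *ip, unsigned short port)
{
    struct sockaddr_in addr;
    struct linger lg;
    int s;

    assert(ip != NULL);

    s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1 ||
        connect(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(s);
        return -1;
    }

    /*
     * Linger enabled with a zero timeout: on common TCP stacks,
     * close() then discards the connection and sends a RST instead
     * of going through the FIN/ACK exchange.
     */
    lg.l_onoff = 1;
    lg.l_linger = 0;
    if (setsockopt(s, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg)) < 0) {
        close(s);
        return -1;
    }

    return close(s) == 0 ? 0 : -1;
}
```

Pointing this at a forwarded port (for example 127.0.0.1 port 3389 from the original report) should exercise the same code path, though actually hitting the race still depends on timing.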
Changed the OS to "All" and the version to "current" as per Richard Silverman's comment below. -- You might want to change the OS to "all" and the version to "current" -- it's clearly OS-independent and I did make it happen with both 3.8 and 4.1 on Linux, for example. - Richard
Created attachment 930
Close accept/getpeername race

Don't terminate connection on getpeername() failure. (Fix by markus@)
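For illustration only, and not the contents of attachment 930: a minimal sketch of a peer-port lookup that reports a getpeername() failure to the caller instead of exiting. The name peer_port_nonfatal and the -1 return convention are assumptions made for this example.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Return the peer's TCP port for sock, or -1 if it cannot be
 * determined (for example because the peer has already reset the
 * connection). Unlike the canohost.c excerpt quoted in this report,
 * a failure is logged and returned rather than ending the process.
 * Hypothetical helper; the real fix may look different.
 */
int peer_port_nonfatal(int sock)
{
    struct sockaddr_storage from;
    socklen_t fromlen = sizeof(from);

    memset(&from, 0, sizeof(from));
    if (getpeername(sock, (struct sockaddr *)&from, &fromlen) < 0) {
        fprintf(stderr, "getpeername failed: %.100s\n", strerror(errno));
        return -1;  /* non-fatal: let the caller decide what to do */
    }

    /* sockaddr_storage is large enough for any address family. */
    assert(fromlen <= sizeof(from));

    if (from.ss_family == AF_INET)
        return ntohs(((struct sockaddr_in *)&from)->sin_port);
    if (from.ss_family == AF_INET6)
        return ntohs(((struct sockaddr_in6 *)&from)->sin6_port);

    return -1;  /* not an IP socket, so there is no port to report */
}
```

A caller could then log an unknown peer port, or drop just the one affected channel, and keep the rest of the session alive.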
Markus has committed this fix to CVS -current. It will be included in the 4.2 release; until then you can use the patch. Thanks for the detailed report.
Change all RESOLVED bugs to CLOSED with the exception of the ones fixed post-4.4.