I have a DSL line at home, and want to use X11 forwarding to run X clients on a machine at work. The X11 forwarding works fine when the home laptop is connected directly to the DSL modem. However, I use a router at home so that I can connect several machines to the net via the same DSL line. The X11 forwarding does NOT work when I try to connect to a Solaris host from behind the router. The strange thing is that if I log into a different host (same version of sshd, but running under Linux) then the X11 forwarding does work OK, even from behind the router.

This router does Network Address Translation (and is set up to forward port 22 to my laptop, so that I can also log into the laptop at home from my machine at work).

So here is a summary:

without router:
  X11 forwarding from home laptop to Linux box WORKS
  X11 forwarding from home laptop to Solaris box WORKS
with router:
  X11 forwarding from home laptop to Linux box WORKS
  X11 forwarding from home laptop to Solaris box FAILS

I made a transcript using ssh -vX comparing a connection to the Solaris box with and without the router. The transcripts (apart from the dates and the phantom DISPLAY values) are identical.

When I try to start an X client (say an xterm or xclock) the window freezes, and I cannot use it any more. I have to kill the shell in which I invoked ssh on the laptop.

I am enclosing below a transcript of a failed session. I'd be happy to do some additional diagnostic work, but don't know where to go from here, and need guidance.

Thanks!

Bruce Allen

[ballen@dsl-65-187-169-17 /root]$ ssh -vX ballen@dirac.phys.uwm.edu
OpenSSH_3.1p1, SSH protocols 1.5/2.0, OpenSSL 0x0090603f
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug1: Rhosts Authentication disabled, originating port will not be trusted.
debug1: restore_uid
debug1: ssh_connect: getuid 500 geteuid 500 anon 1
debug1: Connecting to dirac.phys.uwm.edu [129.89.57.19] port 22.
debug1: temporarily_use_uid: 500/500 (e=500)
debug1: restore_uid
debug1: temporarily_use_uid: 500/500 (e=500)
debug1: restore_uid
debug1: Connection established.
debug1: identity file /home/ballen/.ssh/identity type -1
debug1: identity file /home/ballen/.ssh/id_rsa type -1
debug1: identity file /home/ballen/.ssh/id_dsa type 2
debug1: Remote protocol version 1.99, remote software version OpenSSH_3.0.2p1
debug1: match: OpenSSH_3.0.2p1 pat OpenSSH*
Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_3.1p1
debug1: Credentials Expired
debug1: proxy expired: run grid-proxy-init or wgpi first File=/tmp/x509up_u500 Function:proxy_init_cred
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: server->client aes128-cbc hmac-md5 none
debug1: kex: client->server aes128-cbc hmac-md5 none
debug1: SSH2_MSG_KEX_DH_GEX_REQUEST sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP
debug1: dh_gen_key: priv key bits set: 129/256
debug1: bits set: 1632/3191
debug1: SSH2_MSG_KEX_DH_GEX_INIT sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY
debug1: Host 'dirac.phys.uwm.edu' is known and matches the RSA host key.
debug1: Found key in /home/ballen/.ssh/known_hosts2:14
debug1: bits set: 1629/3191
debug1: ssh_rsa_verify: signature correct
debug1: kex_derive_keys
debug1: newkeys: mode 1
debug1: SSH2_MSG_NEWKEYS sent
debug1: waiting for SSH2_MSG_NEWKEYS
debug1: newkeys: mode 0
debug1: SSH2_MSG_NEWKEYS received
debug1: done: ssh_kex2.
debug1: send SSH2_MSG_SERVICE_REQUEST
debug1: service_accept: ssh-userauth
debug1: got SSH2_MSG_SERVICE_ACCEPT
debug1: authentications that can continue: publickey,password,keyboard-interactive
debug1: next auth method to try is publickey
debug1: try privkey: /home/ballen/.ssh/identity
debug1: try privkey: /home/ballen/.ssh/id_rsa
debug1: try pubkey: /home/ballen/.ssh/id_dsa
debug1: authentications that can continue: publickey,password,keyboard-interactive
debug1: next auth method to try is keyboard-interactive
debug1: authentications that can continue: publickey,password,keyboard-interactive
debug1: next auth method to try is password
ballen@dirac.phys.uwm.edu's password:
debug1: packet_send2: adding 64 (len 60 padlen 4 extra_pad 64)
debug1: ssh-userauth2 successful: method password
debug1: channel 0: new [client-session]
debug1: send channel open 0
debug1: Entering interactive session.
debug1: ssh_session2_setup: id 0
debug1: channel request 0: pty-req
debug1: Requesting X11 forwarding with authentication spoofing.
debug1: channel request 0: x11-req
debug1: channel request 0: shell
debug1: fd 3 setting TCP_NODELAY
debug1: channel 0: open confirm rwindow 0 rmax 16384
Last login: Fri Jun 14 00:33:29 2002 from dsl-65-187-169-
Sun Microsystems Inc. SunOS 5.8 Generic February 2000
Sun Microsystems Inc. SunOS 5.8 Generic February 2000
You have mail.
ballen@dirac> xterm &
[1] 1617
ballen@dirac> debug1: client_input_channel_open: ctype x11 rchan 3 win 4096 max 2048
debug1: client_request_x11: request from 129.89.57.19 33305
debug1: fd 7 setting O_NONBLOCK
debug1: channel 1: new [x11]
debug1: confirm x11

This is where everything hangs.

I've also printed out the environment on the machine after I have connected.
Here it is:

ballen@dirac> env
USER=ballen
LOGNAME=ballen
HOME=/home/ballen
PATH=/usr/ccs/bin:/usr/local/Office51/bin:/home/ballen/bin:/usr/openwin/bin:/opt/Acrobat4/bin:/usr/sbin:/usr/local/bin:/usr/dt/bin:/usr/openwin/bin:/opt/dt/bin:/opt/SUNWspro/bin:/opt/SUNWste/bin:/opt/SUNWneo/bin:/opt/SUNWste/bin:/opt/SUNWimap/bin:/opt/SUNWsmsjc/bin:/opt/SUNWicg/bin:/opt/SUNWvts/bin:/opt/SUNWsms/bin:/opt/SUNWcorba/bin:/opt/SUNWsymon/bin:/opt/SUNWrtvc/bin:/usr/local/X11/bin:.:/home/ballen:/bin:/usr/bin:/usr/ucb:/etc:.:/usr/ccs/bin:/usr/ccs/lib:/usr/local/mpi/bin:/usr/lib/lp/postscript:/home/ballen/rvplayer5.0:/opt/hpnp/bin
MAIL=/var/mail//ballen
SHELL=/bin/tcsh
TZ=US/Central
SSH_CLIENT=65.187.169.17 64439 22
SSH_TTY=/dev/pts/33
TERM=xterm
DISPLAY=dirac:28.0
HOSTTYPE=sun4
VENDOR=sun
OSTYPE=solaris
MACHTYPE=sparc
SHLVL=1
PWD=/home/ballen
GROUP=uwmlsc
HOST=dirac
REMOTEHOST=dsl-65-187-169-17.telocity.com
MOZILLA_HOME=/usr/local/netscape
EDITOR=/usr/openwin/bin/textedit
CVSROOT=/home/cvs/CVS_REPOSITORY/repository_GRASP
NNTPSERVER=news.uwm.edu
ENSCRIPT=-fTimes-Roman10
TG_HOME=/local/tgraph
TG_HOST=dirac.phys.uwm.edu
MANPATH=/usr/openwin/man:/opt/SUNWspro/man:/opt/SUNWste/license_tools/man:/usr/share/man:/usr/local/man:/usr/local/mpi/man:/opt/hpnp/man:
INFOPATH=/usr/local/info
TMPDIR=/tmp/
LD_LIBRARY_PATH=/usr/local/lib:/opt/hpnp/lib
PRINTER=hp2200_1
I don't know what this is:

debug1: Credentials Expired
debug1: proxy expired: run grid-proxy-init or wgpi first File=/tmp/x509up_u500 Function:proxy_init_cred

I don't have any guesses right now. I would like to see the output of sshd -ddd on the Solaris box for the failing case.
Here's an edited version from a previous (emailed) answer to this:

Short answer:

You probably have an MTU/fragmentation problem. For each network interface on both client and server, set the MTU to 576, e.g. "ifconfig ethX mtu 576". If the problem goes away, read on.

Long answer:

At each routing hop, IP packets bigger than the outgoing interface's MTU get fragmented. Only the first fragment has TCP port numbers. Firewalls usually drop everything but the first fragment, since the later fragments can't be matched against the rulebase. Some NAT configurations (e.g. many-to-one NAT or port address translation) can't match the fragments against their translation state tables.

Logging in and using the shell will normally generate relatively small packets. However, if you run something that generates a lot of data (e.g. cat'ing a big file or starting an X app), you may generate a packet bigger than the MTU.

Let's say it's a 1500 byte IP packet and the router has 2 different MTUs (say 1500 & 1484) and no firewall. When the router goes to forward it, the packet is too big for the interface MTU (1484), so the router breaks it into 2 fragments, 0 and 1. Fragment 0 contains the first 1484 bytes (including the TCP source and dest ports) and fragment 1 contains the remaining 16 bytes. Both fragments are sent on to their destination.

When the first fragment reaches its target, it's held by the IP stack until the remaining fragments arrive, at which time the IP packet is reassembled and passed up the stack to TCP. If all fragments are not received by the timeout, the entire IP packet is discarded and an ICMP "timeout during reassembly" error is sent back.

Now add your firewall, which drops fragment 1. Your 1500 byte IP packet times out during reassembly and TCP retries by sending another 1500 byte packet. Repeat. Eventually, TCP will time out and you'll get a connection termination.
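The arithmetic in the 1500/1484 example can be sketched in a few lines of Python. This is only an illustration, not how any particular router implements it: the function name `fragment` is made up here, and it assumes a plain 20-byte IPv4 header with no options. One detail the prose simplifies is that every fragment carries its own IP header, so the second fragment is 36 bytes on the wire (20 bytes of header plus the remaining 16 bytes of data).

```python
def fragment(total_len, mtu, header_len=20):
    """Split an IPv4 packet of total_len bytes so each piece fits in mtu.

    Returns a list of (data_offset, fragment_len, more_fragments) tuples.
    Assumes a fixed header_len-byte IP header (no options); illustrative only.
    """
    if total_len <= mtu:
        return [(0, total_len, False)]  # fits as-is, no fragmentation

    data_len = total_len - header_len
    # Every fragment except the last must carry a multiple of 8 data bytes,
    # because the IP fragment offset field counts in 8-byte units.
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")

    fragments = []
    offset = 0
    while offset < data_len:
        size = min(max_data, data_len - offset)
        more = offset + size < data_len
        fragments.append((offset, header_len + size, more))
        offset += size
    return fragments


def survives_fragment_dropping(fragments):
    """True if the packet can still be reassembled when a firewall or NAT box
    forwards only the first fragment (offset 0) and drops the rest."""
    delivered = [f for f in fragments if f[0] == 0]
    return len(delivered) == len(fragments)


if __name__ == "__main__":
    for mtu in (1500, 1484, 576):
        frags = fragment(1500, mtu)
        print(mtu, frags, survives_fragment_dropping(frags))
```

With an MTU of 1484 the 1500-byte packet becomes two fragments and does not survive a device that drops non-first fragments. A sender that never emits packets larger than the smallest MTU on the path (which is what lowering the interface MTU to 576 achieves) is never fragmented in the first place.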
IP stack parameters (such as Path MTU Discovery) and external variables (such as the MTUs of all the hops between hosts) can also determine whether or not a given connection is affected.

Maybe I ought to submit this to the FAQ maintainer....
Darren -- you were correct -- it was fragmented packets not getting forwarded by the NAT box. I am closing out the bug report. Details follow. Thank you!

The following command on the Solaris box:

ifconfig hme0 mtu 576

solved the problem. Unfortunately this Solaris box has some NFS mounted partitions. These small MTU values really clobber NFS performance, so I'll probably need to reset the MTU value each time I want to do X11 forwarding. Sigh. I'll experiment to find the largest acceptable MTU value. I don't know where the packets are getting fragmented -- probably by my DSL provider.

And I agree that you should add this to the FAQ -- I read the FAQ closely before posting my bug report, so if I had seen your posting in the FAQ it would have saved everyone's time and bandwidth!

Thanks again! I still can't believe how well the open-source model works when the developers are committed to their products.

Bruce

******************************************

Kevin -- the thing that you didn't recognize is a (failed) certificate-based authentication attempt. This is there because I use some Globus Grid resources which use strictly certificate-based authentication. I don't know if this is part of the standard ssh client or if mine has been linked against some Globus-enhanced libraries. In any case, it's not the source of my problem, which Darren correctly identified.
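The experiment mentioned above, finding the largest MTU that still works, is a natural fit for a binary search. The sketch below is hypothetical: `works` stands for whatever manual or scripted check is used (for example, setting the MTU with ifconfig and then starting an X client over the forwarded connection), and it assumes that if a given MTU works, every smaller one does too.

```python
def largest_working_mtu(works, low=576, high=1500):
    """Return the largest MTU in [low, high] for which works(mtu) is true.

    `works` is a caller-supplied check (hypothetical here). Assumes that
    works(low) is true and that any MTU below a working one also works.
    """
    if not works(low):
        raise ValueError("even the lowest MTU fails; the problem is elsewhere")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if works(mid):
            low = mid        # mid is fine; look for something larger
        else:
            high = mid - 1   # mid fails; the answer is below it
    return low


if __name__ == "__main__":
    # Stand-in check: pretend some hop on the path has an MTU of 1484.
    print(largest_working_mtu(lambda mtu: mtu <= 1484))
```

Between 576 and 1500 this needs at most about ten trials, which matters when each trial means changing the interface MTU and restarting an X client by hand.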
Mass change of RESOLVED bugs to CLOSED