Created attachment 1345: proposed patch

---PROBLEM DESCRIPTION---

I use ssh as a SOCKS 5 proxy for Firefox, and I have configured Firefox to perform remote DNS lookups. That is, the SOCKS request contains the hostname rather than the IP address of the host I want to connect to. For the vast majority of sites I connect to, this works great. However, for a few hosts, including www.etrade.com and www.vanguard.com, the connection hangs for several seconds, then times out.

Although I think it's irrelevant, my SSH client is OpenSSH 4.6p1 on MacOS 10.4. My server is OpenSSH 4.6p1 on Linux 2.6.12.5.

---INVESTIGATION---

I ran strace on the sshd and saw that the DNS lookup of www.vanguard.com was hanging (the DNS server took a long time to respond, much more than 5 seconds). I decoded the DNS request and saw that it is requesting QTYPE 28, which is the DNS AAAA record. This is the request for the IPv6 address.

Next I tried this DNS lookup with dig. I ran "dig -t aaaa www.vanguard.com", and it hung for about 20 seconds before finally returning. I ran "dig -t aaaa www.yahoo.com", and it returned immediately. I ran these same dig tests on a different machine, serviced by a different ISP and DNS servers, and got the same results. My conclusion is that an AAAA lookup on some hosts will hang for a long time.

Next I downloaded portable OpenSSH, compiled my own sshd, and found the function connect_to() in channels.c. Note that the call to getaddrinfo() is passing in a hints structure consisting of ai_family=IPv4or6 and ai_socktype=SOCK_STREAM. The hints parameter is optional, and if it is not specified it still allows either IPv4 or IPv6 results. I replaced hints with NULL and recompiled. My problem went away.

---RECOMMENDATION---

I recommend that the hints parameter be omitted, as this seems to fix the hanging behavior while still working correctly on all sites I try to connect to.
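For reference, the two call shapes compared in the report can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not the code from channels.c: the helper names resolve_with_hints and resolve_without_hints are hypothetical, and the family parameter stands in for sshd's IPv4or6 setting.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not OpenSSH code): resolve host:port with a
 * filled-in hints structure, as the report describes connect_to() doing.
 * `family` stands in for sshd's IPv4or6 value (AF_UNSPEC, AF_INET or
 * AF_INET6). Returns getaddrinfo()'s result code; on 0 the caller owns
 * *res and must release it with freeaddrinfo().
 */
int resolve_with_hints(const char *host, const char *port, int family,
                       struct addrinfo **res)
{
    struct addrinfo hints;

    memset(&hints, 0, sizeof(hints));   /* all unused fields must be zero */
    hints.ai_family = family;           /* restricts which records are wanted */
    hints.ai_socktype = SOCK_STREAM;    /* TCP-style results only */
    return getaddrinfo(host, port, &hints, res);
}

/*
 * Hypothetical helper: the shape of the proposed change, with no hints.
 * The defaults applied in this case are chosen by the C library.
 */
int resolve_without_hints(const char *host, const char *port,
                          struct addrinfo **res)
{
    return getaddrinfo(host, port, NULL, res);
}
```

With this sketch, passing AF_INET as the family restricts the lookup to IPv4, while AF_UNSPEC allows both families. The NULL variant gives up that control entirely, since the caller no longer states which address family or socket type it wants.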
Could you try setting "AddressFamily inet" in your /etc/ssh/sshd_config instead? The fix is not correct and will, among other things, break the AddressFamily option.
Yes, that fixed it. *sigh* Would you not agree that "AddressFamily=any" is still broken in the common case (where IPv6 is not used)? It should not hang like it does.
(In reply to comment #2)
> Yes, that fixed it. *sigh*
>
> Would you not agree that "AddressFamily=any" is still broken in the
> common case (where IPv6 is not used)? It should not hang like it does.

I think the brokenness is in the DNS infrastructure in question. Quoth RFC4074 (ftp://ftp.rfc-editor.org/in-notes/rfc4074.txt):

"4. Problematic Behaviors

There are some known cases at authoritative servers that do not conform to the expected behavior. This section describes those problematic cases.

4.1. Ignore Queries for AAAA

Some authoritative servers seem to ignore queries for an AAAA RR, causing a delay at the stub resolver to fall back to a query for an A RR. This behavior may cause a fatal timeout at the resolver or at the application that calls the resolver. Even if the resolver eventually falls back, the result can be an unacceptable delay for the application user, especially with interactive applications like web browsing."
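The delay described in the RFC excerpt can be observed from C as well as from dig. The following is a rough sketch, assuming a POSIX system with clock_gettime(); the helper name timed_lookup is hypothetical. It times a single getaddrinfo() call restricted to one address family, so an AF_INET6 lookup against an affected name can be compared with an AF_INET lookup of the same name.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/*
 * Hypothetical helper: time one getaddrinfo() call limited to `family`.
 * AF_INET asks only for IPv4 (A) results, AF_INET6 only for IPv6 (AAAA).
 * Stores the elapsed wall-clock time in milliseconds in *elapsed_ms and
 * returns getaddrinfo()'s result code (0 on success).
 */
int timed_lookup(const char *host, int family, double *elapsed_ms)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    struct timespec start, end;
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = family;
    hints.ai_socktype = SOCK_STREAM;

    clock_gettime(CLOCK_MONOTONIC, &start);
    rc = getaddrinfo(host, NULL, &hints, &res);
    clock_gettime(CLOCK_MONOTONIC, &end);

    if (rc == 0)
        freeaddrinfo(res);              /* only the timing is of interest */

    *elapsed_ms = (end.tv_sec - start.tv_sec) * 1000.0 +
                  (end.tv_nsec - start.tv_nsec) / 1e6;
    return rc;
}
```

A large gap between the AF_INET6 and AF_INET timings for the same hostname would be consistent with the "ignore queries for AAAA" behavior quoted above, although caching in the local resolver can mask it on repeated runs.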
Your platform's resolver should "do the right thing" when AddressFamily=any is in use, as this sets hints.ai_family to AF_UNSPEC, which should be equivalent to not passing any hints (for the common case at least). Perhaps your resolver is defaulting to IPv4-only lookups when no hints are specified, but doing IPv6-then-IPv4 when hints.ai_family==AF_UNSPEC. IMO this would be rather silly behaviour, but I have seen libc authors do dumber things... IIRC we need to fill out hints for either SRV RR or SCTP support, but my memory is hazy.
Wow, great research on the AAAA problem. So it's pretty clear that the DNS server is misbehaving. My getaddrinfo implementation (glibc 2.5 I think) is also doing something I don't understand, which may be wrong. Maybe the behavior has changed in later versions of getaddrinfo(). I'll try to explore that more.

Does this warrant a note in the FAQ or other documentation? Maybe it's a rare problem. "If you use SOCKS proxying with remote DNS lookup and connections to some hosts time out, and you don't need IPv6 support, try setting 'AddressFamily inet' in your sshd config."

Unfortunately many users won't have access to their server's sshd config. Would changing it on the client side work?
Cleaning up some old bugs: I don't think there's anything else we should do here. IPv6 has gotten better since the original report and I don't think it's OpenSSH's job to document all of its possible failure modes.
Closing all resolved bugs with the release of openssh-8.2.