We sometimes see scp exit while leaving a hung ssh process behind. The ssh process is stuck in poll()/select(), waiting for the SSH connection socket to become readable. Nothing in the ssh -v -v -v output looks untoward, and the server-side sshd -d -d -d output is the same whether the client hangs or not. In either case the client and server both close and free the last open channel (the session channel).

See the "scp completes but ssh subprocess in deadlock with sshd" thread on the openssh-unix-dev mailing list.

I will attach a tar file containing ssh -vvv and sshd -ddd output, lsof output, etc. for good scp runs and for hanging ssh processes. Note that a select() wrapper was LD_PRELOADed into ssh; it prints the list of file descriptors being selected on in every call to select(). Its source will be attached.

This bug appears to be a race condition in the client. Affected versions of OpenSSH apparently include 2.9p2, 3.0.2p1 and 3.1p1. See these openssh-unix-dev posts:
http://marc.theaimsgroup.com/?l=openssh-unix-dev&m=101588612615615&w=2
http://marc.theaimsgroup.com/?l=openssh-unix-dev&m=101596073221780&w=2
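For readers without the attachment, here is a rough sketch of the descriptor-listing step that a select() wrapper like the one mentioned above needs. It is not the attached source. The helper name toy_format_fdset and its buffer interface are made up for illustration. A real preload wrapper would additionally define its own select(), look up the real one with dlsym(RTLD_NEXT, "select"), and print the list before forwarding the call.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/select.h>

/*
 * Hypothetical helper, not taken from the attachment: writes every
 * descriptor below nfds that is set in `set` into buf as a
 * space-separated list such as "3 5". Returns how many were set.
 */
static int toy_format_fdset(int nfds, fd_set *set, char *buf, size_t len)
{
    int count = 0;
    size_t used = 0;

    assert(buf != NULL || len == 0);
    if (len > 0)
        buf[0] = '\0';
    if (set == NULL)
        return 0;

    for (int fd = 0; fd < nfds && fd < FD_SETSIZE; fd++) {
        if (!FD_ISSET(fd, set))
            continue;
        count++;
        if (used < len) {
            /* snprintf always NUL-terminates when given a non-zero size. */
            int n = snprintf(buf + used, len - used, "%s%d",
                             used ? " " : "", fd);
            if (n > 0)
                used += (size_t)n;
        }
    }
    return count;
}
```

Calling this on the read set just before the real select() shows which descriptors ssh is about to wait on. In the hang described here that should be the connection socket.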
Created attachment 40: Debug output, lsof output, etc.
Aha! Yes, there is a race. It's there in 2.9p2, but apparently not in 3.0.2p1. Essentially, the "if (compat20 && session_closed && !channel_still_open())" check at the top of the client loop is not close enough to the call to select() in client_wait_until_can_do_something(). In fact, client_wait_until_can_do_something() calls channel_prepare_select(), which calls channel_handler(), which may well call chan_is_dead(). That can leave no channels open, and yet client_wait_until_can_do_something() will still go into the select().
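The ordering problem described above can be modelled in a few lines. This is a toy model, not OpenSSH source: struct toy_client and the toy_* functions are hypothetical stand-ins for the client loop, the prepare step and the exit check. The recheck flag illustrates one conceivable remedy (testing the exit condition again right before blocking); it is an assumption, not necessarily the fix that was committed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these names exist in OpenSSH. */
struct toy_client {
    int  open_channels;   /* channels not yet freed */
    bool session_closed;  /* session channel has been closed */
    bool pending_close;   /* prepare step is about to free the last channel */
};

/* Mirrors the shape of the check at the top of the client loop. */
static bool toy_should_exit(const struct toy_client *c)
{
    return c->session_closed && c->open_channels == 0;
}

/* Stands in for the prepare step, which may free a dead channel
 * as a side effect of building the descriptor sets. */
static void toy_prepare_select(struct toy_client *c)
{
    if (c->pending_close && c->open_channels > 0) {
        c->open_channels--;
        c->session_closed = true;
        c->pending_close = false;
    }
}

/*
 * Runs one loop iteration and returns true if it would go on to
 * block in select(). With recheck == false the exit condition is
 * tested only before the prepare step, which is the racy ordering.
 */
static bool toy_iteration_blocks(struct toy_client *c, bool recheck)
{
    assert(c != NULL);
    if (toy_should_exit(c))
        return false;           /* check at the top of the loop */
    toy_prepare_select(c);      /* may close the last channel */
    if (recheck && toy_should_exit(c))
        return false;           /* nothing left to wait for */
    return true;                /* select() would be called here */
}
```

Starting from one open channel with a close pending, the iteration without the recheck ends up blocking with zero channels open, which matches the hang reported here. The same starting state with the recheck returns without blocking.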
Mass change of RESOLVED bugs to CLOSED