Created attachment 1388 [details] Patch to close client_fd

I'm currently looking at raising MAX_SESSIONS to increase the number of slave ssh processes that can be multiplexed, and I ran into the default maximum filehandle limit on a test machine (Solaris 8). I found a set of patches in the CVS repository that is similar to a dirty patch I had come up with, so I've been applying the patches from the repository. They cover clientloop.c@1.182, monitor_fdpass.h@1.4 and monitor_fdpass.c@1.13.

The problem I've hit is that in the cleanup code for a failed mm_receive_fd() in the client_process_control() function, the client_fd filehandle is left open and lost. The effect is that the slave ssh process blocks and never returns, even if filehandles are later freed by other slave processes closing. I've attached a patch that I think fixes this problem. I've also created a simple regression test, but I'm not sure how well it will work in other environments.

To test the issue manually:

in one window/session:
( ulimit -Sn 11 ; exec ./ssh -vMS /tmp/cntl otherhost )

in another window/session:
./ssh -vS /tmp/cntl otherhost

The process in the second window blocks until the master ssh process exits. I think it would be better for the slave to exit as soon as possible, since it will never be able to access otherhost.

I've also seen another interesting effect. I've been testing on Solaris 8 and SLES 10 machines, and it only seems to affect the Solaris machine: if the filehandle limit is hit in the accept() call in the same client_process_control() function, the slave ssh session blocks until another slave ends, freeing some filehandles. I can reproduce this manually by changing the ulimit value above to 15 and running a third process in the same way as the second. The third process blocks, but once the second process exits, the third is let in.
I couldn't reproduce this on a Linux machine, and I think this is the "right" thing to do anyway.
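
For readers following along, the cleanup described above (closing client_fd when receiving a descriptor fails) can be sketched roughly as follows. This is not the attached patch and not the actual clientloop.c code: receive_fds(), recv_fd_fn, recv_fd_always_fails() and fd_is_open() are hypothetical names made up for illustration, and the stub only stands in for a failing call such as mm_receive_fd().

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical stand-in for a descriptor-receiving call such as
 * mm_receive_fd(): returns a new fd, or -1 on failure. */
typedef int (*recv_fd_fn)(int sock);

/* Stub that always fails, as if the process had hit its fd limit. */
static int recv_fd_always_fails(int sock)
{
    (void)sock;
    errno = EMFILE;
    return -1;
}

/* Receive n descriptors over client_fd. On any failure, close the ones
 * already received AND close client_fd, so the peer sees the connection
 * drop instead of blocking on a descriptor nobody services. */
static int receive_fds(int client_fd, recv_fd_fn recv_fd, int *fds, int n)
{
    int i;

    for (i = 0; i < n; i++) {
        fds[i] = recv_fd(client_fd);
        if (fds[i] == -1) {
            int saved_errno = errno;

            while (--i >= 0)
                close(fds[i]);      /* release partial results */
            close(client_fd);       /* the step that was missing */
            errno = saved_errno;    /* report the original failure */
            return -1;
        }
    }
    return 0;
}

/* Helper for checking the result: nonzero if fd is still open. */
static int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1;
}
```

Once the control connection is closed on the error path, the peer gets EOF on its end rather than waiting indefinitely, which is the behaviour the manual test above is meant to confirm.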
Created attachment 1389 [details] A quick regression to test bug
Comment on attachment 1389 [details] A quick regression to test bug

Regression failed on a different machine.
Created attachment 1398 [details] Allows a multiplex slave to exit and generate a true exit value
Created attachment 1399 [details] Cleanup duplicate diff hunks.

Found some more equivalent changes in the CVS repository. Removed the duplicate diff hunks.
Patch applied - thanks!
Fix shipped in 4.9/4.9p1 release.