Bug 1517 - ssh ControlMaster process is crashing frequently when multiplexing ssh and scp connections with error 'select: Invalid argument'
Status: CLOSED WORKSFORME
Alias: None
Product: Portable OpenSSH
Classification: Unclassified
Component: ssh
Version: 5.1p1
Hardware: SPARC Solaris
Importance: P2 normal
Assignee: Assigned to nobody
Reported: 2008-08-30 03:38 AEST by James Morrish
Modified: 2011-01-24 12:34 AEDT

Attachments
ssh Debug Level 9 from Multiplex Control Master (491.03 KB, application/x-gzip)
2008-08-30 03:47 AEST, James Morrish
debug wrapper for select() (3.03 KB, patch)
2008-12-08 10:55 AEDT, Damien Miller

Description James Morrish 2008-08-30 03:38:40 AEST
ssh ControlMaster process is crashing frequently when multiplexing ssh and scp connections with error 'select: Invalid argument'.

We have an application which is constantly making two to three hundred concurrent multiplexed scp and ssh connections between two Solaris platforms.  The scripts transfer 3KB files and then perform checksum and mv commands.  We are using ssh multiplexing to reduce CPU and network load.  We have a script in cron to restart the ControlMaster process; it is restarting approximately 280 times per day.

The ssh ControlMaster syntax:
/usr/local/bin/ssh -oControlMaster=yes -oControlPath="/cdrc/.ssh/mp_sock-scholzite" -fnN -c blowfish-cbc transfer@scholzite-be

We are using OpenSSH 5.1p1 compiled from source (with OpenSSL 0.9.8h).  This OpenSSH version is used on both the client and the server; both machines are Solaris 9 SPARC platforms.  The max file descriptors kernel parameter on both systems is 8192.

The following parameters are set in sshd_config (these values are overkill, but we were clutching at straws!).  If we set MaxSessions below 300, we get 'Administratively Prohibited' failures on the server side.
MaxSessions 8192
MaxStartups 8192

I have attached the level 9 debug output.  When the process fails, the debug output always ends with 'select: Invalid argument'.

Last few lines of debug:
debug3: channel 296: close_fds r 1023 w 1024 e 1025 c 1022
debug3: fd 0 is not O_NONBLOCK
debug3: fd 1 is not O_NONBLOCK
debug3: fd 2 is not O_NONBLOCK
select: Invalid argument
Transferred: sent 12310448, received 106560 bytes, in 88.1 seconds
Bytes per second: sent 139799.0, received 1210.1
debug1: Exit status -1
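
When reading a large debug log such as the attached one, it can help to pull the descriptor numbers out of the close_fds lines and see how high they climb before the failure. The following is a minimal sketch, not part of the original report; the function name highest_fd and the regular expression are illustrative assumptions based only on the line format quoted above. It does not by itself establish the cause of the select() error.

```python
import re

# Matches lines of the form quoted in the report, for example:
#   debug3: channel 296: close_fds r 1023 w 1024 e 1025 c 1022
# A descriptor may be logged as -1, so a leading minus sign is allowed.
CLOSE_FDS = re.compile(
    r"channel (\d+): close_fds r (-?\d+) w (-?\d+) e (-?\d+) c (-?\d+)"
)


def highest_fd(lines):
    """Return (fd, channel) for the largest descriptor seen, or None.

    `lines` is any iterable of log lines; lines that do not match the
    close_fds pattern are skipped.
    """
    best = None
    for line in lines:
        match = CLOSE_FDS.search(line)
        if not match:
            continue
        channel = int(match.group(1))
        top = max(int(value) for value in match.groups()[1:])
        if best is None or top > best[0]:
            best = (top, channel)
    return best
```

Passing an open log file to highest_fd reports the largest descriptor number and the channel that held it.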
Comment 1 James Morrish 2008-08-30 03:47:08 AEST
Created attachment 1564
ssh Debug Level 9 from Multiplex Control Master
Comment 2 Damien Miller 2008-12-08 10:55:28 AEDT
Created attachment 1587
debug wrapper for select()

You will run into out-of-file-descriptor conditions with MaxSessions=8192 and an fd limit of 8192: each session may consume (in the worst case) five file descriptors.
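
The arithmetic behind that warning can be sketched as follows. This is only an illustration using the figures from this report (five descriptors per session in the worst case, a limit of 8192); the function names and the `reserved` parameter are hypothetical and do not correspond to anything in OpenSSH.

```python
def worst_case_fds(max_sessions: int, fds_per_session: int = 5) -> int:
    """Upper bound on descriptors if every session hits the worst case."""
    return max_sessions * fds_per_session


def sessions_that_fit(fd_limit: int, fds_per_session: int = 5,
                      reserved: int = 0) -> int:
    """Largest session count whose worst case stays within fd_limit.

    `reserved` is a hypothetical allowance for descriptors the process
    holds outside of sessions (stdio, the network socket, and so on).
    """
    return max(0, (fd_limit - reserved) // fds_per_session)


if __name__ == "__main__":
    print(worst_case_fds(8192))     # 40960, five times the 8192 limit
    print(sessions_that_fit(8192))  # 1638
```

Under these assumptions, MaxSessions 8192 could demand up to 40960 descriptors, while a limit of 8192 covers the worst case for at most 1638 sessions.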

That being said, it should not crash the client. Unfortunately, debugging these things is tricky. Could you see if you can catch the crash with the attached diff applied? It is a debugging wrapper for select() that logs the arguments and should catch invalid fds that are being passed.
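
For readers unfamiliar with the technique, here is a rough Python sketch of what a debugging wrapper around select() can look like. It is not the attached patch (which is C code against the OpenSSH source) and makes no claim about how that patch works; the names invalid_fds and debug_select are hypothetical, and the sketch assumes the sets contain plain integer descriptors.

```python
import os
import select
import sys


def invalid_fds(fds):
    """Return the descriptors in `fds` that the kernel rejects.

    os.fstat() raises OSError (EBADF) for a descriptor that is not open.
    """
    bad = []
    for fd in fds:
        try:
            os.fstat(fd)
        except OSError:
            bad.append(fd)
    return bad


def debug_select(rlist, wlist, xlist, timeout=None, log=sys.stderr):
    """Log each descriptor set, flag invalid entries, then call select()."""
    for name, fds in (("read", rlist), ("write", wlist), ("except", xlist)):
        print(f"select {name} set: {sorted(fds)}", file=log)
        bad = invalid_fds(fds)
        if bad:
            print(f"select {name} set has invalid fds: {bad}", file=log)
    # The real call is still made, so any error surfaces as it normally would.
    return select.select(rlist, wlist, xlist, timeout)
```

Because every call is logged, the last entries before a failure show exactly which descriptors were passed in.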

Be warned: it will cause a lot of log spam, especially since you have so many open connections.
Comment 3 Damien Miller 2010-04-23 11:45:58 AEST
1 year with no followup + rewrite of mux code in the meantime => bug closed
Comment 4 Damien Miller 2011-01-24 12:34:04 AEDT
Move resolved bugs to CLOSED after 5.7 release