scp of a 25MB file over a HiPPI interface takes 159s with protocol 2 and 22s with protocol 1. Over fast ethernet it takes 20s with protocol 2 and 22s with protocol 1.

Snooping the HiPPI, protocol 1 has packets like:

  16:33:54.465436 hartree-hippi0 -> hodgkin TCP D=22 S=57066 Ack=851535706 Seq=1272028889 Len=61440 Win=40767
  16:33:54.550683 hodgkin -> hartree-hippi0 TCP D=57066 S=22 Ack=1272090329 Seq=851535706 Len=0 Win=19188

i.e. ~62k packet size, while protocol 2 looks like:

  16:33:19.024288 hodgkin -> hartree-hippi0 TCP D=57060 S=22 Ack=2852340565 Seq=845515278 Len=0 Win=61440
  16:33:19.040910 hodgkin -> hartree-hippi0 TCP D=57060 S=22 Ack=2852356997 Seq=845515278 Len=48 Win=61440
  16:33:19.047481 hartree-hippi0 -> hodgkin TCP D=22 S=57060 Ack=845515326 Seq=2852356997 Len=16432 Win=40767

i.e. ~16k packet size.
could you please try this without scp? e.g.

  cat file | ssh -1 -c 3des 'cat > f'
  cat file | ssh -2 -c 3des-cbc 'cat > f2'

thanks.
  time `cat lapack.ibm.tar.gz | ssh -1 -c 3des hodgkin 'cat > f' `
  10.6u 0.6s 0:24 45% 954+790k 0+0io 0pf+0w

  time `cat lapack.ibm.tar.gz | ssh -2 -c 3des-cbc hodgkin 'cat > f2' `
  8.9u 0.7s 2:40 6% 929+631k 0+0io 0pf+0w

My colleague has done some investigating and has found something up with select:

"I have added some diagnostics to packet.c and clientloop.c. It is clear that the slow select calls are not working properly - in particular, they are returning after about 0.2 seconds WITHOUT having set a descriptor. Subsequent calls work. It seems to be input from the connexion that is the problem. I suspect a failure to communicate from the HiPPI driver, which then triggers a timeout."
This is a problem with the Nagle algorithm interacting with the delayed ACK timer. http://www.rs6000.ibm.com/support/sp/perf/nagle21.html describes the problem over an IBM switch, which is again a network with a large MTU. (The same problem occurs using scp over this type of network.) The best solution is to allow larger packets on networks that can support them.
Changing channels.h from

  #define CHAN_SES_WINDOW_DEFAULT (32*1024)
  #define CHAN_TCP_WINDOW_DEFAULT (32*1024)

to

  #define CHAN_SES_WINDOW_DEFAULT (256*1024)
  #define CHAN_TCP_WINDOW_DEFAULT (256*1024)

fixes the buffer problem. scp is still 8 times slower than rcp, and the time isn't spent in CPU, so there is still scope for improvement.
  time `cat lapack.ibm.tar.gz | ssh -1 -c 3des hodgkin 'cat > f' `
  9.2u 0.7s 0:22 44% 887+761k 0+0io 0pf+0w

  time `cat lapack.ibm.tar.gz | ssh -2 -c 3des-cbc hodgkin 'cat > f2' `
  8.7u 0.7s 0:22 41% 888+630k 0+0io 0pf+0w

  time `cat lapack.ibm.tar.gz | rsh hodgkin 'cat > f2' `
  0.0u 0.0s 0:01 2% 77+214k 0+0io 0pf+0w
hm, i think

  #define CHAN_SES_WINDOW_DEFAULT (256*1024)
  #define CHAN_TCP_WINDOW_DEFAULT (256*1024)

generates packets > 32k, but i have to cross check. does 64*1024 help? what about using a faster cipher in your tests? :) e.g. blowfish?
  time `cat lapack.ibm.tar.gz | ssh -2 -c blowfish-cbc hodgkin 'cat > f2' `
  2.6u 0.6s 0:06 46% 736+532k 0+0io 0pf+0w

Yep, much faster, but still not more than half the time is in the CPU. And yep, the packet size is a function of these values:

  #define CHAN_SES_PACKET_DEFAULT (CHAN_SES_WINDOW_DEFAULT/2)
  #define CHAN_TCP_PACKET_DEFAULT (CHAN_TCP_WINDOW_DEFAULT/2)

The networks I saw the problem on, HiPPI and the IBM SP switch, both have an MTU of 64k, so I wanted these values to be at least 128 and went for 256 to be sure.
with 64, giving a 32k packet:

  time `cat lapack.ibm.tar.gz | ssh -2 -c 3des-cbc -p 1025 hodgkin 'cat > f2' `
  8.9u 0.7s 1:47 8% 887+629k 0+0io 0pf+0w

with 128, giving a 64k packet:

  time `cat lapack.ibm.tar.gz | ssh -2 -c 3des-cbc -p 1025 hodgkin 'cat > f2' `
  9.0u 0.6s 0:23 41% 895+633k 0+0io 0pf+0w
hm, ok, let's try this. keep CHAN_SES_PACKET_DEFAULT fixed:

  #define CHAN_SES_PACKET_DEFAULT (16*1024)

and change the _window_ size to

  #define CHAN_SES_WINDOW_DEFAULT (CHAN_SES_PACKET_DEFAULT*4)

you can try increasing the 4. this means the ssh client will send 4 packets before waiting for an ACK from the server.
Created attachment 23 [details]

like this
  time `cat lapack.ibm.tar.gz | local/bin/ssh -2 -c 3des-cbc -p 10222 hodgkin 'cat > f2' `
  8.8u 0.8s 0:23 41% 671+630k 0+0io 137pf+0w

with these changes.
so, this helps, too? what happens if you

  #define CHAN_SES_WINDOW_DEFAULT (CHAN_SES_PACKET_DEFAULT*20)
  hartree_a [4] time `cat lapack.ibm.tar.gz | local/bin/ssh -2 -c 3des-cbc -p 1222 hodgkin 'cat > f2' `
  8.8u 0.6s 0:23 40% 672+633k 0+0io 140pf+0w

No effect going from 4 to 20. Basically anything that increases the window default above 32 helps.
what about this?

Index: channels.c
===================================================================
RCS file: /cvs/openssh_cvs/channels.c,v
retrieving revision 1.138
diff -u -r1.138 channels.c
--- channels.c	8 Feb 2002 11:07:17 -0000	1.138
+++ channels.c	17 Feb 2002 21:34:48 -0000
@@ -1227,7 +1227,7 @@
 static int
 channel_handle_rfd(Channel *c, fd_set * readset, fd_set * writeset)
 {
-	char buf[16*1024];
+	char buf[64*1024];
 	int len;
 
 	if (c->rfd != -1 &&
  time `cat lapack.ibm.tar.gz | local/bin/ssh -2 -c 3des-cbc -p 10222 hodgkin 'cat > f2' `
  8.8u 0.7s 0:29 33% 681+701k 0+0io 139pf+0w

That didn't seem to help. I checked for reproducibility.
there should be no difference between protocol 1 and 2 now.
Mass change of RESOLVED bugs to CLOSED