Bug 799 - scp incorrectly reports "stalled" on slow copies
Status: CLOSED FIXED
Alias: None
Product: Portable OpenSSH
Classification: Unclassified
Component: scp
Version: -current
Hardware: All All
Importance: P4 minor
Assignee: Damien Miller
URL:
Keywords: patch
Depends on:
Blocks: V_4_8
Reported: 2004-01-29 14:09 AEDT by Peter Jeremy
Modified: 2008-03-31 15:19 AEDT

Attachments
increment counters for short writes (1.75 KB, patch)
2007-05-17 20:08 AEST, Damien Miller
Revised scp diff (4.93 KB, patch)
2007-06-12 16:37 AEST, Damien Miller
dtucker: ok+

Description Peter Jeremy 2004-01-29 14:09:15 AEDT
The following description relates to OpenSSH 3.5p1 as embedded in
FreeBSD/i386 4.9p1.  The overall behaviour (incorrectly reporting
"stalled") is identical on OpenSSH 3.7.1p2 and a quick check shows
that the scp blocksize and ssh hysteresis behaviour have not changed.
I believe the behaviour is not platform or OS dependent.

By default, scp(1) will provide a progress meter showing the
transfer ETA.  If the link is slow, the transfer meter will
alternate between displaying "- stalled -" and unrealistically
short ETAs even though the actual connection is transferring
data smoothly (as shown by tcpdump).

By default, the progress meter is updated every second.  If
there has been no apparent progress in the transfer after 5
seconds, the progress meter will report "stalled" until some
progress is reported.  There appear to be two issues that
will result in long delays between output progress being seen
by the progress meter.
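
The stall rule described above can be sketched as follows. This is a simplified Python model for illustration, not the OpenSSH progress meter source: the names STALL_SECONDS and meter_states are hypothetical, and the model assumes the byte counter is sampled exactly once per second.

```python
STALL_SECONDS = 5  # from the description: "stalled" after 5 seconds without progress


def meter_states(counter_samples):
    """Return the state ("ok" or "stalled") shown at each one-second sample.

    counter_samples is the cumulative byte counter as seen by the meter,
    one value per second (hypothetical input format).
    """
    states = []
    last_value = 0
    idle = 0  # consecutive samples in which the counter did not advance
    for value in counter_samples:
        if value > last_value:
            idle = 0
            last_value = value
        else:
            idle += 1
        states.append("stalled" if idle >= STALL_SECONDS else "ok")
    return states
```

In this model a counter that only moves once every six seconds shows "stalled" on the fifth idle sample and recovers on the sixth, even though the underlying link never stopped.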

Firstly, output from the scp process is in filesystem blocksize
blocks - the number of bytes transferred (used by the progress
meter) will only be incremented when a full block of data has
been transferred.  Therefore if the transfer rate is less than
1.6KB/sec (old 8K filesystem) or 3.2KB/sec (newer 16KB filesystem)
then the link will report as "stalled".  (Identified by code
inspection).
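
The two thresholds quoted above follow from dividing the block size by the 5 second stall window. A minimal sketch of that arithmetic, with hypothetical names and assuming 1 KB = 1024 bytes:

```python
STALL_SECONDS = 5  # stall window from the description


def min_rate_to_avoid_stall(unit_bytes, stall_seconds=STALL_SECONDS):
    """Lowest steady rate (bytes/sec) at which a counter that advances
    in steps of unit_bytes still changes within the stall window."""
    return unit_bytes / stall_seconds


for label, size in [("8 KB block", 8 * 1024), ("16 KB block", 16 * 1024)]:
    print(f"{label}: {min_rate_to_avoid_stall(size) / 1024:.1f} KB/sec")
# prints:
# 8 KB block: 1.6 KB/sec
# 16 KB block: 3.2 KB/sec
```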

Secondly, the ssh process spawned by the scp process to perform
the actual encryption and transfer includes a substantial
internal buffer (>64KB) and appears to implement hysteresis.
ktrace output of a sample transfer shows a peak of over 96KB
buffered - at which point the ssh process stops reading until
the buffer drops to about 32KB.  This implies that there is
approximately 64KB hysteresis and a transfer rate below about
13KB/sec can result in "stalled" reports.
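
The 13KB/sec figure can be reproduced with a simple model of the hysteresis. This is a sketch under stated assumptions, not ssh's actual buffer code: it assumes scp refills the buffer to the high mark effectively instantly, that the link drains it at a constant rate, and it uses the approximate 96KB and 32KB levels from the ktrace observation. All names are hypothetical.

```python
HIGH_WATER = 96 * 1024  # observed peak buffered (approximate)
LOW_WATER = 32 * 1024   # level at which reading resumed (approximate)
STALL_SECONDS = 5       # stall window from the description


def seconds_between_reads(drain_rate, high=HIGH_WATER, low=LOW_WATER):
    """Time the link needs to drain the buffer from high to low.

    During this interval ssh reads nothing from scp, so scp's byte
    counter cannot advance.
    """
    return (high - low) / drain_rate


def appears_stalled(drain_rate):
    """True if the gap between reads is at least the stall window."""
    return seconds_between_reads(drain_rate) >= STALL_SECONDS
```

With these numbers the break-even rate is 64KB / 5 sec, or about 12.8KB/sec, which matches the "about 13KB/sec" estimate.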

This behaviour is a regression from an earlier version of
OpenSSH but I have not tracked down when it occurred.


To reproduce:
With FreeBSD, it is possible to use ipfw/dummynet to artificially
reduce the outgoing ssh bandwidth to a second system.  If traffic
shaping is not available, it will be necessary to create a low
speed link - eg ppp over an analogue modem or serial link.

I used the following commands:
# ipfw pipe 20 config queue 10 bw 80000
# ipfw add 1005 pipe 20 tcp from any to 192.168.164.18 22
# dd if=/dev/urandom of=data count=512
# scp data 192.168.164.18:/tmp
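
For orientation, the size and expected duration of this test transfer can be estimated as below. The sketch assumes that ipfw interprets a bare "bw 80000" as bits per second (an assumption worth checking against ipfw(8) on the system in use) and ignores protocol overhead; dd's default block size of 512 bytes applies because no bs= is given. Variable names are hypothetical.

```python
DD_BLOCK_BYTES = 512           # dd default block size when bs= is omitted
DD_COUNT = 512                 # count=512 from the dd command
PIPE_BW_BITS_PER_SEC = 80000   # assumption: unit-less ipfw "bw" value is bit/s

file_bytes = DD_BLOCK_BYTES * DD_COUNT
link_bytes_per_sec = PIPE_BW_BITS_PER_SEC / 8
transfer_seconds = file_bytes / link_bytes_per_sec

print(file_bytes)                  # 262144
print(link_bytes_per_sec)          # 10000.0
print(round(transfer_seconds, 1))  # 26.2
```

Under that assumption the link runs at roughly 10KB/sec, which is below the 13KB/sec hysteresis threshold estimated earlier, so "stalled" reports would be expected during the transfer.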

The appropriate fix is unclear - the buffering in both scp and ssh as
well as the hysteresis in ssh are beneficial to maximize transfer
bandwidth and minimise context switching.  It would not be desirable
to reduce these sizes when ssh is used across a LAN.

In the case of scp, changing from atomicio(write, ...) to write(...)
would remove the requirement to write at least filesystem_blocksize
bytes to the remote system within each 5 second stall window.
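
The idea of crediting the counter after every short write, rather than only after a whole block completes, can be sketched as below. This is an illustrative Python analogue, not scp's C code: write_all_counting, the write callable, and on_progress are all hypothetical names.

```python
def write_all_counting(write, data, on_progress):
    """Write all of data, reporting each partial write as it happens.

    write(chunk) must return the number of bytes it accepted, which may
    be fewer than len(chunk). on_progress(n) is called after every such
    short write, so a progress counter can advance before the full
    block is done.
    """
    view = memoryview(data)
    sent = 0
    while sent < len(view):
        n = write(view[sent:])
        if n <= 0:
            raise OSError("write made no progress")
        sent += n
        on_progress(n)  # counter advances by each short write
    return sent
```

A loop of the atomicio style would perform the same retries but report progress only once, after the final byte of the block had been accepted.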

In the case of ssh, the hysteresis needs to be adjusted based on the
outgoing bandwidth - this could possibly be done by resetting the
"don't read more" flag after (say) 1 second.
Comment 1 Damien Miller 2007-05-17 20:08:42 AEST
Created attachment 1286
increment counters for short writes

This patch avoids the use of atomicio on the network sockets, so scp should be able to update the counters more frequently.
Comment 2 Damien Miller 2007-06-12 16:37:09 AEST
Created attachment 1302
Revised scp diff

There is no point incrementing the counters more quickly when the file descriptors are blocking. This diff makes them non-blocking, unifies a little code and increases the buffer size for improved performance.
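
The combination described here, a non-blocking descriptor plus a counter that advances on every partial write, might look roughly like the following. This is a Python sketch of the general technique on a POSIX system, not the contents of the attached diff; nonblocking_write and on_progress are hypothetical names.

```python
import os
import select


def nonblocking_write(fd, data, on_progress, timeout=1.0):
    """Write data to fd in non-blocking mode, reporting partial writes.

    With a blocking descriptor a single write call may not return until
    the whole buffer has been accepted, so the counter could not move in
    between. In non-blocking mode each call returns whatever was
    accepted and the counter is updated immediately.
    """
    os.set_blocking(fd, False)
    view = memoryview(data)
    sent = 0
    while sent < len(view):
        # Wait until the descriptor can accept data, or the timeout expires.
        _, writable, _ = select.select([], [fd], [], timeout)
        if not writable:
            continue  # nothing accepted yet; try again
        try:
            n = os.write(fd, view[sent:])
        except BlockingIOError:
            continue  # descriptor filled up between select and write
        sent += n
        on_progress(n)
    return sent
```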
Comment 3 Damien Miller 2007-10-24 13:46:17 AEST
patch applied - will be in openssh-4.8.
Comment 4 Damien Miller 2008-03-31 15:19:59 AEDT
Fix shipped in 4.9/4.9p1 release.