Bug 3231 - Deadlock/hang while using tunnel-device with high rates
Status: NEW
Alias: None
Product: Portable OpenSSH
Classification: Unclassified
Component: ssh
Version: 8.4p1
Hardware: Other Linux
Importance: P5 major
Assignee: Assigned to nobody
 
Reported: 2020-11-15 12:26 AEDT by Michael Traxler
Modified: 2020-11-15 21:49 AEDT

Description Michael Traxler 2020-11-15 12:26:15 AEDT
Hi,

I use ssh tun devices for a VPN. When transferring data at rates above roughly 20 MBytes/s, the tunnel device stops working after a short time (seconds).
If "ssh -w 10:10" is used with the interactive channel also open, that channel is completely blocked as well.
The escape sequences "~?" and "~." still work.

This happens immediately when I use my WiFi connection. I could not reproduce it over the LAN connection to the internet.
It happens with many combinations of ssh and sshd versions (8.3, 8.4, 7.9).

I can almost completely work around these deadlocks by changing
the definitions in "channels.h" to the following:
#define CHAN_SES_PACKET_DEFAULT (8*1024)
#define CHAN_SES_WINDOW_DEFAULT (16*CHAN_SES_PACKET_DEFAULT)
that is, by making these buffers much smaller.
The disadvantage of this hack is that the maximum speed of the ssh connection is then limited to about 8 MBytes/s.
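
To put numbers on the patched values: the two defines give a 128 KiB channel window. The sketch below computes that, plus a rough throughput ceiling under the assumption that at most one window can be in flight per round trip. The function names and the 16 ms round-trip time are illustrative assumptions, not measurements from this report. With that assumed RTT the ceiling comes out near the observed ~8 MBytes/s, which is consistent with the limit being window-bound but does not prove it.

```shell
#!/bin/sh
# Channel window in bytes: packet size (KiB) * 1024 * packets per window.
window_bytes() {
    # $1 = packet size in KiB, $2 = packets per window
    echo $(( $1 * 1024 * $2 ))
}

# Rough ceiling when at most one window is in flight per round trip:
# window / RTT. The RTT is an assumed input, not a measured value.
ceiling_bytes_per_sec() {
    # $1 = window in bytes, $2 = round-trip time in milliseconds
    echo $(( $1 * 1000 / $2 ))
}

# Patched values from channels.h above: 8 KiB packets, 16 packets per window.
patched=$(window_bytes 8 16)
echo "patched window: $patched bytes"          # 131072
echo "ceiling at assumed 16 ms RTT: $(ceiling_bytes_per_sec "$patched" 16) bytes/s"   # 8192000
```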

This can be reproduced as follows:

ip l del dev tun20
ssh root@remote.de "ip l del tun20"
ssh -v -t -w 20:20 -oPermitLocalCommand=true -oLocalCommand=/root/bin/execute_after_ssh_tunnel_start.sh root@remote.de "ip a add 192.168.20.1/24 peer 192.168.10.1 dev tun20; ip l set dev tun20 up; bash"

where /root/bin/execute_after_ssh_tunnel_start.sh is:
ip link set tun20 up
ip a add 192.168.10.1/32 peer 192.168.20.1 dev tun20

and in a second shell:
ssh root@192.168.20.1 "dd if=/dev/zero bs=1M" > /dev/null

This blocks very quickly with the default packet and window sizes.
Reducing the MTU on tun20 does not prevent the deadlock.
This seems to be a deeply hidden bug in the code...
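
One way to pin down the moment the tunnel stalls during the dd transfer is to poll the interface's receive counter on the client. This is a minimal sketch, not part of the original reproduction: the function names are hypothetical, and it assumes the standard Linux counter file /sys/class/net/<dev>/statistics/rx_bytes. A rate of zero only means no bytes arrived in that interval, so it also shows up when the transfer is simply not running.

```shell
#!/bin/sh
# Bytes per second between two counter samples taken $3 seconds apart.
rate_bytes_per_sec() {
    # $1 = earlier sample, $2 = later sample, $3 = interval in seconds
    echo $(( ($2 - $1) / $3 ))
}

# Poll the receive counter of a device and report when it stops advancing.
# Device name and interval are illustrative defaults.
watch_tunnel() {
    dev=${1:-tun20}
    interval=${2:-1}
    counter="/sys/class/net/$dev/statistics/rx_bytes"
    prev=$(cat "$counter") || return 1
    while sleep "$interval"; do
        cur=$(cat "$counter") || return 1
        rate=$(rate_bytes_per_sec "$prev" "$cur" "$interval")
        if [ "$rate" -eq 0 ]; then
            echo "$(date +%T) $dev: no bytes received in the last ${interval}s"
        else
            echo "$(date +%T) $dev: $rate bytes/s"
        fi
        prev=$cur
    done
}

# Usage (on the client, while the dd transfer is running):
#   watch_tunnel tun20 1
```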

Thanks a lot,

Michael
Comment 1 Michael Traxler 2020-11-15 21:49:56 AEDT
I just checked some more things.
It is now clear that whether the error occurs depends on the Linux kernel version.
There are no deadlocks of the ssh tunnel with
5.8.15
but the deadlocks do occur with 5.9.1 and 5.9.8 (vmlinuz-5.9.8).
Without understanding the cause, I speculate:
The newer kernels drop some TCP packets, and this causes deadlocks in OpenSSH tunnels. No other application on the system with the new kernel seems to be affected...
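
For anyone trying to confirm this on their own machine, a quick check of the running kernel against the versions named above might look like the sketch below. The 5.9 threshold is an assumption: only 5.8.15 (no hang) and 5.9.1 / 5.9.8 (hangs) were tested here, so the exact boundary is unknown. The helper name is hypothetical, and the comparison relies on the -V (version sort) option of GNU sort.

```shell
#!/bin/sh
# Succeeds if version $1 is greater than or equal to version $2.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Strip any local suffix such as "-generic" before comparing.
running=$(uname -r)
# Assumed threshold: only 5.8.15, 5.9.1 and 5.9.8 were actually tested.
if version_at_least "${running%%-*}" 5.9; then
    echo "kernel $running: in the range where hangs were reported"
else
    echo "kernel $running: older than the versions reported to hang"
fi
```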