Bug 3231

Summary: Deadlock/hang while using tunnel-device with high rates
Product: Portable OpenSSH
Reporter: Michael Traxler <M.Traxler>
Component: ssh
Assignee: Assigned to nobody <unassigned-bugs>
Status: NEW
Severity: major
Priority: P5
Version: 8.4p1
Hardware: Other
OS: Linux

Description Michael Traxler 2020-11-15 12:26:15 AEDT
Hi,

I use ssh tun devices for a VPN. When transferring data at rates above ~20 MBytes/s, the tunnel device stops working after a short time (a few seconds).
If "ssh -w 10:10" is used and the interactive session channel is also open, that channel is completely blocked as well.
Only the escape sequences "~?" and "~." still work.

This happens immediately when I use my WiFi connection; I could not reproduce it over the LAN connection to the internet.
It happens with many combinations of ssh and sshd versions (7.9, 8.3, 8.4).

I can work around these deadlocks almost completely by changing
the following in "channels.h":
#define CHAN_SES_PACKET_DEFAULT (8*1024)
#define CHAN_SES_WINDOW_DEFAULT (16*CHAN_SES_PACKET_DEFAULT)
which makes these buffers much smaller.
The disadvantage of this hack is that the maximum speed of the ssh connection is then limited to about 8 MBytes/s.

This can be reproduced as follows:

ip l del dev tun20
ssh root@remote.de "ip l del tun20"
ssh -v -t -w 20:20 -oPermitLocalCommand=true -oLocalCommand=/root/bin/execute_after_ssh_tunnel_start.sh root@remote.de "ip a add 192.168.20.1/24 peer 192.168.10.1 dev tun20; ip l set dev tun20 up; bash"

where /root/bin/execute_after_ssh_tunnel_start.sh is:
ip link set tun20 up
ip a add 192.168.10.1/32 peer 192.168.20.1 dev tun20

and in a second shell:
ssh root@192.168.20.1 "dd if=/dev/zero bs=1M" > /dev/null

With the default packet and window sizes, this blocks very quickly.
Reducing the MTU of tun20 does not prevent the deadlock.
It seems to be a deeply hidden bug in the code...

Thanks a lot,

Michael

Comment 1 Michael Traxler 2020-11-15 21:49:56 AEDT
I just checked some more things.
It is now clear that the error is triggered by the Linux kernel version.
The ssh tunnel does not deadlock with kernel 5.8.15,
but the deadlocks occur with 5.9.1 and 5.9.8.
Without understanding the details, I speculate:
the new kernels skip some TCP packets, and this causes deadlocks in OpenSSH tunnels. No other application on the system with the new kernel seems to be affected...