When copying small files (HTML pages) to web servers, we occasionally get a failure message: Received disconnect from xx.xx.xx.xx: 2: Corrupted MAC on input. (xx.xx.xx.xx is the IP address of one of the web servers.) We copy 6 files to 2 machines every hour. The failure happens 2 or 3 times a day; it's a different file each time, and it occurs on either target machine. We get this failure using scp, sftp, and streaming a tar file through stdin to ssh. The source machine is AIX 5.1 ML05 running OpenSSH 3.8p1, using our corporate internal DNS. The target machines are SunOS 5.8, kernel 108528-18; they are outside our firewall and use our ISP's DNS. The source machine is not in the ISP's DNS; it is in the /etc/hosts file. X11Forwarding is turned off in sshd_config on the target machines. The firewall's NICs are Sun qfe cards. The network switches are Cisco 6500s.
Created attachment 605 [details]: Webserver (target) sshd_config. This is the sshd_config from the target machine (where the corrupted MAC error occurs).
Created attachment 606 [details]: Source machine ssh_config. This is the ssh_config from the source machine (which receives the disconnect message from the target web server).
This is usually a problem on the network between client and server, but has also been reported to be caused by bad RAM in either client or server. The fact that it's not consistent makes it unlikely to be a software problem. How big are the files, and what kind of network gear do you have between client and server? Also see bug #510.
The source machine is an IBM P650 with a standard IBM ethernet controller. The firewall and web servers are Suns with Sun qfe cards. The switches are all Cisco 6500s. I NEVER get this error going through the same infrastructure to machines in the DMZs; it ONLY happens going to the external web servers.
The 6 files in question range from 70 KB to 170 KB. I've also tested this as one file of about 500 KB.
Which cipher are you using? Does selecting a different cipher make any difference? Please attach (i.e. use "create a new attachment") a complete debug trace ("scp -vvv [options]") of a failed session.
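[For reference, a sketch of the kind of test being asked for; the file, user, and host names below are placeholders, and the alternative cipher/MAC are just examples of algorithms available in OpenSSH 3.x:]

    # Capture a full debug trace of a failing transfer and attach scp-debug.log
    scp -vvv page1.html webuser@webserver1.example.com:/var/www/htdocs/ 2> scp-debug.log

    # Repeat with a different cipher and MAC to see whether the failure follows them
    scp -vvv -c aes128-cbc -o MACs=hmac-md5 page1.html webuser@webserver1.example.com:/var/www/htdocs/ 2> scp-debug-aes.log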
*** Bug 860 has been marked as a duplicate of this bug. ***
My three ha'porth ... this appears to be a common problem, having searched the internet for this error message. I was also receiving this error, and believe I have fixed it ... The WinXP laptop I am using to connect to the Linux file server over X Window has two addresses (and so two DNS names), one for cable and the second for wireless. If the machine name does not correspond to the DNS name of the active network interface, I get this error. If I change the machine name so everything matches, the error doesn't occur. Repeatedly changing the machine name is a big pain ... Using either OpenSSH 3.9p1 under Cygwin or PuTTY 0.55 to connect to OpenSSH 3.5p1 on SuSE Linux 8.2. Same error with both. If you tell me what logs and configs you want, I can send them ...
My apologies ... I spoke too soon. I'm not getting the bad MAC error now, but instead getting: Disconnecting: Bad packet length 3428026913. Same result.
rapier at tyranny com reports tracking down another cause of these errors: "Anyway, the problem we found with the corrupted MAC on input was the result of what seems to be a hardware bug in Intel e1000 drivers. We since switched to SysKonnect cards and the problems went away (only to be replaced by another one that's caused by a memory-resource starvation issue when the system is under high I/O load). Basically, the HMAC stuff seems pretty rock solid, so if people see this sort of error consistently they should probably look at the drivers, hardware, or cabling."
I apologize if this is a silly question, but why is the connection killed when this happens instead of the affected message being retransmitted? IIRC, when a TCP checksum fails, the TCP stack will retry sending that packet. Since SSH has, for all intents and purposes, a better checksum (the MAC), why doesn't it do the same when the TCP one fails for some reason? It doesn't seem that it would be that hard, since it doesn't have to do everything else TCP does, only retry packets with failed MACs. If I understand the situation correctly, the main source of these errors is bad network stacks (in my case, I suspect the impossible-to-disable rx/tx checksum offloading function of my network adapter is to blame), but this can happen, albeit rarely, even when the entire TCP stack functions as designed: the TCP checksum can fail to detect a transmission error, and on a noisy transmission medium that can happen often enough. As far as I know, occasionally corrupted packets are considered "normal" in TCP, not grounds for terminating the connection.
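[An illustration, for posterity, of why a simple retransmit does not fit the SSH transport. The key and payload below are made up. Per RFC 4253, the MAC is computed over an implicit 32-bit per-direction sequence number followed by the unencrypted packet, so the same data at a different position in the stream produces a different MAC, and by the time a bad MAC is detected both peers' sequence numbers and cipher state have already advanced past the packet. OpenSSH therefore treats a MAC failure as possible tampering and disconnects rather than retrying:]

    # Hypothetical key and payload, for illustration only
    printf 'example packet payload' > packet.bin
    # MAC of the payload as packet number 3 ...
    printf '\000\000\000\003' | cat - packet.bin | openssl dgst -sha256 -hmac "secret"
    # ... and as packet number 4: the digests differ
    printf '\000\000\000\004' | cat - packet.bin | openssl dgst -sha256 -hmac "secret"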
[More details for posterity] For what it's worth, a few months back I found myself dealing with this situation in a couple of variants. In one case, one end of the SSH session was a VM in a Xen environment; in another, one end of the SSH session was a VM in a VMware ESXi environment. Copying anything via scp or sftp was almost impossible, although interactive shells usually worked. In both cases, after lots of diagnosis and "google research", I was able to determine that the underlying cause seemed to be a faulty TCP segmentation offload mechanism in the underlying virtualized network layer. (In one case, fingers were pointed at a virtual switch, in the other at the virtual NIC.) Either way, it appears that the VM's kernel was offloading checksumming to the lower layers, but none of the lower layers actually bothered to do it. Disabling TCP segmentation offload in the upper level of the network stack (that of the VM OS) solved the problem, and the systems have been fine since then. This *does* tend to indicate that it's not an SSH problem per se.
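[A minimal sketch of the workaround described above, assuming a Linux guest and an interface named eth0; both are assumptions, and the exact offload feature names vary by driver:]

    # Show the current offload settings for the guest's NIC
    ethtool -k eth0

    # Disable segmentation and checksum offload so the guest kernel computes
    # checksums itself instead of trusting the virtual NIC / virtual switch
    ethtool -K eth0 tso off gso off gro off tx off rx off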
*** Bug 2941 has been marked as a duplicate of this bug. ***
For the record, another known cause of this is buggy network device firmware. Historically, Linksys devices have been known to have issues in some versions of their firmware; see bug #510.
I have this problem all the time (Bug 2941) when running scp on my Ubuntu 16.04 box at home, to copy a multi-gigabyte file from a Linode server. (And never in the other direction, from home to Linode.) I can transfer exactly the same files from other hosts to my home computer (e.g., from a Macintosh in my home), so it seems the problem is not on my home Ubuntu box, nor in my home network setup. I guess it's on Linode?
The error I get is ssh_dispatch_run_fatal: Connection to 11.22.33.44 port 33333: message authentication code incorrect.
(In reply to Dan from comment #15)
[...]
> I can transfer exactly the same files from other hosts to my home
> computer (e.g., from a Macintosh in my home), so it seems the
> problem is not on my home Ubuntu box, nor in my home network setup.

It's not clear from the description, but unless one of those tests traverses from inside to outside your home network, it could still be your home network.

> I guess it's on Linode?

Maybe. I'd suggest testing a large transfer (of something that you would not mind disclosing, since it'll be sent in clear text) with something like netcat, then sha256 the source and destination files and see if they match. That'll eliminate the variables of ssh and libcrypto.
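[A sketch of that test, for posterity; the host, port, and file names are placeholders, and netcat flag syntax varies between implementations (traditional netcat shown here):]

    # On the remote (Linode) end: serve the file in clear text on a spare port
    nc -l -p 12345 < bigfile.bin

    # On the home machine: pull it over the same path the failing scp used
    nc linode.example.com 12345 > received.bin

    # Compare checksums on both ends; a mismatch points at the path, not at ssh
    sha256sum bigfile.bin
    sha256sum received.bin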
OK, I ran a bunch of tests on my home network, which consists of an ActionTec MI424WR router (for Verizon FiOS) connected to a 24-port switch (HP ProCurve 1400-24G), wired with CAT-6 to the various computers. I ran scp on my Linux box (Ubuntu 16.04), trying to copy huge files from another host. It works when the remote host is another computer on my home network (a Mac), so the problem can't be the switch or the Linux computer. It fails when the remote host is on the internet. I tried two different internet Linux hosts (one Linode, one HostDime); both scp operations died partway through the transfer. The Linux box said:

    ssh_dispatch_run_fatal: Connection to xx.xx.xx.xx port xxx: message authentication code incorrect

and the Mac said:

    Corrupted MAC on input.
    Disconnecting: Packet corrupt

So the problem is either the ActionTec router, the Verizon hookup in my basement, or something outside my home. Wheeee.
This ticket can be closed. I upgraded to a newer router and the problem disappeared.
Another thing found to cause this in at least one case: ssh's hardware-accelerated GCM (aes128-gcm@openssh.com) running on the same CPU (a Xeon E5-2620 v4) as a hardware-accelerated sha1sum process. Switching the ssh session to chacha20-poly1305@openssh.com worked around the problem (presumably since it avoids HW acceleration entirely). The error message has changed due to some refactoring ("ssh_dispatch_run_fatal: Connection to [...] port 22: message authentication code incorrect"), but it means the same thing.
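[For reference, a minimal sketch of that workaround; the host alias is hypothetical:]

    # One-off: force the non-GCM cipher for a single session
    ssh -c chacha20-poly1305@openssh.com user@buildhost.example.com

    # Or pin it for that host in ~/.ssh/config
    Host buildhost.example.com
        Ciphers chacha20-poly1305@openssh.com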
Do you know what platform that was? That smells like a kernel bug, failing to save/restore SSE registers.
(In reply to Damien Miller from comment #22) > Do you know what platform that was? That smells like a kernel bug, > failing to save/restore SSE registers. It's a Linux of some flavour, but it's one of many similar machines and the only one affected, and the problem seems at least somewhat sensitive to which CPU core the sha1sum process is scheduled on, so my bet would be a faulty CPU.
closing resolved bugs as of 8.6p1 release