I have run into this on a couple of my systems during backups over ssh. I am tarring the full server directory structure, piping it over ssh, and redirecting it to a tar file on our backup server. This does not happen on every server, but on the servers where it DOES happen it is repeatable and happens EVERY time. On double-checking the architectures, it appears that the 2 servers that I KNOW for sure do this are dual AMDs. The backup server is a P4.
Actually, one is a dual P3. Sorry.
You don't happen to have a LinkSys router between client and server, do you? If so see bug #510.
Nope, no LinkSys router. These are internet web servers, connected to 3Com and/or Cisco switches.
So the problem only occurs with a dual-cpu client? What linux distribution and kernel version? Can you run openssh's regression tests ("make tests")? It would also be interesting to know if you can reproduce this with an openssh *and* openssl compiled without optimization.
Please provide some detail on your OS platform and OpenSSL version. Did you compile OpenSSL yourself? If you used a vendor OpenSSL, is it optimised for a particular CPU architecture? These errors are usually OpenSSL issues.
Also check "netstat -s" for packets with bad IP/TCP checksums. IP and TCP checksums are pretty short, so bad packets can occasionally make it through to the application layer (where they will be detected by the MAC).
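To illustrate the point about short checksums, here is a minimal Python sketch, not anything taken from OpenSSH itself. It assumes an RFC 1071 style 16-bit ones'-complement checksum (the kind used by IP and TCP) and uses HMAC-SHA1 with a made-up key as a stand-in for the SSH MAC; the payload bytes and the key are hypothetical. Swapping two 16-bit words corrupts the data but leaves the checksum unchanged, while the MAC differs.

```python
import hashlib
import hmac


def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Hypothetical payload, for illustration only.
original = b"\x12\x34\x56\x78\x9a\xbc\xde\xf0"
# Corrupt it by swapping the first two 16-bit words.
corrupted = original[2:4] + original[0:2] + original[4:]

# Placeholder key; a real SSH session derives its MAC key during key exchange.
key = b"hypothetical-session-key"

same_checksum = internet_checksum(original) == internet_checksum(corrupted)
same_mac = hmac.compare_digest(
    hmac.new(key, original, hashlib.sha1).digest(),
    hmac.new(key, corrupted, hashlib.sha1).digest(),
)

print("checksum unchanged by corruption:", same_checksum)
print("MAC unchanged by corruption:", same_mac)
```

Because the checksum is a plain sum of words, any reordering of 16-bit words goes unnoticed, which is one way damaged data can reach the application and only be caught by the MAC.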
Self-compiled: openssh 3.7.1p1 and openssl 0.9.6b, on Red Hat 7.3, on the most recently found server with this error. It is using the Red Hat kernel 2.4.18-3smp. I remember installing a later version of openssl, and the same with the kernel, but this is how it is currently set up (possibly I forgot to remove the fallback in lilo or something). Let me recompile a few things, now that I see I have more to do before submitting this report. My apologies. I will let you know what happens after I recompile everything up to date.
Created attachment 509: make tests results. These are the test results after relinking openssh-3.7.1p1 with openssl-0.9.7b (it still gives the "Corrupted MAC on input" error).
Well, the output shows that the regression tests pass. Did "netstat -s" show any errors?
Can you replicate the error using just the loopback? e.g.
# ssh root@127.0.0.1 "tar cf - /" >/dev/null
or
# ssh -o Compression=no root@127.0.0.1 "dd if=/dev/zero bs=1k count=1m" >/dev/null
And does it definitely only occur on SMP boxes?
Also note that 3.7.1p1 has some security issues WRT PAM. If you're using PAM you should upgrade: http://www.openssh.com/txt/sshpam.adv
Created attachment 510: netstat -s output. Still getting the same error after kernel upgrade to 2.4.23.
Just an update: I think I have tracked it down to the backup (receiving) server. Running the command piped through locally gives this error on the backup server, but not on the clients. It is running 3.7.1p1 (no, we don't use PAM, so we didn't do the p2 upgrade) linked against openssl 0.9.6b, with a self-compiled kernel 2.4.20 on a P4. (It appears that Red Hat puts ssl in a totally different place than the compiled version and configure finds the old version, meaning that I am going to have to recompile on every server to tell it to use the newer version, but that's another issue.) I will relink openssh against the newer openssl and let you guys know the results.
OK, close this report. I am going to chalk this up to bad RAM in the server. Recompiling openssh and the kernel did all sorts of weird things that clearly point to memory problems on the server. Thanks for your help, and my apologies for the false report. :( James
Thanks - we'll add "bad ram" to the list of things that can cause this.
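For anyone who lands here with the same symptom, a rough first check for flaky memory can be sketched in Python as below. This is only an illustration under loud assumptions: the function name `count_pattern_mismatches`, the buffer size, and the bit patterns are all made up for this example, and a userspace script only touches whatever pages the OS happens to hand it, so a clean run proves nothing. A dedicated boot-time memory tester remains the proper tool.

```python
def count_pattern_mismatches(size_bytes: int = 64 * 1024 * 1024, passes: int = 4) -> int:
    """Fill a buffer with simple bit patterns and count bytes that read back wrong.

    Hypothetical helper for illustration; not a substitute for a real memory tester.
    """
    patterns = (0x00, 0xFF, 0xAA, 0x55)  # all zeros, all ones, alternating bits
    buf = bytearray(size_bytes)
    mismatches = 0
    for n in range(passes):
        value = patterns[n % len(patterns)]
        # Write the pattern across the whole buffer.
        buf[:] = bytes([value]) * size_bytes
        # Every byte should read back as the value just written.
        mismatches += size_bytes - buf.count(value)
    return mismatches


if __name__ == "__main__":
    bad = count_pattern_mismatches()
    print("mismatched bytes:", bad)
```

Any nonzero count would be a strong hint of a hardware problem, while a zero count only means this particular buffer behaved during this particular run.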
Mass change of RESOLVED bugs to CLOSED