I get intermittent core dumps after installing/deploying 4.2p1. 4.1p1 was (and is) still working fine. Here's a backtrace in gdb:

(gdb) bt
#0  0x45a94 in mkstemp64 ()
#1  0x801c4 in mkstemp64 ()
#2  0x80074 in mkstemp64 ()
#3  0x80f00 in mkstemp64 ()
#4  0x836f8 in mkstemp64 ()
#5  0x7ecec in mkstemp64 ()
#6  0x49070 in mkstemp64 ()
#7  0x48e34 in mkstemp64 ()
#8  0x36340 in _init ()
#9  0x3496c in _init ()
#10 0x31aa0 in _init ()
#11 0x31240 in _init ()
#12 0x1d508 in _init ()
#13 0x1b310 in _init ()
#14 0x13f3c in _init ()

OS is Solaris 7, running on Sparc. OpenSSH was configured as follows:

/phil/sw/src/openssh-4.2p1/configure \
    --prefix=/phil/sw/sunos/sparc/pkg/openssh-4.2p1 \
    --sysconfdir=/phil/etc/openssh \
    --without-rsh \
    --with-pid-dir=/phil/var/run \
    --with-ssl-dir=/phil/sw/sunos/sparc/pkg/openssl-0.9.8 \
    --with-cppflags="-I/phil/sw/sunos/sparc/pkg/zlib-1.2.3/include -I/phil/src/tcpwrappers-7.6" \
    --with-ldflags="-L/phil/sw/sunos/sparc/pkg/zlib-1.2.3/lib -L/phil/src/tcpwrappers-7.6" \
    --with-default-path=/usr/bin:/bin:/phil/sw/sunos/sparc/bin \
    --with-tcp-wrappers \
    --with-skey=/phil/sw/pkg/skey-1.1.5 \
    --with-privsep-user=accessy \
    --with-privsep-path=/phil/var/prison
We might be able to help you if you tell us what you are doing when you get those coredumps. Also rebuild with debugging enabled, as a debugless trace doesn't tell us much.
(In reply to comment #1)
> We might be able to help you if you tell us what you are doing when you get
> those coredumps. Also rebuild with debugging enabled, as a debugless trace
> doesn't tell us much.

What I'm doing is:

$ ssh <host>

Then: core dump, about one in four tries. I'll rebuild with debugging at the earliest convenience.
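To put a number on "about one in four tries", the command can be run repeatedly while counting the runs that die with a segmentation fault. This is only a minimal sketch, assuming a Bourne-style shell that reports death by signal as 128 + signal number (so SIGSEGV, signal 11, shows up as exit status 139). `count_segv` is a hypothetical helper name, not part of OpenSSH.

```shell
# count_segv RUNS CMD [ARGS...]
# Hypothetical helper: runs CMD the given number of times and prints how
# many of those runs ended with exit status 139 (128 + SIGSEGV).
count_segv() {
    runs=$1
    shift
    crashes=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard output and detach stdin so the command never waits for input.
        "$@" </dev/null >/dev/null 2>&1
        if [ $? -eq 139 ]; then
            crashes=$((crashes + 1))
        fi
        i=$((i + 1))
    done
    echo "$crashes"
}
```

For example, `count_segv 20 ssh -o BatchMode=yes somehost true` (with `somehost` as a placeholder) would print how many of 20 connection attempts crashed. Note that failures for other reasons, such as a broken pipe, are not counted.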
Also, which compiler (and version) are you using?
(In reply to comment #3)
> Also, which compiler (and version) are you using?

s@goedel:pts/0(9) gcc -v ~ 19:34
Using built-in specs.
Target: sparc-sun-solaris2.7
Configured with: /phil/sw/src/gcc-4.0.1/configure --prefix=/phil/sw/sunos/sparc/pkg/gcc-4.0.1 --disable-libgcj --enable-languages=c,c++,objc --with-gnu-as --with-as=/phil/sw/sunos/sparc/bin/as --with-gnu-ld --with-ld=/phil/sw/sunos/sparc/bin/ld --enable-shared
Thread model: posix
gcc version 4.0.1
Could you try a different compiler? gcc 4.x appears to generate broken code on quite a few platforms, e.g. bug #1080.
And when you do, please make sure you recompile any of the prereqs that were compiled with the newer compiler (esp. openssl but zlib too). You might also want to run openssl's self-test ("make tests") after you build it. People have also reported problems with openssl 0.9.8 but I'm not sure if those were compiler-related or not.
Hi there,

I have the same problem when connecting with OpenSSH_4.2p1, OpenSSL 0.9.8 05 Jul 2005. The compiler is gcc version 3.4.3 (csl-sol210-3_4-branch+sol_rpath). I followed this bug and rebuilt zlib and libopenssl. Solaris version is 5.10 Generic_118822-02 sun4u sparc SUNW,Sun-Fire-V240.

I executed "make tests" a couple of times, and had a number of segfaults. The last run produced the following output:

run test exit-status.sh ...
test remote exit status: proto 1 status 0
test remote exit status: proto 1 status 1
test remote exit status: proto 1 status 4
test remote exit status: proto 1 status 5
test remote exit status: proto 1 status 44
test remote exit status: proto 2 status 0
Write failed: Broken pipe
exit code (with sleep) mismatch for protocol 2: 255 != 0
test remote exit status: proto 2 status 1
Segmentation Fault - core dumped
exit code mismatch for protocol 2: 139 != 1
Segmentation Fault - core dumped
exit code (with sleep) mismatch for protocol 2: 139 != 1
test remote exit status: proto 2 status 4
Write failed: Broken pipe
exit code mismatch for protocol 2: 255 != 4
Segmentation Fault - core dumped
exit code (with sleep) mismatch for protocol 2: 139 != 4
test remote exit status: proto 2 status 5
test remote exit status: proto 2 status 44
failed remote exit status
make[1]: *** [t-exec] Error 1
make[1]: Leaving directory `/opt/gad/sources/openssh-4.2p1/regress'
make: *** [tests] Error 2

Running "ssh -vvv" produced the following output:

OpenSSH_4.2p1, OpenSSL 0.9.8 05 Jul 2005
debug1: Reading configuration data /etc/ssh/ssh_config
debug2: ssh_connect: needpriv 0
debug1: Connecting to gszulg01 [10.64.10.84] port 22.
debug1: Connection established.
debug1: permanently_set_uid: 0/0
debug1: identity file /.ssh/identity type -1
debug1: identity file /.ssh/id_rsa type -1
debug1: identity file /.ssh/id_dsa type -1
debug1: Remote protocol version 1.99, remote software version OpenSSH_4.1
debug1: match: OpenSSH_4.1 pat OpenSSH*
debug1: Enabling compatibility mode for protocol 2.0
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_4.2
debug2: fd 4 setting O_NONBLOCK
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug2: kex_parse_kexinit: diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1
debug2: kex_parse_kexinit: ssh-rsa,ssh-dss
debug2: kex_parse_kexinit: aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,arcfour128,arcfour256,arcfour,aes192-cbc,aes256-cbc,rijndael-cbc@lysator.liu.se,aes128-ctr,aes192-ctr,aes256-ctr
debug2: kex_parse_kexinit: aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,arcfour128,arcfour256,arcfour,aes192-cbc,aes256-cbc,rijndael-cbc@lysator.liu.se,aes128-ctr,aes192-ctr,aes256-ctr
debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96
debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96
debug2: kex_parse_kexinit: none,zlib@openssh.com,zlib
debug2: kex_parse_kexinit: none,zlib@openssh.com,zlib
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit: first_kex_follows 0
debug2: kex_parse_kexinit: reserved 0
debug2: kex_parse_kexinit: diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1
debug2: kex_parse_kexinit: ssh-rsa,ssh-dss
debug2: kex_parse_kexinit: aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,arcfour,aes192-cbc,aes256-cbc,rijndael-cbc@lysator.liu.se,aes128-ctr,aes192-ctr,aes256-ctr
debug2: kex_parse_kexinit: aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,arcfour,aes192-cbc,aes256-cbc,rijndael-cbc@lysator.liu.se,aes128-ctr,aes192-ctr,aes256-ctr
debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96
debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96
debug2: kex_parse_kexinit: none,zlib
debug2: kex_parse_kexinit: none,zlib
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit: first_kex_follows 0
debug2: kex_parse_kexinit: reserved 0
debug2: mac_init: found hmac-md5
debug1: kex: server->client aes128-cbc hmac-md5 none
debug2: mac_init: found hmac-md5
debug1: kex: client->server aes128-cbc hmac-md5 none
debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent
debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP
Segmentation Fault (core dumped)

Backtrace is as follows:

# adb core
core file = core -- program ``/usr/bin/ssh'' on platform SUNW,Sun-Fire-V240
SIGSEGV: Segmentation Fault
$C
ffbfee20 bn_sub_words+0x3c(16b850, 16b3e0, 16b400, 7, 1, 9da20)
ffbfee90 bn_mul_recursive+0x40c(1, 20, 0, 10, 0, ffffffff)
ffbfef10 bn_mul_recursive+0x2e4(1, 40, 0, 20, 0, ffffffff)
ffbfef90 bn_mul_recursive+0x2e4(1, 80, 0, 40, 0, ffffffff)
ffbff010 BN_mul+0x2c4(159634, 16b530, 15960c, 159820, 2, 1)
ffbff088 BN_mod_mul_montgomery+0x3c(0, 1595f8, 15960c, 159858, 159820, 80)
ffbff0f8 BN_mod_exp_mont_consttime+0x56c(1595f8, 16b320, 100, d, 159820, 159858)
ffbff180 BN_mod_exp_mont+0x70(156308, 1562a8, ffbff2e0, 156288, 159820, 159858)
ffbff278 generate_key+0x94(15b7f0, 20, 1562e8, 0, 43, 149360)
ffbff308 DH_generate_key+0xc(15b7f0, 1562e8, 20, 0, c3, 0)
ffbff378 dh_gen_key+0x7c(15b7f0, 100, 1f, 7e0, ff000, ff)
ffbff3e8 kexgex_client+0x174(1586d0, 400, 916c8, 4e2fc, 2000, 1000)
ffbff488 kex_input_kexinit+0x5fc(1, 6, 1586d0, 158098, 169c10, 1586e0)
ffbff500 dispatch_run+0x94(0, 158714, 1586d0, 156248, 52ddc, 14e400)
ffbff578 ssh_kex2+0x17c(163688, 140c00, ffbff764, 15625c, 1, 0)
ffbff5e8 ssh_login+0x334(5, ffbff850, 4, 4, 1538b0, 152000)
ffbff860 main+0xce8(152064, 161ca0, 151c00, 151800, 153f48, 153400)
ffbffb20 _start+0x5c(0, 0, 0, 0, 0, 0)

Regards,
Christian

PS: Sorry for asking, but I searched the documentation, the net and even looked at the configure script, but I didn't find a clue as to how to enable debugging at compile time. Did I miss something, and if so, could you advise on how to enable debugging?
(In reply to comment #7)
> core file = core -- program ``/usr/bin/ssh'' on platform SUNW,Sun-Fire-V240
> SIGSEGV: Segmentation Fault
> $C
> ffbfee20 bn_sub_words+0x3c(16b850, 16b3e0, 16b400, 7, 1, 9da20)

Looks like a problem with OpenSSL (the trace certainly points there). Did OpenSSL's self-test ("make tests") pass? Does the same problem occur with openssl-0.9.7g?

[...]
> PS: Sorry for asking, but I searched the documentation, the net and even looked
> at the configure script, but I didn't find a clue as to how to enable debugging
> at compile time. Did I miss something, and if so, could you advise on how
> to enable debugging?

Debug symbols? Depends on your compiler, but for gcc they're enabled automatically (the "-g" flag). If they're not, then pass the appropriate flag via --with-cflags, e.g.:

./configure --with-cflags=-g

Note that by default, those symbols are stripped out of the installed binaries (i.e. you should use the compiled files in your build dir for debugging with gdb, adb or similar).
(1) Looks like an OpenSSL 0.9.8 issue to me. Does not happen with 0.9.7g. 0.9.8's "make test" was unproblematic, though.

(2) Built OpenSSH with the "-g" flag. The core dump shows:

js@goedel:pts/5(22) adb /phil/sw/sunos/sparc/obj/openssh-4.2p1/ssh core ~ 21:17
core file = core -- program ``ssh'' on platform SUNW,Ultra-250
SIGSEGV: Segmentation Fault
$C
bn_sub_words() + 3c [savfp=0xffbeef48,savpc=0x801bc]
bn_sub_part_words(13f400,13ba88,13baa8,7,1,ac0b2f1) + 10 [savfp=0xffbeef48,savpc=0x801bc]
bn_mul_recursive(20,ffffffff,0,10,0,ffffffff) + 41c [savfp=0xffbeefd8,savpc=0x8006c]
bn_mul_recursive(40,ffffffff,13f360,20,0,ffffffff) + 2cc [savfp=0xffbef068,savpc=0x80ef8]
BN_mul(13e700,13e6b0,13e6c4,13e628,3,1) + 2b8 [savfp=0xffbef0e0,savpc=0x836f0]
BN_mod_mul_montgomery(13e6b0,13e6b0,13e6c4,13e480,13e628,1f) + 30 [savfp=0xffbef150,savpc=0x7ece4]
BN_mod_exp_mont_consttime(80,bb,ffbef244,b,13e628,13e480) + 424 [savfp=0xffbef1d8,savpc=0x49068]
generate_key(13e5d0,20,12a490,0,130,218ac) + 1c8 [savfp=0xffbef268,savpc=0x48e2c]
DH_generate_key(13e5d0,12a490,12a760,0,0,0) + c [savfp=0xffbef2d8,savpc=0x36338]
dh_gen_key(13e5d0,80,400,2000,ff1b5eec,ff146618) + 80 [savfp=0xffbef348,savpc=0x34964]
kexgex_client(13b8c0,2,0,0,ff1b5eec,400) + 168 [savfp=0xffbef3f0,savpc=0x31a98]
kex_input_kexinit(1,6,13b8c0,13b938,13ea20,2) + 45c [savfp=0xffbef468,savpc=0x31238]
dispatch_run(0,13b904,13b8c0,0,12a424,ff0000) + 54 [savfp=0xffbef4e0,savpc=0x1d500]
ssh_kex2(114c00,125aa0,0,7efefeff,81010100,ff0000) + 124 [savfp=0xffbef550,savpc=0x1b308]
ssh_login(126f34,ffbef7b8,4,4,127718,125800) + 30c [savfp=0xffbef7c8,savpc=0x13f34]
main(125800,125aa0,122000,126c00,127cd0,125800) + c20 [savfp=0xffbefa78,savpc=0x1245c]
(In reply to comment #9)
> (1) Looks like an OpenSSL 0.9.8 issue to me. Does not happen with 0.9.7g.
> 0.9.8's "make test" was unproblematic, though.

What options did you use when you built openssl-0.9.8? I'm trying to reproduce the problem.
Today I built OpenSSH 4.2p1 with OpenSSL 0.9.7g. Both OpenSSL and OpenSSH passed all tests; there wasn't a single segfault. After this I removed all build directories and compiled OpenSSH with OpenSSL 0.9.8 again, including debugging flags. I don't know why, and I'm not really happy about it, but this time OpenSSH passed all tests and seems to work flawlessly. So, from my point of view, this bug might be closed without a solution. I guess that the build environment probably wasn't sane. But this problem occurred on several build results created by at least two people on two different machines (both running Solaris 10, though).
Argh, a Heisenbug! I hate unsolved mysteries too, but I have no idea what else to suggest. There's a similar trace in bug #910 with a segfault at the same place (HP-UX, HP ANSI C compiler). I think it was openssl-0.9.8. I'm going to leave this bug open for a while and see if we can collect any more info.
I'm now pretty sure this is an OpenSSL bug. I helped someone else with a crash in the same place (DH GEX) and was able to reproduce it. It was caused by a problem in the UltraSPARC assembler implementation of bn_sub_words(). Since it's in the assembler code, an OpenSSL built with "no-asm" will not exhibit the problem.

This is from OpenSSL's CVS log:

[quote]
revision 1.5
date: 2005/11/15 08:02:10; author: appro; state: Exp; lines: +12 -0
Apply "better safe than sorry" approach after addressing sporadic SEGV in bn_sub_words to the rest of the sparcv8plus.S.
----------------------------
revision 1.4
date: 2005/11/11 20:07:07; author: appro; state: Exp; lines: +2 -2
Attempt to resolve sporadic SEGV crashes in bn_sub_words in OpenSSH. I'm baffled why it crashes and does it sporadically...
[/quote]

(According to OpenSSL's CVS, this patch is in OpenSSL >= 0.9.7j and >= 0.9.8b.)

I replaced only that file in openssl-0.9.8a, rebuilt everything and was no longer able to reproduce the problem. I recommend that you upgrade to OpenSSL 0.9.8d (or the latest 0.9.7) and rebuild OpenSSH (if you haven't already).

It took a while, but I think we can now close this bug :-)
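For anyone checking whether an installed OpenSSL already carries the sparcv8plus.S fix, the version cut-offs quoted above (0.9.7j and 0.9.8b) can be turned into a quick test. This is a rough sketch rather than an official check: `has_bn_sub_words_fix` is a hypothetical helper name, it only knows about the 0.9.7 and 0.9.8 series discussed in this bug, and it assumes the letter suffix sorts alphabetically.

```shell
# has_bn_sub_words_fix VERSION
# Hypothetical helper based only on the cut-offs quoted in this bug.
# Returns 0 if VERSION is 0.9.7j or later (0.9.7 series) or 0.9.8b or
# later (0.9.8 series), 1 if it is an earlier release in those series,
# and 2 for any other version (not covered here).
has_bn_sub_words_fix() {
    case "$1" in
        0.9.7[j-z]*) return 0 ;;
        0.9.8[b-z]*) return 0 ;;
        0.9.7*|0.9.8*) return 1 ;;
        *) return 2 ;;
    esac
}
```

The version string is the second field of the `openssl version` output (for example `0.9.8` in "OpenSSL 0.9.8 05 Jul 2005"), so `has_bn_sub_words_fix "$(openssl version | awk '{print $2}')"` would test the installed copy. A result of 1 only means the fix is absent; whether the crash actually occurs also depends on the platform and on whether the assembler code was built.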
I'm pretty sure this one is now solved. Please reopen if this is not the case.
Close resolved bugs after release.