make on v4.5p1 fails as described below. Using Solaris 8, gcc 3.4.6, gnu ld 2.17. Previously installed latest zlib (1.2.3) and OpenSSL (0.9.8d). Also removed Sun pkg SUNWzlib just to be sure there was no conflict between it and new zlib.

make yields following output:

/usr/local/bin/ld -o ssh ssh.o readconf.o clientloop.o sshtty.o sshconnect.o sshconnect1.o sshconnect2.o -L. -Lopenbsd-compat/ -L/usr/local/ssl/lib -R/usr/local/ssl/lib -lssh -lopenbsd-compat -lresolv -lcrypto -lrt -lz -lsocket -lnsl
/usr/local/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000012200
readconf.o: In function `process_config_line':
/home/fbaird/bld/openssh-4.5p1/readconf.c:527: undefined reference to `__muldi3'
/home/fbaird/bld/openssh-4.5p1/readconf.c:529: undefined reference to `__divdi3'
./libssh.a(packet.o): In function `set_newkeys':
/home/fbaird/bld/openssh-4.5p1/packet.c:671: undefined reference to `__ashldi3'
/usr/local/ssl/lib/libcrypto.a(bn_div.o): In function `BN_div':
bn_div.c:(.text+0x280): undefined reference to `__udivdi3'
/usr/local/ssl/lib/libcrypto.a(bn_word.o): In function `BN_mod_word':
bn_word.c:(.text+0x64): undefined reference to `__umoddi3'
/usr/local/ssl/lib/libcrypto.a(b_print.o): In function `fmtint':
b_print.c:(.text+0x26c): undefined reference to `__umoddi3'
b_print.c:(.text+0x2a8): undefined reference to `__udivdi3'
make: *** [ssh] Error 1

A bit of research revealed these symbols are 64-bit arithmetic operators from libgcc, so I reconfigured using

LIBS=-lgcc ./configure

After that make succeeded, but every invocation of ld yields a warning such as

/usr/local/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000011424

The default address changes for each warning. Then make install fails with

rm -f /usr/local/bin/slogin
ln -s ./ssh /usr/local/bin/slogin
rm -f /usr/local/share/man/man1/slogin.1
ln -s ./ssh.1 /usr/local/share/man/man1/slogin.1
if [ ! -d /usr/local/etc ]; then \
    ./mkinstalldirs /usr/local/etc; \
fi
/usr/local/etc/ssh_config already exists, install will not overwrite
/usr/local/etc/sshd_config already exists, install will not overwrite
/usr/local/etc/moduli already exists, install will not overwrite
Illegal Instruction - core dumped
Illegal Instruction - core dumped
Illegal Instruction - core dumped

A bit more investigation reveals that these illegal instructions are occurring during the invocation of ssh-keygen. Ok, now I'm out of options and I feel as if I haven't been solving the real problem. Any ideas?
These errors aren't anything to do with OpenSSH - it looks like your gcc installation is broken.
Please attach (don't paste it in the comment field) a complete build log, it might be some gcc options that are messing things up
(In reply to comment #2)
> Please attach (don't paste it in the comment field) a complete build
> log, it might be some gcc options that are messing things up

Do you mean a build log of gcc, or openssh, or both? (Thanks very much for being willing to look into this, BTW.)
of openssh please
Created attachment 1224
Build log for OpenSSH-4.5p1 as requested

Sorry it took so long. Thanks for looking at this.
It really does look like your gcc install is broken: it seems to be missing the internal functions necessary for 64-bit math, and possibly other things too. Can you try to compile and execute the following test program?

----

#include <stdio.h>
int main(void)
{
	u_int64_t a = 32, b = 3, c;
	c = a / b;
	printf("%llu\n", c);
	return 0;
}
(In reply to comment #6)
> It really does look like your gcc install is broken: it seems to be
> missing the internal functions necessary for 64-bit math, and possibly
> other things too. Can you try to compile and execute the following test
> program?
>
> ----
>
> #include <stdio.h>
> int main(void)
> {
> 	u_int64_t a = 32, b = 3, c;
> 	c = a / b;
> 	printf("%llu\n", c);
> 	return 0;
> }

This program would not compile because u_int64_t is not defined. However, as I understand it, this type is the same as unsigned long long. When I changed "u_int64_t" to "unsigned long long" the program compiled and ran fine, giving a result of 10. This is the expected answer due to integer truncation.
Since you are targeting my gcc build, I thought it might be useful to show you the configure command used to build gcc:

/home/fbaird/bld/gcc-3.4.6/configure --disable-nls --prefix=/usr/local --enable-languages=c++ --with-gnu-ld --with-ld=/usr/local/bin/ld --with-gnu-as --with-as=/usr/local/bin/as --enable-shared --with-gcc-version-trigger=/home/fbaird/bld/gcc-3.4.6/gcc/version.c

--disable-nls turns off international language support. This is a custom program that will only see an English-speaking audience, and it means I avoid having to install and build gettext. I only use C and C++ and so do not build support for any other languages. Both of these options save hours when compiling gcc.
Another thought: do you have multiple instances of libgcc? Perhaps one in /usr/local/lib and one elsewhere?
(In reply to comment #9)
> Another thought: do you have multiple instances of libgcc? Perhaps one
> in /usr/local/lib and one elsewhere?

Well, yes I do. I have a libgcc at /usr/local/lib/libgcc_s.so (which is from gcc 3.4.6), /usr/local/lib/gcc/sparc-sun-solaris2.8/3.4.3/libgcc.a and /usr/local/gcc-2.95.2/lib/gcc-lib/sparc-sun-solaris2.8/2.95.2/libgcc.a.

But while researching this, I was looking at the Makefile and found this:

# uncomment if you run a non bourne compatable shell. Ie. csh
#SHELL = /usr/bin/sh

I use tcsh as my default shell (being a long time Sun guy, where the default shell has always been csh), so I ran under the bash shell and the problem appeared to go away.

However, now I have a different problem (though it might actually be the same one, keep reading). I often see errors of the sort

/usr/local/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000013168

If I let the make continue it does compile, but the programs that are built core dump.

I found that the warning means I need to tell ld I am using static linking. After looking more at the Makefile I saw that libssh is built for static linking (it's a .a file). So I tried adding -static to the linker flags and then several libs were unfound. I found that this was because I only have dynamic versions of many of the libraries (for example, libresolv and librt). I have since tried many variants of trying to tell ld that some libs are static and some libs are dynamic, but apparently it's not working. I keep getting messages such as "Unable to find librt".

I think that my original error was caused by the linker picking up either the gcc 2.95.2 libgcc or the 3.4.3 libgcc since both of those are statically linked libs.

Anyway, do you know how I can tell ld (gnu ld 2.17) which libs to link statically and which to link dynamically? Or could I change the Makefile to make libssh dynamic? Is there some overriding reason why libssh must be static and not dynamic? Thanks.
(In reply to comment #10)
> I use tcsh as my default shell (being a long time Sun guy, where the
> default shell has always been csh), so I ran under the bash shell and
> the problem appeared to go away.

Are LIBRARY_PATH or LD_LIBRARY_PATH different with the different shells? Maybe their respective startup scripts set them differently.

The other thing to check is whether the runtime library path has been tweaked with crle(1).

> I found that the warning means I need to tell ld I am using static
> linking. After looking more at the Makefile I saw that libssh is built
> for static linking (it's a .a file). So I tried adding -static to the
> linker flags and then several libs were unfound.

I don't think you need to build the whole thing static (as you saw, there are no static variants of some libraries), but what you need to do is link in the right static libgcc.

> Anyway, do you know how I can tell ld (gnu ld 2.17) which libs to link
> statically and which to link dynamically?

I think the normal algorithm is:

for each required library
    for each path in LIBRARY_PATH
        if there's a dynamic library then use it
        if there's a static library then use it

so I think that you need to put -L/usr/local/lib/gcc/sparc-sun-solaris2.8/3.4.3 at the *start* of the linker flags. This should do it:

./configure --with-ldflags=-L/usr/local/lib/gcc/sparc-sun-solaris2.8/3.4.3

Depending on how critical the system is and what uses the dynamic libgcc.so, you could temporarily rename the libgcc.so in /usr/local/lib while you do the build.

> Or could I change the
> Makefile to make libssh dynamic? Is there some overriding reason why
> libssh must be static and not dynamic? Thanks.

No, I don't think libssh is contributing to the problem.
(In reply to comment #11)

I was able to resolve the whole issue of which libraries are being used with this command:

ldd ssh

And the output is:

libresolv.so.2 => /usr/lib/libresolv.so.2
libcrypto.so.0.9.8 => /usr/local/ssl/lib/libcrypto.so.0.9.8
librt.so.1 => /usr/lib/librt.so.1
libz.so => /usr/local/lib/libz.so
libsocket.so.1 => /usr/lib/libsocket.so.1
libnsl.so.1 => /usr/lib/libnsl.so.1
libc.so.1 => /usr/lib/libc.so.1
libgcc_s.so.1 => /usr/local/lib/libgcc_s.so.1
libdl.so.1 => /usr/lib/libdl.so.1
libaio.so.1 => /usr/lib/libaio.so.1
libmp.so.2 => /usr/lib/libmp.so.2
/usr/platform/SUNW,Sun-Fire-V250/lib/libc_psr.so.1

All of these libraries are the correct ones. /usr/local/lib/libgcc_s.so.1 is the correct libgcc and is from gcc 3.4.6.

So the problem boils down to this: why do I get ld warnings of the sort

/usr/local/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000013168

I found that someone else has been having the same problem:

http://www.mail-archive.com/secureshell@securityfocus.com/msg01479.html

He apparently hasn't gotten a resolution either. He is using Solaris 9 with gcc 3.2.3 and I am using Solaris 8 with gcc 3.4.6. Maybe it's a Solaris+gcc issue?
(In reply to comment #12)
> I found that someone else has been having the same problem:
>
> http://www.mail-archive.com/secureshell@securityfocus.com/msg01479.html
>
> He apparently hasn't gotten a resolution either.

Actually he did get it resolved eventually:

http://marc.theaimsgroup.com/?l=secure-shell&m=116982358800235

Hopefully Jeff won't mind me quoting a snippet from a private mail describing what he did that resolved the problem for him:

[quote]
> When you asked me about spurious libgcc's on disk, I did a
> 'find / ...' and found a static one as well as a dynamic one
> in /usr/local. Knowing that I had messed around with several
> GCC packages from Sunfreeware.com, I deleted all the find
> results from disk, uninstalled the current SMCgcc package
> and reinstalled 3.3.2 before trying the SSL + SSH build
> again.
>
> So maybe that solved it.
[/quote]
(In reply to comment #13)

I cleaned my system of any "rogue" libgcc's and completely rebuilt gcc and libgcc. This did get rid of the original problem (undefined refs in readconf.o) but I never did get rid of the problem with the linker being unable to find the symbol "_start". The executables would build but always core dumped. I even went so far as to rebuild zlib and openssl too.

So, uncle. I give up. I installed openssh-4.5p1-sol8-sparc-local.gz from sunfreeware.com. Perhaps the problem arose from being on Sol 8 instead of 9 or 10. And maybe the guys at sunfreeware know some Sol 8 tricks for compilation. Regardless, I now have a functioning copy.

Thanks for all the time and help.

BTW, their installation guide (http://sunfreeware.com/openssh8.html) is really good, even if you compile from the source rather than adding a package.
(In reply to comment #14)
> I cleaned my system of any "rogue" libgcc's and completely rebuilt gcc
> and libgcc. This did get rid of the original problem (undefined refs in
> readconf.o) but I never did get rid of the problem with the linker
> being unable to find the symbol "_start". The executables would build
> but always core dumped. I even went so far as to rebuild zlib and
> openssl too.

Well, I'm glad that you have something working.

Building on your system should work. (Solaris 8 with gcc 3.4.3 and binutils 2.17 is a platform I test on regularly. I might bootstrap gcc 3.4.6 and see if it's any different, but Jeff's similar problem was with 3.2.3).

I'm willing to help you (try to) figure it out if you want (I hate unsolved mysteries). If so then you can try getting a backtrace of the segfault ("gdb ./bad_binary" then "backtrace" at the (gdb) prompt). If not then please feel free to close this bug.

Hey, I was just comparing Jeff's logs to yours and I do see one common element: you're both calling "ld" explicitly instead of letting gcc do the linking (the latter being what configure will normally opt to do if left to its own devices). This would also explain some of the original errors: gcc knows it needs libgcc for some things, knows where to find it and automagically links it, but ld doesn't.

Do you set $LD, and if so does unsetting it make a difference?
OK, I'm now willing to put money on it :-). On my otherwise working Solaris 8 system:

$ LD=/usr/local/bin/ld ../../configure && make
checking for gcc... gcc
[.. lots of output ..]
/usr/local/bin/ld -o ssh ssh.o readconf.o clientloop.o sshtty.o sshconnect.o sshconnect1.o sshconnect2.o -L. -Lopenbsd-compat/ -L/usr/local/ssl/lib -R/usr/local/ssl/lib -lssh -lopenbsd-compat -lresolv -lcrypto -lrt -lz -lsocket -lnsl
/usr/local/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000012080
readconf.o: In function `process_config_line':
../../readconf.c:527: undefined reference to `__muldi3'
../../readconf.c:529: undefined reference to `__divdi3'
./libssh.a(packet.o): In function `set_newkeys':
../../packet.c:670: undefined reference to `__ashldi3'
/usr/local/ssl/lib/libcrypto.a(bn_div.o): In function `BN_div':
bn_div.c:(.text+0x264): undefined reference to `__udivdi3'
bn_div.c:(.text+0x28c): undefined reference to `__muldi3'
/usr/local/ssl/lib/libcrypto.a(bn_word.o): In function `BN_mod_word':
bn_word.c:(.text+0x40): undefined reference to `__umoddi3'
/usr/local/ssl/lib/libcrypto.a(b_print.o): In function `fmtint':
b_print.c:(.text+0x264): undefined reference to `__umoddi3'
b_print.c:(.text+0x284): undefined reference to `__udivdi3'
b_print.c:(.text+0x728): undefined reference to `__umoddi3'
b_print.c:(.text+0x748): undefined reference to `__udivdi3'
make: *** [ssh] Error 1

If I then hack Makefile to call ld directly with libgcc in the library path:

ld -o ssh ssh.o readconf.o clientloop.o sshtty.o sshconnect.o sshconnect1.o sshconnect2.o -L. -Lopenbsd-compat/ -L/usr/local/ssl/lib -R/usr/local/ssl/lib -L/usr/local/lib/gcc/sparc-sun-solaris2.8/3.4.6/ -R/usr/local/lib/gcc/sparc-sun-solaris2.8/3.4.6/ -lssh -lopenbsd-compat -lresolv -lcrypto -lrt -lz -lsocket -lnsl -lgcc
ld: warning: cannot find entry symbol _start; defaulting to 00000000000120c0
$ ./ssh
Illegal Instruction (core dumped)
(In reply to comment #16)

Yes, that did the trick. I hate unsolved mysteries too. Thanks for putting the effort into understanding this problem.

It seems odd that using the gnu linker instead of the gnu compiler to link would cause such problems. This is the only package I've ever run into with that issue.

Perhaps it would be good to change configure.ac so that if the OS is Solaris and $CC=gcc, (or whatever the appropriate set of triggers is) then set LD=gcc even if it is already set in the executing shell.

Thanks again.
(In reply to comment #17)
> It seems odd that using the gnu linker instead of the gnu compiler to
> link would cause such problems. This is the only package I've ever run
> into with that issue.

I suspect that any other packages that use unsigned long long arithmetic would have the same problem, but those might not be that common. Even compiling Damien's test program and linking by hand exhibits the same problem:

$ gcc -c testprog.c
$ ld -o testprog testprog.o -lc
ld: warning: cannot find entry symbol _start; defaulting to 00000000000101c8
testprog.o: In function `main':
testprog.c:(.text+0x24): undefined reference to `__udivdi3'

> Perhaps it would be good to change configure.ac so that if the OS is
> Solaris and $CC=gcc, (or whatever the appropriate set of triggers is)
> then set LD=gcc even if it is already set in the executing shell.

I think that it's going to be dependent on the system and compiler config. As a general rule, configure assumes that if you set things like LD then you had a reason for it.
Since the cause of this is now known I am closing this bug.
Close resolved bugs after release.