I am able to reproduce the following behaviour with sshd on Solaris 2.8 and Linux 2.4.23 when NGROUPS_MAX at runtime is larger than at compile time: on both systems "fatal: getgroups: Invalid argument" gets logged via syslog and sshd terminates before any authentication is attempted.

I located the problem in uidswap.c, where the deprecated NGROUPS_MAX is used. NGROUPS_MAX is defined in limits.h and gives the maximum number of groups an account can be a member of. Because NGROUPS_MAX is determined at compile time, this limit gets hardcoded into the resulting binary. Since NGROUPS_MAX may be larger at runtime than at compile time, it should be regarded as deprecated and sysconf(_SC_NGROUPS_MAX) should be used instead (see also APUE 2.4.5).

In uidswap.c, line 41, NGROUPS_MAX is used to initialize static arrays at global scope. These two occurrences of NGROUPS_MAX cannot be substituted with sysconf(_SC_NGROUPS_MAX) easily, as memory has to be allocated at runtime. In the same file NGROUPS_MAX is referenced on lines 72 and 81. These occurrences can be replaced easily once memory for the arrays is allocated according to sysconf(_SC_NGROUPS_MAX).

This bug also constitutes a minor security problem, as it may be exploited to remotely enumerate accounts that are members of more groups than the compile-time NGROUPS_MAX.
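Concretely, the requested change amounts to something like the following sketch (illustrative only, not the actual sshd code; the fallback to the compile-time constant is an assumption):

    #include <limits.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/types.h>

    int
    main(void)
    {
        /* Query the limit at runtime instead of baking it in. */
        long ngroups_max = sysconf(_SC_NGROUPS_MAX);
        gid_t *groups;

        if (ngroups_max < 0)
            ngroups_max = NGROUPS_MAX;  /* fall back to compile-time limit */

        groups = calloc(ngroups_max, sizeof(*groups));
        if (groups == NULL) {
            perror("calloc");
            return 1;
        }
        printf("runtime NGROUPS_MAX: %ld\n", ngroups_max);
        free(groups);
        return 0;
    }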
I will look at this.
Created attachment 539 [details]
Use sysconf where available for NGROUPS_MAX.

Please try this patch. Note: treat with *extreme* suspicion: it contains some pointer-fu and compiled first try without warnings, so Murphy can't be far away...
Comment on attachment 539 [details]
Use sysconf where available for NGROUPS_MAX.

>Index: defines.h
>===================================================================
>RCS file: /usr/local/src/security/openssh/cvs/openssh_cvs/defines.h,v
>retrieving revision 1.109
>diff -u -p -r1.109 defines.h
>--- defines.h	27 Jan 2004 05:40:35 -0000	1.109
>+++ defines.h	6 Feb 2004 09:27:45 -0000
>@@ -541,6 +541,10 @@ struct winsize {
> # define SSH_SYSFDMAX 10000
> #endif
>
>+#ifdef HAVE_SYSCONF
>+# undef NGROUPS_MAX
>+# define NGROUPS_MAX (sysconf(_SC_NGROUPS_MAX))

I think that should be:

    #if defined(HAVE_SYSCONF) && defined(_SC_NGROUPS_MAX)

We also need to check for sysconf returning -1.

I'm wary of this change for 3.8. Perhaps a static check for gid >= NGROUPS_MAX?
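A sketch of the guarded lookup being suggested here, with the -1 check included (the helper name is hypothetical; HAVE_SYSCONF is autoconf's usual marker):

    #include <limits.h>
    #include <unistd.h>

    static long
    ngroups_limit(void)
    {
        long n = -1;

    #if defined(HAVE_SYSCONF) && defined(_SC_NGROUPS_MAX)
        n = sysconf(_SC_NGROUPS_MAX);  /* may legitimately return -1 */
    #endif
        if (n < 0)
            n = NGROUPS_MAX;  /* compile-time fallback */
        return n;
    }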
Created attachment 548 [details]
NGROUPS patch
The proposed fix is wrong, I think. Even with sysconf, NGROUPS_MAX could be REALLY large. You don't really want the max possible, you want the current number, right? Patch attached.

The only really interesting part is the bit where a fake gid_t array is passed to getgrouplist(): the call works, but gets a SEGV on return from that function (on my RH9 Linux box, anyway). Changing the fake pointer to an array[4] and setting ngroups to 0 works, though. Haven't investigated further. Does this look more correct, though?
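For reference, the counting idiom being attempted looks roughly like this, assuming glibc's getgrouplist() prototype (the function name and the scratch-buffer size are illustrative, mirroring the array[4] workaround described above):

    #include <grp.h>
    #include <sys/types.h>

    static int
    count_groups(const char *user, gid_t basegid)
    {
        gid_t scratch[4];  /* real storage, so buggy libcs have somewhere to write */
        int ngroups = 0;

        /*
         * With *ngroups == 0 the call cannot fit even the base group,
         * so it returns -1 and stores the required count in ngroups.
         */
        (void)getgrouplist(user, basegid, scratch, &ngroups);
        return ngroups;
    }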
Comment on attachment 548 [details]
NGROUPS patch

>diff -u -u -r1.42 uidswap.c
>+	saved_egroups = xrealloc(saved_egroups,
>+	    saved_egroupslen * sizeof(*saved_egroups));

I get a fatal here during authentication:

    debug1: temporarily_use_uid: 500/500 (e=0/0)
    xrealloc: zero size
Comment on attachment 548 [details]
NGROUPS patch

There are two changes in this patch. Making the groups_byname array dynamic is OK, but I don't know about this:

>+	} else {
>+		char gidstr[32];
>+
>+		logit("getgrgid: unknown group id: %d",
>+		    (int)groups_bygid[i]);
>+		snprintf(gidstr, sizeof(gidstr), "%d",
>+		    (int)groups_bygid[i]);
>+		groups_byname[i] = xstrdup(gidstr);
>+	}

I'm not sure whether it makes sense, but it is a change in behaviour and should be a separate patch.
Would you rather the "unknown group id" string bit be dumped altogether? Previously (well, currently, I guess) the code just discards the byname entry for any group without a name. This results in the bygid[] and byname[] arrays not being parallel for every index. If you want them to stay parallel, you need a filler for the gids without names. Alternatively, you could make them NULL and just catch that case in any code that touches groups_byname[]. So would you rather I:

a) do it the old way and just drop out unnamed groups (and be non-parallel)
b) stay parallel but make the unnamed entries be NULL (and fix ga_match() and ga_free()), roughly as sketched below
c) stay parallel but add an informative error message (as it is now)

I'll be happy to resubmit a patch post-haste if someone would care to make that decision.

Probably a or b; either sounds fine to me. On further consideration, c isn't so tasteful. :)

I'd like to get this patch off my plate, so I can wrap up the NGROUPS stuff. Anything else I need to do to make this go in in the near future?
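For concreteness, option (b) would look roughly like the following (a hypothetical sketch, not the patch; the array names follow groupaccess.c, everything else is illustrative):

    #include <grp.h>
    #include <stddef.h>
    #include <sys/types.h>

    extern char *xstrdup(const char *);  /* OpenSSH helper */

    static void
    fill_byname(gid_t *groups_bygid, char **groups_byname, int ngroups)
    {
        struct group *gr;
        int i;

        for (i = 0; i < ngroups; i++) {
            gr = getgrgid(groups_bygid[i]);
            /* NULL keeps the arrays parallel without inventing a name;
             * ga_match()/ga_free() must then skip NULL entries. */
            groups_byname[i] = (gr != NULL) ? xstrdup(gr->gr_name) : NULL;
        }
    }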
I think we are in a feature lock for 3.8. However, I'm more in favor of the existing behavior. The only other issue is the one pointed out by Darren: "no secondary groups". - Ben
I've got the "no secondary groups" thing fixed in my tree (I think). I'll go for existing behavior on this and repost the patch. I notice that this bug is now blocking #793 (3.8p1 release), so that is fine with me. As long as the patch is "approved", it is off my plate and onto yours :)
Created attachment 549 [details]
new NGROUPS patch

This should work if you have no supplementary groups. It also keeps existing behavior for unnamed groups in groups_byname[].

Issue: the first call to getgrouplist() in groupaccess.c:ga_init(). On my unpatched RH9 box, this segfaults. On my RHEL3 box (should be just like RH9) it works. Based on stack examination, the getgrouplist() function on my RH9 box writes the gid list to the stack, heedless of the ngroups parameter. The RHEL box seems to do the right thing, except for wantonly assuming that at least ONE gid will be available and that ngroups is at least 1. So I work around the case of requiring one gid (not too gross), but what can be done about it ignoring the ngroups param on RH9? Nothing that seems reasonable. Fix glibc.

So I think this is correct, or as correct as can be. More testers to confirm that would be nice.
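The one-gid workaround mentioned above boils down to something like this (names illustrative):

    #include <stdlib.h>
    #include <sys/types.h>

    static gid_t *
    alloc_group_list(int ngroups)
    {
        /* Some getgrouplist() implementations assume the buffer holds
         * at least one gid even when ngroups is 0, so never hand them
         * an empty buffer. */
        if (ngroups < 1)
            ngroups = 1;
        return calloc(ngroups, sizeof(gid_t));
    }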
We could use OpenBSD's:

    $ wc -l /usr/src/lib/libc/gen/getgrouplist.c
          94 /usr/src/lib/libc/gen/getgrouplist.c

but from reading the manpage, it looks like the behaviour of returning the number of groups is not guaranteed - OpenBSD's implementation will return up to *ngroups entries, but not more. We may need OpenBSD's implementation anyway - getgrouplist isn't POSIX, it comes from 4.4BSD.

I think we will need to feed getgrouplist with the maximum available, i.e.:

    int ngroups;
    gid_t *gidlist;

    ngroups = NGROUPS_MAX;
    #if defined(HAVE_SYSCONF) && defined(_SC_NGROUPS_MAX)
    ngroups = MAX(ngroups, sysconf(_SC_NGROUPS_MAX));
    #endif
    gidlist = xmalloc(sizeof(*gidlist) * ngroups);
    if (getgrouplist(user, group, gidlist, &ngroups) == -1)
        fatal("getgrouplist: groups list too small");
    xfree(gidlist);
NGROUPS_MAX might be INT_MAX. You *can't* use it as an array size. We could replace getgrouplist() with a hand-rolled:

    int get_ngroups(const char *user);

That would avoid the reliance on using getgrouplist() with a short list. How hard is it to walk the getgr* functions to count how many groups you are in? We can add the 'base' group, too, and filter that out from the getgr* results. I bet it's 20 lines of code.
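Roughly the 20 lines in question, perhaps (a sketch only: the prototype proposed above takes just the user name, and passing the base gid here is one way to do the "filter that out" step; the actual patch may differ):

    #include <grp.h>
    #include <string.h>
    #include <sys/types.h>

    static int
    get_ngroups(const char *user, gid_t basegid)
    {
        struct group *gr;
        char **mem;
        int ngroups = 1;  /* count the base (primary) group itself */

        setgrent();
        while ((gr = getgrent()) != NULL) {
            if (gr->gr_gid == basegid)
                continue;  /* already counted as the base group */
            for (mem = gr->gr_mem; *mem != NULL; mem++) {
                if (strcmp(*mem, user) == 0) {
                    ngroups++;
                    break;
                }
            }
        }
        endgrent();
        return ngroups;
    }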
> NGROUPS_MAX might be INT_MAX.

That would be utterly idiotic.
Not if there really is NO LIMIT to the number of groups. Or it might be 256k. Is it really worth 256k to a meg of stack space for this? (With 4-byte gids, a 64k-entry array is already 256 KB, and a 256k-entry array is a full megabyte.)
Created attachment 550 [details]
NGROUPS patch with our own get_ngroups()

Since getgrouplist() is stupid when we just want it to count the groups, we can roll our own. This might not be needed. The prior (-4) version of this patch SHOULD work on anything that supports getgrouplist() (modulo buffer overflow bugs in getgrouplist() which this is happy to trigger), which must be everything that openssh supports, since we use it.
Comment on attachment 550 [details]
NGROUPS patch with our own get_ngroups()

>Index: uidswap.c
[...]
>+	saved_egroupslen = getgroups(0, NULL);

Should use get_ngroups here too, no?

>Index: groupaccess.c
[...]
>+	struct group *gr = getgrent();

If you're using getgrent() shouldn't you setgrent() first?
All the manpages that I see state that sysconf(_SC_NGROUPS_MAX) is the canonical place to determine the maximum number of groups that an account can be a member of. Furthermore, the Linux 2.6.1 and glibc -current headers define NGROUPS_MAX as 32. Why would anyone set a *runtime* limit higher than the actual max number of groups in use for a single account?
Created attachment 551 [details]
Size group array using sysconf, if available
Darren,

>+	saved_egroupslen = getgroups(0, NULL);

This does the right thing: it returns the number of supplemental groups that the process is a member of.
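As a standalone sketch of that idiom (illustrative; error handling abbreviated): a zero count asks getgroups(2) only for the number of supplementary groups, without touching the buffer.

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/types.h>

    int
    main(void)
    {
        int len = getgroups(0, NULL);  /* count only, buffer untouched */
        gid_t *groups;

        if (len < 0)
            return 1;
        groups = calloc(len > 0 ? len : 1, sizeof(*groups));
        if (groups == NULL || getgroups(len, groups) < 0)
            return 1;
        printf("%d supplementary groups\n", len);
        free(groups);
        return 0;
    }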
Ah, OK, misread it.
Darren: getgroups() works properly, but yeah, setgrent() is probably correct.

Damien: Linux 2.6.3 will change NGROUPS_MAX to 64k. You *really* do not care what _SC_NGROUPS_MAX says - it is the MAXIMUM (actually, it's the minimum maximum, says POSIX; the actual maximum may be higher than sysconf() reports, if I recall correctly). All you care about is the ACTUAL count. You can get the actual count from a getgrent loop or from a proper getgrouplist() (which work on all systems, and not just HAVE_SYSCONF systems).

I am pretty sure that either version will work properly (with a call to setgrent(), to be pedantic, in the get_ngroups() version) on the systems we care about. sysconf() will probably work but is sub-optimal and (to be pedantic) potentially wrong.
SUSv3 says of _SC_NGROUPS_MAX: "Maximum number of simultaneous supplementary group IDs per process". Sizing arrays using NGROUPS_MAX and/or sysconf is a very common idiom; even the Linux manpages recommend this. Most systems don't even have anything like 64k groups, let alone accounts which are members of all of them. Have you not heard of optimising for the common case? What you propose means more complexity for every piece of downstream software that uses supplemental groups.
Well, there are very few bits of code that need hacking to work with 64k groups, so I have to discount the bit about extra complexity.

Speaking of optimizing for the common case: this is called ONCE (unless I misread) per process. The real optimization is to use only as much memory as is strictly needed, though neither you nor I are actually optimizing anything at all. The runtime of this code is so far away from the fast path of anything that it's dumb to be arguing about.

I should also mention that sooner or later _SC_NGROUPS_MAX may end up as an actual tunable in Linux. Again, you don't care what the maximum is, just what the actual number is.

Further, since the patch(es) I proposed are VERY simple and work reliably, why would you opt AGAINST them in favor of something that is less precise AND might not be available on a platform (thereby falling back on today's buggy behavior)? I can't see the reason for arguing that as a win. But, in the end, it's not my project, right?
Simpler patch applied. We can revisit this under a different bug post-release if necessary.

Tim, for a long time software has sized arrays using NGROUPS_MAX and/or sysconf. By changing NGROUPS_MAX, you break binary compat in scary ways. By making the baseline _SC_NGROUPS_MAX so high, you waste memory everywhere and force everyone else to do the work of cleaning up after you. Hopefully the glibc people will try to shelter userland from this silly and gratuitous change.
I did not find as many examples of people (mis)using NGROUPS_MAX as you'd like to believe. Customers demanded more groups. So it had to grow. As I keep arguing - you DON'T want to know the maximum groups (in 99% of cases). You want to know the ACTUAL groups. And I am doing what I can to find apps like openssh that are broken, and help them to see the light. Once it is fixed, it's fixed. Cheers.
Mass change of RESOLVED bugs to CLOSED