Bug 2361 - seccomp filter (not only) for aarch64
Summary: seccomp filter (not only) for aarch64
Status: CLOSED FIXED
Alias: None
Product: Portable OpenSSH
Classification: Unclassified
Component: sshd (show other bugs)
Version: 6.7p1
Hardware: Other Linux
: P5 enhancement
Assignee: Damien Miller
URL:
Keywords:
Depends on:
Blocks: V_6_9
  Show dependency treegraph
 
Reported: 2015-03-06 03:10 AEDT by Jakub Jelen
Modified: 2016-08-02 10:41 AEST (History)
3 users (show)

See Also:


Attachments
aarh64 patch (1.78 KB, patch)
2015-03-06 03:10 AEDT, Jakub Jelen
no flags Details | Diff
flatten and sort syscall ACL (2.85 KB, patch)
2015-04-24 13:36 AEST, Damien Miller
dtucker: ok+
Details | Diff
proposed patch - stat and shutdown on ix68 (1.17 KB, text/plain)
2015-06-16 20:00 AEST, Jakub Jelen
no flags Details
use a macro for socketcall test (1.57 KB, patch)
2015-06-16 21:34 AEST, Damien Miller
no flags Details | Diff
tweaked diff (1.68 KB, patch)
2015-06-16 23:21 AEST, Damien Miller
dtucker: ok+
Details | Diff
with aarch64 bits, re-sorted syscall list (4.16 KB, patch)
2015-06-17 08:47 AEST, Damien Miller
dtucker: ok+
Details | Diff
jump past accumulator reload (569 bytes, patch)
2015-06-17 11:09 AEST, Damien Miller
dtucker: ok+
Details | Diff
missing pselect6 (473 bytes, patch)
2015-06-24 19:57 AEST, Jakub Jelen
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jakub Jelen 2015-03-06 03:10:19 AEDT
Created attachment 2561 [details]
aarh64 patch

We started using seccomp filter in openssh and there appeared to some problems with secondary architectures:
https://bugzilla.redhat.com/show_bug.cgi?id=1195065

Seccomp filter is available on aarch64 architecture, but openssh code was not ready for it so I am providing here patch to make it working.

Changes and explanations:
 * First of all we need to whitelist this architecture in configure.ac
  * (also fixing some indentation inconsistency around arm)
 * Then we need to adjust filter settings for syscalls denial
  * (if syscall doesn't exist openssh will not build)
  * open is not on aarch64, openat exists also on primary architectures
  * stat is never used, x86_64 is using fstat, ix86 and arm is using fstat64 and stat64 => whitelisting, aarch64 is using fstat and newfstatat
  * poll, select are not available on aarch64
  * pselect6 is used instead of select
 (see attached patch)

This patch was tested and is currently used in Fedora.
We plan to add support for other architectures sooner or later. Further discussion welcome.



I'm also appending table with syscall names and numbers I collected during my testing and which are mentioned in filter and differ across architectures:

          open       stat                       select()
x86_64   open(2)    fstat(5)                   select(23)
 ix86    open(5)    stat64(195)  fstat64(197)  _newselect(142)
 arm     open(5)    stat64(195)  fstat64(197)  _newselect(142)
aarch64  openat(56) fstat(80) newfstatat(79)   pselect6(72)

Feel free to commend or add more syscalls you are interested in.
Comment 1 Dmitry V. Levin 2015-03-06 03:35:27 AEDT
Wouldn't it be better if

+#if defined(__NR_stat64) && defined(__NR_fstat64)
+	SC_DENY(stat64, EACCES), /* ix86, arm */
+	SC_DENY(fstat64, EACCES),
+#endif

was written as

+#ifdef __NR_stat64
+	SC_DENY(stat64, EACCES), /* ix86, arm */
+#endif
+#ifdef __NR_fstat64
+	SC_DENY(fstat64, EACCES), /* ix86, arm */
+#endif

?
Comment 2 Jakub Jelen 2015-03-06 03:39:03 AEDT
Yes, it would be better, but
 1) These two calls are on same arch together so I wanted to point this out.
 2) There is enough of ifdefs from my point of view in this file.

But I don't mind if you will apply it the other way.
Comment 3 Damien Miller 2015-04-22 11:40:20 AEST
Comment on attachment 2561 [details]
aarh64 patch

This looks ok to me. Darren, any comments?
Comment 4 Darren Tucker 2015-04-22 12:01:23 AEST
Comment on attachment 2561 [details]
aarh64 patch

>+#ifdef __NR_select /* not on AArch64 */
> 	SC_ALLOW(select),
> #endif
>+#ifdef __NR_pselect6 /* AArch64 */
>+	SC_ALLOW(pselect6),
>+#endif
>+#endif

This nesting looks wrong and it's getting messy.

Could we put the __NR_$thing test inside the SC_ALLOW/SC_DENY macros?
Comment 5 Damien Miller 2015-04-22 12:17:25 AEST
(In reply to Darren Tucker from comment #4)
> Comment on attachment 2561 [details]
> aarh64 patch
> 
> >+#ifdef __NR_select /* not on AArch64 */
> > 	SC_ALLOW(select),
> > #endif
> >+#ifdef __NR_pselect6 /* AArch64 */
> >+	SC_ALLOW(pselect6),
> >+#endif
> >+#endif
> 
> This nesting looks wrong and it's getting messy.

I can reindent, which makes it a bit clearer

#ifdef __NR__newselect
        SC_ALLOW(_newselect),
#else
# ifdef __NR_select /* not on AArch64 */
        SC_ALLOW(select),
# endif
# ifdef __NR_pselect6 /* AArch64 */
        SC_ALLOW(pselect6),
# endif
#endif

Though maybe it is just better to allow each syscall based on its own presence:

#ifdef __NR__newselect
        SC_ALLOW(_newselect),
#endif
#ifdef __NR_select /* not on AArch64 */
        SC_ALLOW(select),
#endif
#ifdef __NR_pselect6 /* AArch64 */
        SC_ALLOW(pselect6),
#endif

> Could we put the __NR_$thing test inside the SC_ALLOW/SC_DENY macros?

How would this work? You can't have #if/#ifdef inside a #define
Comment 6 Darren Tucker 2015-04-22 12:59:56 AEST
(In reply to Damien Miller from comment #5) 
> I can reindent, which makes it a bit clearer

No, I think it'll still be wrong:

$ lynx -source 'https://bugzilla.mindrot.org/attachment.cgi?id=2561&action=diff&context=patch&collapsed=&headers=1&format=raw' | egrep '^\+' | awk '/ifdef|endif/{print $1}' | sort | uniq -c
      7 +#endif
      6 +#ifdef

(there are no "-" lines)

> How would this work? You can't have #if/#ifdef inside a #define

I dunno.  It might not be possible (but if it is, it'd be a lot cleared and somewhat more future-proof).
Comment 7 Darren Tucker 2015-04-22 15:59:30 AEST
(In reply to Darren Tucker from comment #6)
> I dunno.  It might not be possible.

I don't think it is possible to do this, at least via any mechanism I'd be willing to support :-)


(In reply to Damien Miller from comment #5)
> Though maybe it is just better to allow each syscall based on its
> own presence:

I think that's a better idea.  Although it's a bit ugly, IMO having a consistent pattern is easier to spot errors in and less ugly than having some inside ifdefs and some not.
Comment 8 Jakub Jelen 2015-04-22 16:41:33 AEST
Thanks for coming back to this.

I was trying to create it in some way it will be less ugly, but I didn't find any better solution than having every SC_ALLOW covered with ifdef. I agree with dividing that select ifdefs according to presence. It doesn't hurt anything and would be much more clear.

This construction discussed from comment #4 is coming from patch [1] by Marcin, which I had to rewrite little bit, but I left this one, because it was working.

To make it less uggly, it would be possible to use more high-level library libseccomp [2], which solves such problems under the hood, but I'm not sure about support on different platforms.

[1] https://bugzilla.redhat.com/attachment.cgi?id=994437&action=diff
[2] https://github.com/seccomp/libseccomp
Comment 9 Darren Tucker 2015-04-22 19:30:42 AEST
(In reply to Jakub Jelen from comment #8)
> To make it less ugly, it would be possible to use more high-level
> library libseccomp [2]

Given the use case is relatively simple I think we'd rather take a few extra lines of mechanical code than pull in another library dependency.
Comment 10 Damien Miller 2015-04-24 13:36:35 AEST
Created attachment 2601 [details]
flatten and sort syscall ACL

This flattens out the syscall tests in the ACL to make them independent and sorts them to make them a bit more maintainable.

I agree with Darren wrt libseccomp - our requirements are very simple and we don't want to have to manage another library dependency.
Comment 11 Darren Tucker 2015-04-24 15:17:44 AEST
Comment on attachment 2601 [details]
flatten and sort syscall ACL

diff looks OK but I'd like confirmation it actually works on the target hardware.
Comment 12 Darren Tucker 2015-04-24 15:18:04 AEST
Comment on attachment 2601 [details]
flatten and sort syscall ACL

diff looks OK but I'd like confirmation it actually works on the target hardware.
Comment 13 Damien Miller 2015-04-24 15:27:23 AEST
applied - this will be in openssh-6.9
Comment 14 Jakub Jelen 2015-06-16 20:00:30 AEST
Created attachment 2648 [details]
proposed patch - stat and shutdown on ix68

Some further catch ups:
 1) stat syscall is not as legacy as expected -- gsssapi library issues such syscall even on x86_64 (based on [1]). Proposing to add back also stat to make sure everything works.

 2) Socket shutdown is handled by socketcall on i386 linux so we are getting "socket closed" errors instead of correct closing connection. Audit messages:

Jun 16 09:27:51 host audit[11004]: SECCOMP auid=4294967295 uid=74 gid=74 ses=4294967295 subj=system_u:system_r:sshd_net_t:s0-s0:c0.c1023 pid=11004 comm="sshd" exe="/usr/sbin/sshd" sig=31 arch=40000003 syscall=102 compat=0 ip=0xb77d5be8 code=0x0
Jun 16 09:27:51 host sshd[11003]: error: mm_request_receive: socket closed
# ausyscall 102
socketcall

We don't want to allow all the syscalls [2] from socketcall. Best would be to allow only SYS_SHUTDOWN as first argument but in the current code of seccomp filter, there is no possibility to filter through function arguments.
Adding so would require additional complexity, but it would be great to have it "right way", even if it doesn't matter much during connection close.
See proposed patch with hand-baked seccomp filter for first argument check. I don't see the changes from previous comments in portable repository so the patch is not directly applicable (stat part).

Tested on Fedora 22 with openssh-6.8 and after applying this patch, I no longer see SECCOMP messages.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1228323
[2] http://jkukunas.blogspot.cz/2010/05/x86-linux-networking-system-calls.html
Comment 15 Damien Miller 2015-06-16 21:34:39 AEST
Created attachment 2649 [details]
use a macro for socketcall test

IMO it would be better to make the test into another macro; it's more readable and might be handy if we need to deal with similar calls in the future.
Comment 16 Jakub Jelen 2015-06-16 21:47:32 AEST
Comment on attachment 2649 [details]
use a macro for socketcall test

Thank you for fast reply.
Yes. It would be better for readability. Additionally we can parametrize the argument number to make it more generic:
>#define SC_ALLOW_ARG(_nr, _argn, _arg)
[...]
>	BPF_STMT(BPF_LD+BPF_W+BPF_ABS, \
>	    offsetof(struct seccomp_data, args[_argn])), \

Additionally I don't know what happened to the stat()-thing -- if the previous patch was applied upstream, there is stat missing, also the fstat needs to be conditional if it is not on some other architectures.
Comment 17 Damien Miller 2015-06-16 23:21:45 AEST
Created attachment 2650 [details]
tweaked diff

We already have stat in the fail-with-EACCESS list. We didn't have fstat though - is fstat not a syscall on some architectures?

I made the syscall argument number a macro argument as you suggested. Could you try this diff? I don't have an i386 system with seccomp-bpf handy...
Comment 18 Jakub Jelen 2015-06-17 00:31:46 AEST
Damien, your diff works for me.

Can you make sure where ended the aarch64 diff which you marked as applied? I don't see it in portable git repo.

Currently I don't have around arm architectures so I can't make sure, but I believe stat is not on aarch64 based on this attachment [1]. Syscall fstat will is probably missing on arm architecture, but I can't test now.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1195065#c2
Comment 19 Darren Tucker 2015-06-17 08:21:03 AEST
Comment on attachment 2650 [details]
tweaked diff

Looks plausible although I know nothing of BPF.
Comment 20 Damien Miller 2015-06-17 08:47:20 AEST
Created attachment 2651 [details]
with aarch64 bits, re-sorted syscall list

This adds the aarch64 syscalls and configure stanza.

It also re-sorts the syscall list, and make it just include every requested syscall if their syscall numbers are present, rather than depending on architecture or libc. This means we might allow one or two additional syscalls than are strictly necessary (e.g. madvise on !musllibc) but it's more readable and easier to maintain.
Comment 21 Dmitry V. Levin 2015-06-17 10:22:02 AEST
(In reply to Damien Miller from comment #20)
> Created attachment 2651 [details]
> with aarch64 bits, re-sorted syscall list

Shouldn't the "jump false" argument of the first BPF_JUMP in SC_ALLOW_ARG's definition be changed from 3 to 4?
Comment 22 Damien Miller 2015-06-17 11:06:01 AEST
(In reply to Dmitry V. Levin from comment #21)
> Shouldn't the "jump false" argument of the first BPF_JUMP in
> SC_ALLOW_ARG's definition be changed from 3 to 4?

It could, at present it jumps to reloading the syscall number into the accumulator. Jumping past it could save an instruction.
Comment 23 Damien Miller 2015-06-17 11:09:30 AEST
Created attachment 2652 [details]
jump past accumulator reload
Comment 24 Darren Tucker 2015-06-17 14:19:30 AEST
Comment on attachment 2652 [details]
jump past accumulator reload

LGTM based on my approximately zero knowledge of BPF.
Comment 25 Jakub Jelen 2015-06-24 19:57:15 AEST
Created attachment 2655 [details]
missing pselect6

Just wanted to thank you that everything is applied in git and I was just getting to test it, but now I found that pselect6 is missing from the filter even though it was in the first patch. Can you add also this one?
Comment 26 Damien Miller 2015-06-25 09:53:15 AEST
Applied - thanks
Comment 27 Damien Miller 2016-08-02 10:41:33 AEST
Close all resolved bugs after 7.3p1 release