[PATCH v8 42/43] mm: add arch hook to validate mmap() prot flags

Catalin Marinas catalin.marinas at arm.com
Wed Mar 13 03:47:36 PDT 2024


On Wed, Mar 13, 2024 at 12:23:24AM +0100, Ard Biesheuvel wrote:
> On Tue, 12 Mar 2024 at 20:53, Catalin Marinas <catalin.marinas at arm.com> wrote:
> > On Wed, Feb 14, 2024 at 01:29:28PM +0100, Ard Biesheuvel wrote:
> > > diff --git a/mm/mmap.c b/mm/mmap.c
> > > index d89770eaab6b..977a8c3fd9f5 100644
> > > --- a/mm/mmap.c
> > > +++ b/mm/mmap.c
> > > @@ -1229,6 +1229,9 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> > >               if (!(file && path_noexec(&file->f_path)))
> > >                       prot |= PROT_EXEC;
> > >
> > > +     if (!arch_validate_mmap_prot(prot, addr))
> > > +             return -EACCES;
> > > +
> > >       /* force arch specific MAP_FIXED handling in get_unmapped_area */
> > >       if (flags & MAP_FIXED_NOREPLACE)
> > >               flags |= MAP_FIXED;
> >
> > While writing the pull request for Linus (and looking to justify this
> > change), I realised that we already have arch_validate_flags() that can
> > do a similar check but on VM_* flags instead of PROT_* (we use it for
> > VM_MTE checks). What was the reason for adding a new hook?
[...]
> > The only
> > difference is that here it returns -EACCESS while on
> > arch_validate_flags() failure it would return -EINVAL. It probably makes
> > more to return -EACCESS as it matches map_deny_write_exec() but with
> > your patches we are inconsistent between mmap() and mprotect() errors
> > (the latter is still -EINVAL).
> 
> Yes. Looking at it now, it would be better to add a single arch hook
> to map_deny_write_exec(), and use that instead.

This would work and matches the API better. Another way to look at the
arm64 WXN feature is to avoid bothering with with the checks knowing
that the hardware enforces XN when a permission is W. So it can be seen
as a choice irrespective of the user PROT_EXEC|PROT_WRITE permission.
But it's still an ABI change, so I guess better to be upfront with the
user and reject such mmap()/mprotect() permission combinations.

However, I've been looking through specs and realised that SCTLR_ELx.WXN
is RES0 when the permission indirection is enabled (FEAT_PIE from the
2022 specs, hopefully you have access to it). And while apparently WXN
gets better as it allows separate EL0/EL1 controls, it seems to only
apply when the base permission is RWX and the XN is toggled based on the
overlay permission (pkeys which Joey is working on). So it looks like
what the architects had in mind is optimising RW/RX switching via
overlays (no syscalls) but keeping the base permission RWX. The
traditional WXN hardening via SCTLR_EL1 disappeared.

(adding Joey to the thread, he contributed the PIE support)

> > It also got me thinking on whether we could use this as a hardened
> > version of the MDWE feature instead a CPU feature (though we'd end up
> > context-switching this SCTLR_EL1 bit). I think your patches have been
> > around before the MDWE feature was added to the kernel.
> 
> Indeed.
> 
> MDWE looks like a good match in principle, but combining the two seems tricky:
> - EL1 is going to flip between WXN protection on and off depending on
> which setting it is using for EL0;
> - context switching SCTLR_EL1.WXN may become costly in terms of TLB
> maintenance, unless we cheat and ignore the kernel mappings (which
> should work as expected regardless of the value of SCTLR_EL1.WXN);
> 
> If we can find a reasonable strategy to manage the TLB maintenance
> that does not leave EL1 behind, I'm happy to explore this further. But
> perhaps WXN should simply be treated as MDWE always-on for all
> processes.

Ah, I did not realise this bit can be cached in the TLB. So this doesn't
really work (and the Arm ARM is also vague about whether it's cached per
ASID or not).

> > Sorry for not catching this early.
> 
> No worries - it's more likely to be useful if we get this right.

Given that with PIE this feature no longer works, I'll revert these two
patches for now. We should revisit it in combination with PIE and POE
(overlays) but it does look like the semantics are slightly different
(unless I misread the specs). If we want a global MDWE=on on the command
line, this can be generic.

-- 
Catalin



More information about the linux-arm-kernel mailing list