[PATCH v3 1/2] arm64: Support execute-only permissions with Enhanced PAN

Tue Jan 26 07:05:42 EST 2021

On Tue, Jan 26, 2021 at 11:03:06AM +0000, Will Deacon wrote:
> On Tue, Jan 19, 2021 at 04:07:22PM +0000, Vladimir Murzin wrote:
> >  #define pte_valid_not_user(pte) \
> > -	((pte_val(pte) & (PTE_VALID | PTE_USER)) == PTE_VALID)
> > +	((pte_val(pte) & (PTE_VALID | PTE_USER | PTE_UXN)) == (PTE_VALID | PTE_UXN))
> >  #define pte_valid_user(pte) \
> >  	((pte_val(pte) & (PTE_VALID | PTE_USER)) == (PTE_VALID | PTE_USER))
> 
> Won't pte_valid_user() go wrong for exec-only mappings now?

I wondered about the same in November (and reproducing my comment
below):

https://lore.kernel.org/linux-arm-kernel/20201119182236.GN4376@gaia/

pte_valid_user() checks for both PTE_VALID and PTE_USER. In theory, a
!PTE_UXN is also user-accessible but it's only used in gup_pte_range()
via pte_access_permitted(). If "access" in the latter means only
read/write, we should be ok. If we keep it as is, a comment would
be useful.

> > diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> > index 8b5e7e5..47e9fdc 100644
> > --- a/arch/arm64/include/asm/sysreg.h
> > +++ b/arch/arm64/include/asm/sysreg.h
> > @@ -591,6 +591,7 @@
> >  	(SCTLR_EL2_RES1 | ENDIAN_SET_EL2)
> >  
> >  /* SCTLR_EL1 specific flags. */
> > +#define SCTLR_EL1_EPAN		(BIT(57))
> >  #define SCTLR_EL1_ATA0		(BIT(42))
> >  
> >  #define SCTLR_EL1_TCF0_SHIFT	38
> > @@ -631,7 +632,7 @@
> >  	 SCTLR_EL1_SED  | SCTLR_ELx_I    | SCTLR_EL1_DZE  | SCTLR_EL1_UCT   | \
> >  	 SCTLR_EL1_NTWE | SCTLR_ELx_IESB | SCTLR_EL1_SPAN | SCTLR_ELx_ITFSB | \
> >  	 SCTLR_ELx_ATA  | SCTLR_EL1_ATA0 | ENDIAN_SET_EL1 | SCTLR_EL1_UCI   | \
> > -	 SCTLR_EL1_RES1)
> > +	 SCTLR_EL1_EPAN | SCTLR_EL1_RES1)
> 
> Why is this handled differently to normal PAN, where the SCTLR is written in
> cpu_enable_pan()?

That's how it was done in an early version but I suggested we move it to
the default SCTLR bits to save some lines:

https://lore.kernel.org/linux-arm-kernel/20201202182303.GC21091@gaia/

With classic PAN, we only enable it if all the CPUs support it. For EPAN
that's not necessary since EPAN should depend on PAN being enabled.

It's also cached in the TLB, so it's a bit of a hassle with CnP. See
this past discussion:

https://lore.kernel.org/linux-arm-kernel/X7P+r%2Fl3ewvaf1aV@trantor/

> > diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> > index 3c40da4..c32095f6 100644
> > --- a/arch/arm64/mm/fault.c
> > +++ b/arch/arm64/mm/fault.c
> > @@ -529,6 +529,9 @@ static int __kprobes do_page_fault(unsigned long far, unsigned int esr,
> >  	if (faulthandler_disabled() || !mm)
> >  		goto no_context;
> >  
> > +	if (cpus_have_const_cap(ARM64_HAS_EPAN))
> > +		vm_flags &= ~VM_EXEC;
> 
> This needs a comment, and I think it would be cleaner to rework the vm_flags
> initialisation along the lines of:
> 
> 	unsigned long vm_flags;
> 	unsigned long mm_flags = FAULT_FLAG_DEFAULT;
> 
> 	if (user_mode(regs))
> 		mm_flags |= FAULT_FLAG_USER;
> 
> 	if (is_el0_instruction_abort(esr)) {
> 		vm_flags = VM_EXEC;
> 		mm_flags |= FAULT_FLAG_INSTRUCTION;
> 	} else if (is_write_abort(esr)) {
> 		vm_flags = VM_WRITE;
> 		mm_flags |= FAULT_FLAG_WRITE;
> 	} else {
> 		vm_flags = VM_READ | VM_WRITE;
> 		if (!cpus_have_const_cap(ARM64_HAS_EPAN))
> 			vm_flags |= VM_EXEC:
> 	}
> 
> (but again, please add a comment to that last part because I still don't
> really follow what you're doing)

Every time I come across this vm_flags handling I have to spend some
minutes to re-understand what it does. So vm_flags tells us what bits we
must have in vma->vm_flags for the fault to be benign. But we start with
all rwx flags and clear VM_EXEC if EPAN since exec does not imply read,
making it more confusing.

I think your rewrite is cleaner, maybe with some comment why we add
VM_EXEC when !EPAN.

-- 
Catalin