[PATCH v2 1/3] arm64/mm: Refactor PMD_PRESENT_INVALID and PTE_PROT_NONE bits

Tue Apr 30 08:04:53 PDT 2024

On Tue, Apr 30, 2024 at 03:02:21PM +0100, Ryan Roberts wrote:
> On 30/04/2024 14:30, Will Deacon wrote:
> > On Mon, Apr 29, 2024 at 03:02:05PM +0100, Ryan Roberts wrote:
> >> diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
> >> index dd9ee67d1d87..de62e6881154 100644
> >> --- a/arch/arm64/include/asm/pgtable-prot.h
> >> +++ b/arch/arm64/include/asm/pgtable-prot.h
> >> @@ -18,14 +18,7 @@
> >>  #define PTE_DIRTY		(_AT(pteval_t, 1) << 55)
> >>  #define PTE_SPECIAL		(_AT(pteval_t, 1) << 56)
> >>  #define PTE_DEVMAP		(_AT(pteval_t, 1) << 57)
> >> -#define PTE_PROT_NONE		(_AT(pteval_t, 1) << 58) /* only when !PTE_VALID */
> >> -
> >> -/*
> >> - * This bit indicates that the entry is present i.e. pmd_page()
> >> - * still points to a valid huge page in memory even if the pmd
> >> - * has been invalidated.
> >> - */
> >> -#define PMD_PRESENT_INVALID	(_AT(pteval_t, 1) << 59) /* only when !PMD_SECT_VALID */
> >> +#define PTE_INVALID		(_AT(pteval_t, 1) << 59) /* only when !PTE_VALID */
> > 
> > So this now overlaps with AttrIndx[3] if FEAT_AIE is implemented. Although
> > this shouldn't matter on the face of things because it's only used for
> > invalid entries, we originally moved the PROT_NONE bit from 2 to 57 back
> > in 3676f9ef5481 ("arm64: Move PTE_PROT_NONE higher up") because it was
> > possible to change the memory type for PROT_NONE mappings via some
> > drivers.
> 
> I'm not sure I follow your argument.
> 
>  1. We don't support FEAT_AIE (currently) so AttrIndx[3] is always going to be 0
> for valid ptes. Drivers are only calling our helpers (e.g.
> pgprot_writecombine(), right?) and those only know how to set AttrIndx[2:0].

Sure, but we might want to use it in future and chucking that out for the
sake of uffd doesn't seem like an obviously worthwhile trade-off to me.

>  2. PMD_PRESENT_INVALID was already occupying bit 59. So wouldn't the same shape
> of concern apply there too for PMDs that have been invalidated, where the driver
> then comes along and changes the memory type? (Perhaps because
> PMD_PRESENT_INVALID is only set while the PTL is held this can't happen).

I was mainly thinking of the PROT_NONE case, to be honest with you. I
struggle to envisage how a driver could sensibly mess with the memory
type for anonymous mappings, let alone huge pages! But perhaps I just
lack imagination :)

>  3. I had this same vague concern about confusion due to overlapping bit 59,
> which is why in the next patch, I'm moving it to the NS bit.
> 
> Experience tells me that when I'm arguing confidently with someone who is much
> more expert than me, then I'm using wrong... so what have I missed? :)
> 
> > 
> > Moving the field to the NS bit (as you do later in the series) resolves
> > this, but the architecture currently says that the NS bit is RES0. How
> > can we guarantee that it won't be repurposed by hardware in future?
> 
> Well it remains free for use in valid entries of course.

I think that's what I'm actually questioning! RES0 doesn't mean that
tomorrow's whizz-bang CPU extension isn't allowed to use it, but that's
a guarantee that we need if we're going to use it for our own purposes.

> So I guess you are asking how to guarantee we won't also need to be able
> to modify it on the fly for PROT_NONE entries? I don't have a definite
> answer, but I've been working on the assumption that the architecture
> introducing a feature that is only needed in states where NS is not needed
> is unlikely (so using that bit for the feature is also unlikely). And then
> needing to manipulate that feature dyanically for PROT_NONE mappings is
> even less likely.

The architects are quite good at inventing unlikely features :) SVE
blowing the sigcontext comes to mind. I think we should seek
clarification that the NS bit won't be allocated in the future if we are
going to use it for our own stuff.

> If all else fails we could move it to nG (bit 11) to free up bit 5. But that
> requires a bit more fiddling with the swap pte format.

Oh, cunning, I hadn't thought of that. I think that's probably a better
approach if the NS bit isn't guaranteed to be left alone by the
architecture.

> >> @@ -469,7 +477,7 @@ static inline pte_t pte_swp_clear_exclusive(pte_t pte)
> >>   */
> >>  static inline int pte_protnone(pte_t pte)
> >>  {
> >> -	return (pte_val(pte) & (PTE_VALID | PTE_PROT_NONE)) == PTE_PROT_NONE;
> >> +	return pte_invalid(pte) && !pte_user(pte) && !pte_user_exec(pte);
> >>  }
> > 
> > Why do we need to check pte_user_*() here? Isn't PROT_NONE the only case
> > in which a pte will have PTE_INVALID set?
> 
> I guess for *ptes* this is technically correct. But I was trying to make the
> format generic and reusable for *pmds* too. (pmd_protnone() wraps
> pte_protnone()). For pmds, PTE_INVALID also represents invalid-but-present PMDs
> (i.e. pmds on which pmd_mkinvalid() has been called).
> 
> The intention is that PTE_INVALID indicates "present but not valid in HW". And
> (!pte_user(pte) && !pte_user_exec(pte)) indicates the PROT_NONE permission.

Ok, but it does mean the compiler can't emit a nice TBNZ instruction for
the pte macro. Can you either seperate out the pmd/pte versions of the
macro or just add a comment along the lines of what you said above, please?

Cheers,

Will