[PATCH v1 1/2] arm64/mm: Move PTE_PROT_NONE and PMD_PRESENT_INVALID

Mon Apr 29 05:38:33 PDT 2024

On Mon, Apr 29, 2024 at 11:04:53AM +0100, Ryan Roberts wrote:
> On 26/04/2024 15:48, Catalin Marinas wrote:
> > On Thu, Apr 25, 2024 at 11:37:42AM +0100, Ryan Roberts wrote:
> >> Also, IMHO we shouldn't really need to reserve PMD_PRESENT_INVALID for swap
> >> ptes; it would be cleaner to have one bit that defines "present" when valid is
> >> clear (similar to PTE_PROT_NONE today) then another bit which is only defined
> >> when "present && !valid" which tells us if this is PTE_PROT_NONE or
> >> PMD_PRESENT_INVALID (I don't think you can ever have both at the same time?).
> > 
> > I think this makes sense; maybe rename the above to PTE_PRESENT_INVALID
> > and use it for both ptes and pmds.
> 
> Yep, sounds good. I've already got a patch to do this, but it's exposed a bug in
> core-mm, so I will fix that first before I can validate my change. See
> https://lore.kernel.org/linux-arm-kernel/ZiuyGXt0XWwRgFh9@x1n/
> 
> With this in place, I'm proposing to remove PTE_PROT_NONE entirely and instead
> represent PROT_NONE as a present but invalid pte (PTE_VALID=0, PTE_INVALID=1)
> with both PTE_WRITE=0 and PTE_RDONLY=0.
> 
> While the HW would interpret PTE_WRITE=0/PTE_RDONLY=0 as "RW without dirty bit
> modification", this is not a problem as the pte is invalid, so the HW doesn't
> interpret it. And SW always uses the PTE_WRITE bit to interpret the writability
> of the pte. So PTE_WRITE=0/PTE_RDONLY=0 was previously an unused combination
> that we now repurpose for PROT_NONE.

Why not just keep the bits currently in PAGE_NONE (PTE_RDONLY would be
set) and check PTE_USER|PTE_UXN == 0b01, which is a unique combination
for PAGE_NONE (bar the kernel mappings)?

For ptes, it doesn't matter, we can assume that PTE_PRESENT_INVALID
means pte_protnone(). For pmds, however, we can end up with
pmd_protnone(pmd_mkinvalid(pmd)) == true for any of the PAGE_*
permissions encoded into a valid pmd. That's where a dedicated
PTE_PROT_NONE bit helped.

Let's say a CPU starts splitting a pmd and does a pmdp_invalidate*()
first to set PTE_PRESENT_INVALID. A different CPU gets a fault and since
the pmd is present, it goes and checks pmd_protnone() which returns
true, ending up on the do_huge_pmd_numa_page() path. Maybe some locks
help, but it looks fragile to rely on them.

So I think for protnone we need to check some other bits (like USER and
UXN) in addition to PTE_PRESENT_INVALID.
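
As a rough illustration of the pmd concern above, here is a userspace
model, not kernel code. The bit positions are assumptions made for the
sketch (in particular the placement of PTE_PRESENT_INVALID), the
model_* helpers are hypothetical names, and "0b01" is read here as
PTE_USER clear with PTE_UXN set, matching what PAGE_NONE encodes.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pmdval_t;

/* Assumed bit positions, for illustration only. */
#define PTE_VALID            (UINT64_C(1) << 0)
#define PTE_USER             (UINT64_C(1) << 6)
#define PTE_RDONLY           (UINT64_C(1) << 7)
#define PTE_UXN              (UINT64_C(1) << 54)
#define PTE_PRESENT_INVALID  (UINT64_C(1) << 59)  /* hypothetical placement */

/* Hypothetical model of pmd_mkinvalid(): clear VALID, mark present-invalid. */
static pmdval_t model_mkinvalid(pmdval_t v)
{
	return (v & ~PTE_VALID) | PTE_PRESENT_INVALID;
}

/* Naive check: any present-but-invalid entry counts as protnone. */
static bool model_protnone_naive(pmdval_t v)
{
	return !(v & PTE_VALID) && (v & PTE_PRESENT_INVALID);
}

/* Stricter check: additionally require USER clear and UXN set. */
static bool model_protnone_strict(pmdval_t v)
{
	return model_protnone_naive(v) &&
	       (v & (PTE_USER | PTE_UXN)) == PTE_UXN;
}
```

With these definitions, a valid user read-only pmd
(PTE_VALID | PTE_USER | PTE_RDONLY | PTE_UXN) that has been passed
through model_mkinvalid() satisfies the naive check but not the strict
one, while a PAGE_NONE-like value (PTE_PRESENT_INVALID | PTE_RDONLY |
PTE_UXN) satisfies both.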

> This will subtly change behaviour in an edge case though. Imagine:
> 
> pte_t pte;
> 
> pte = pte_modify(pte, PAGE_NONE);
> pte = pte_mkwrite_novma(pte);
> WARN_ON(pte_protnone(pte));
> 
> Should that warning fire or not? Previously, because we had a dedicated bit for
> PTE_PROT_NONE it would fire. With my proposed change it will not fire. To me
> it's more intuitive if it doesn't fire. Regardless there is no core code that
> ever does this. Once you have a protnone pte, it's terminal - nothing ever
> modifies it with these helpers AFAICS.

I don't think any core code should try to make a PAGE_NONE pte
writeable.
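
For reference, the behavioural difference in the quoted edge case can
be modelled in userspace as below. This is only a sketch: the bit
positions are assumptions, the model_* names are hypothetical, and
model_mkwrite() only sets PTE_WRITE, leaving out whatever else the real
pte_mkwrite_novma() does.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pteval_t;

/* Assumed bit positions, for illustration only. */
#define PTE_VALID            (UINT64_C(1) << 0)
#define PTE_RDONLY           (UINT64_C(1) << 7)
#define PTE_WRITE            (UINT64_C(1) << 51)
#define PTE_PROT_NONE        (UINT64_C(1) << 58)  /* dedicated SW bit */
#define PTE_PRESENT_INVALID  (UINT64_C(1) << 59)  /* hypothetical placement */

/* Simplified stand-in for pte_mkwrite_novma(): only sets PTE_WRITE. */
static pteval_t model_mkwrite(pteval_t v)
{
	return v | PTE_WRITE;
}

/* Dedicated-bit scheme: protnone is PTE_PROT_NONE set on an invalid pte. */
static bool model_protnone_dedicated(pteval_t v)
{
	return (v & (PTE_VALID | PTE_PROT_NONE)) == PTE_PROT_NONE;
}

/* Proposed scheme: present-invalid with both WRITE and RDONLY clear. */
static bool model_protnone_proposed(pteval_t v)
{
	return !(v & PTE_VALID) && (v & PTE_PRESENT_INVALID) &&
	       !(v & (PTE_WRITE | PTE_RDONLY));
}
```

In this model the dedicated-bit check still reports protnone after
model_mkwrite(), so the WARN_ON would fire, whereas the proposed check
stops reporting protnone as soon as PTE_WRITE is set.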

> Personally I think this is a nice tidy up that saves a SW bit in both present
> and swap ptes. What do you think? (I'll just post the series if it's easier to
> provide feedback in that context).

It would be nice to tidy this up and get rid of PTE_PROT_NONE as long as
it doesn't affect the pmd case I mentioned above.

> >> But there is a problem with this: __split_huge_pmd_locked() calls
> >> pmdp_invalidate() for a pmd before it determines that it is pmd_present(). So
> >> the PMD_PRESENT_INVALID can be set in a swap pte today. That feels wrong to me,
> >> but was trying to avoid the whole thing unravelling so didn't pursue it.
> > 
> > Maybe what's wrong is the arm64 implementation setting this bit on a
> > swap/migration pmd (though we could handle this in the core code as
> > well, it depends what the other architectures do). The only check for
> > the PMD_PRESENT_INVALID bit is in the arm64 code and it can be absorbed
> > into the pmd_present() check. I think it is currently broken as
> > pmd_present() can return true for a swap pmd after pmd_mkinvalid().
> 
> I've posted a fix here:
> https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@arm.com/
> 
> My position is that you shouldn't be calling pmd_mkinvalid() on a non-present pmd.

I agree, thanks.
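
For anyone following along, the pmd_present() problem discussed above
can be reproduced in a small userspace model. This is a sketch under
assumptions: the bit positions and the swap encoding are made up for
illustration, the model_* names are hypothetical, and the present check
is a simplified reading of the arm64 logic rather than a copy of it.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pmdval_t;

/* Assumed bit positions, for illustration only. */
#define PTE_VALID            (UINT64_C(1) << 0)
#define PTE_PROT_NONE        (UINT64_C(1) << 58)
#define PMD_PRESENT_INVALID  (UINT64_C(1) << 59)

/* Simplified model: present if valid, protnone, or present-invalid. */
static bool model_pmd_present(pmdval_t v)
{
	if (v & (PTE_VALID | PTE_PROT_NONE))
		return true;
	return (v & (PMD_PRESENT_INVALID | PTE_VALID)) == PMD_PRESENT_INVALID;
}

/* Model of pmd_mkinvalid(): sets the marker without checking presence. */
static pmdval_t model_pmd_mkinvalid(pmdval_t v)
{
	return (v | PMD_PRESENT_INVALID) & ~PTE_VALID;
}

/* Hypothetical caller-side guard: only invalidate pmds that are present. */
static pmdval_t model_invalidate_if_present(pmdval_t v)
{
	return model_pmd_present(v) ? model_pmd_mkinvalid(v) : v;
}
```

A made-up swap value such as (0x1234 << 12) is not present before
model_pmd_mkinvalid() but is reported as present afterwards; with the
guard in the caller it is left untouched.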

-- 
Catalin