[PATCH] arm64: contpte: fix set_access_flags() no-op check for SMMU/ATS faults
Piotr Jaroszynski
pjaroszynski at nvidia.com
Tue Mar 3 10:40:00 PST 2026
On Tue, Mar 03, 2026 at 08:38:23AM +0000, Ryan Roberts wrote:
> On 03/03/2026 06:37, Piotr Jaroszynski wrote:
> > contpte_ptep_set_access_flags() compared the gathered ptep_get() value
> > against the requested entry to detect no-ops. ptep_get() ORs AF/dirty
> > from all sub-PTEs in the CONT block, so a dirty sibling can make the
> > target appear already-dirty. When the gathered value matches entry, the
> > function returns 0 even though the target sub-PTE still has PTE_RDONLY
> > set in hardware.
> >
> > For CPU page-table walks this is benign: with FEAT_HAFDBS the hardware
> > may set AF/dirty on any sub-PTE and the CPU TLB treats the gathered
> > result as authoritative for the entire range. But an SMMU without HTTU
> > (or with HA/HD disabled in CD.TCR) evaluates each descriptor
> > individually and will keep raising F_PERMISSION on the unchanged target
> > sub-PTE, causing an infinite fault loop.
>
> Ouch; thanks for the fix!
>
> >
> > Gathering can therefore cause false no-ops when only a sibling has been
> > updated:
> > - write faults: target still has PTE_RDONLY (needs PTE_RDONLY cleared)
> > - read faults: target still lacks PTE_AF
> >
> > Fix by checking all sub-PTEs' access flags individually (not via the
> > gathered view) before returning no-op, and use the raw target PTE for
> > the write-bit unfold decision. The access-flag mask matches the one
> > used by __ptep_set_access_flags().
> >
> > Per Arm ARM (DDI 0487) D8.7.1 ("The Contiguous bit"), any sub-PTE in a CONT
> > range may become the effective cached translation and software must
> > maintain consistent attributes across the range.
> >
> > Fixes: 4602e5757bcc ("arm64/mm: wire up PTE_CONT for user mappings")
> >
>
> nit: there shouldn't be whitespace here.
>
> > Reviewed-by: Alistair Popple <apopple at nvidia.com>
> > Cc: Ryan Roberts <ryan.roberts at arm.com>
> > Cc: Catalin Marinas <catalin.marinas at arm.com>
> > Cc: Will Deacon <will at kernel.org>
> > Cc: Jason Gunthorpe <jgg at nvidia.com>
> > Cc: John Hubbard <jhubbard at nvidia.com>
> > Cc: Zi Yan <ziy at nvidia.com>
> > Cc: Breno Leitao <leitao at debian.org>
> > Cc: stable at vger.kernel.org
> > Signed-off-by: Piotr Jaroszynski <pjaroszynski at nvidia.com>
>
> This fix looks good to me:
>
> Reviewed-by: Ryan Roberts <ryan.roberts at arm.com>
Thanks!
>
>
> > ---
> > arch/arm64/mm/contpte.c | 47 +++++++++++++++++++++++++++++++++++++----
> > 1 file changed, 43 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
> > index bcac4f55f9c1..9868bfe4607c 100644
> > --- a/arch/arm64/mm/contpte.c
> > +++ b/arch/arm64/mm/contpte.c
> > @@ -390,6 +390,23 @@ void contpte_clear_young_dirty_ptes(struct vm_area_struct *vma,
> > }
> > EXPORT_SYMBOL_GPL(contpte_clear_young_dirty_ptes);
> >
> > +static bool contpte_all_subptes_match_access_flags(pte_t *ptep, pte_t entry)
> > +{
> > + pte_t *cont_ptep = contpte_align_down(ptep);
> > + const pteval_t access_mask = PTE_RDONLY | PTE_AF | PTE_WRITE | PTE_DIRTY;
> > + pteval_t entry_access = pte_val(entry) & access_mask;
> > + int i;
> > +
> > + for (i = 0; i < CONT_PTES; i++) {
> > + pteval_t pte_access = pte_val(__ptep_get(cont_ptep + i)) & access_mask;
> > +
> > + if (pte_access != entry_access)
> > + return false;
> > + }
>
> There are 2 forms of "dirty"; HW and SW. Here you are testing that all ptes in
> the contpte block have the same form of dirty, which I think is the correct
> thing to do. You could relax to just test that every pte has one of the forms of
> dirty. But in that case, if a pte is sw-dirty but not hw-dirty, then the
> PTE_RDONLY bit remains set and the SMMU will fault, I think?
Yes, for the SMMU we need to make sure the HW-dirty state is correct or
it will keep faulting. And while we're at it, it makes sense to update
all the bits to be consistent.
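To make the failure mode concrete, here is a small userspace toy model of
the two checks. The bit positions, names and block size are made up for
illustration; only the shape of the comparison mirrors the kernel code. A
dirty sibling makes the gathered value match the new entry even though the
target descriptor still has its read-only bit set:

```c
#include <stdint.h>
#include <stdbool.h>

#define TOY_CONT_PTES 4
#define TOY_AF     (1u << 0)  /* accessed flag */
#define TOY_RDONLY (1u << 1)  /* hardware read-only */
#define TOY_DIRTY  (1u << 2)  /* software dirty */
#define TOY_WRITE  (1u << 3)

/* Model of mkdirty: set sw-dirty; a writable pte also drops RDONLY. */
static uint32_t toy_mkdirty(uint32_t pte)
{
	pte |= TOY_DIRTY;
	if (pte & TOY_WRITE)
		pte &= ~TOY_RDONLY;
	return pte;
}

/* Gathered view: OR AF/dirty state from every sub-PTE into the target. */
static uint32_t toy_gathered(const uint32_t *ptes, int target)
{
	uint32_t val = ptes[target];
	int i;

	for (i = 0; i < TOY_CONT_PTES; i++) {
		if (ptes[i] & TOY_AF)
			val |= TOY_AF;
		if (ptes[i] & TOY_DIRTY)
			val = toy_mkdirty(val);
	}
	return val;
}

/* Old no-op check: compare the gathered value against the new entry. */
static bool toy_gathered_noop(const uint32_t *ptes, int target, uint32_t entry)
{
	return toy_gathered(ptes, target) == entry;
}

/* Fixed check: every sub-PTE's raw access flags must already match. */
static bool toy_all_subptes_match(const uint32_t *ptes, uint32_t entry)
{
	const uint32_t mask = TOY_AF | TOY_RDONLY | TOY_DIRTY | TOY_WRITE;
	int i;

	for (i = 0; i < TOY_CONT_PTES; i++) {
		if ((ptes[i] & mask) != (entry & mask))
			return false;
	}
	return true;
}
```

With a hw-clean target, one dirty sibling, and an entry built by a write
fault, the gathered check reports a no-op while the per-sub-PTE check
correctly reports a mismatch.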
>
> If my reasoning is correct, then I think arm64 hugetlb has a similar bug; See
> __cont_access_flags_changed(), which just checks for any form of dirty. So I
> guess hugetlb is buggy in the same way and should be fixed to use this more
> stringent approach?
Given sw-dirty is managed by sw, is it correct for sw to ever create a
PTE that's sw-dirty but not hw-dirty? If not, then I think it will still
work fine for the SMMU case as sw-dirty implies hw-dirty, and if it's
missing then we will set both. But for thoroughness it could make sense
to be stricter and add some comments there as it does feel a little
fragile. I'm very new to this area though so probably best for others to
comment and tackle this.
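FWIW, if I read pte_wrprotect() right, one way such a pte arises is
write-protecting a hw-dirty pte, which moves the dirty state into the sw
bit while setting PTE_RDONLY. A toy model (again with made-up bit values,
not the kernel's definitions) of why an "any form of dirty" check is not
enough for the SMMU:

```c
#include <stdint.h>
#include <stdbool.h>

#define T_AF     (1u << 0)  /* accessed flag */
#define T_RDONLY (1u << 1)  /* hardware read-only */
#define T_DIRTY  (1u << 2)  /* software dirty */
#define T_WRITE  (1u << 3)

/* hw-dirty: writable and not read-only at the descriptor level. */
static bool t_hw_dirty(uint32_t pte)
{
	return (pte & T_WRITE) && !(pte & T_RDONLY);
}

static bool t_sw_dirty(uint32_t pte)
{
	return (pte & T_DIRTY) != 0;
}

/* The relaxed hugetlb-style test: either form of dirty counts. */
static bool t_any_dirty(uint32_t pte)
{
	return t_hw_dirty(pte) || t_sw_dirty(pte);
}

/* Model of wrprotect: preserve dirtiness in the sw bit, make read-only. */
static uint32_t t_wrprotect(uint32_t pte)
{
	if (t_hw_dirty(pte))
		pte |= T_DIRTY;
	pte &= ~T_WRITE;
	pte |= T_RDONLY;
	return pte;
}
```

After t_wrprotect() the pte is sw-dirty but its descriptor is read-only,
so a check that accepts either form of dirty would declare a no-op while
an SMMU write to that page still faults on RDONLY.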
Thanks,
Piotr
>
> Thanks,
> Ryan
>
> > +
> > + return true;
> > +}
> > +
> > int contpte_ptep_set_access_flags(struct vm_area_struct *vma,
> > unsigned long addr, pte_t *ptep,
> > pte_t entry, int dirty)
> > @@ -399,13 +416,35 @@ int contpte_ptep_set_access_flags(struct vm_area_struct *vma,
> > int i;
> >
> > /*
> > - * Gather the access/dirty bits for the contiguous range. If nothing has
> > - * changed, its a noop.
> > + * Check whether all sub-PTEs in the CONT block already have the
> > + * requested access flags, using raw per-PTE values rather than the
> > + * gathered ptep_get() view.
> > + *
> > + * ptep_get() gathers AF/dirty state across the whole CONT block,
> > + * which is correct for CPU TLB semantics: with FEAT_HAFDBS the
> > + * hardware may set AF/dirty on any sub-PTE and the CPU TLB treats
> > + * the gathered result as authoritative for the entire range. But an
> > + * SMMU without HTTU (or with HA/HD disabled in CD.TCR) evaluates
> > + * each descriptor individually and will keep faulting on the target
> > + * sub-PTE if its flags haven't actually been updated. Gathering can
> > + * therefore cause false no-ops when only a sibling has been updated:
> > + * - write faults: target still has PTE_RDONLY (needs PTE_RDONLY cleared)
> > + * - read faults: target still lacks PTE_AF
> > + *
> > + * Per Arm ARM (DDI 0487) D8.7.1, any sub-PTE in a CONT range may
> > + * become the effective cached translation, so all entries must have
> > + * consistent attributes. Check the full CONT block before returning
> > + * no-op, and when any sub-PTE mismatches, proceed to update the whole
> > + * range.
> > */
> > - orig_pte = pte_mknoncont(ptep_get(ptep));
> > - if (pte_val(orig_pte) == pte_val(entry))
> > + if (contpte_all_subptes_match_access_flags(ptep, entry))
> > return 0;
> >
> > + /*
> > + * Use raw target pte (not gathered) for write-bit unfold decision.
> > + */
> > + orig_pte = pte_mknoncont(__ptep_get(ptep));
> > +
> > /*
> > * We can fix up access/dirty bits without having to unfold the contig
> > * range. But if the write bit is changing, we must unfold.
>
>