[PATCH v4 2/9] riscv: Restore the pfn in a NAPOT pte when manipulated by core mm code
Matthew Wilcox
willy at infradead.org
Mon Jan 27 05:51:34 PST 2025
On Mon, Jan 27, 2025 at 10:35:23AM +0100, Alexandre Ghiti wrote:
> +#ifdef CONFIG_RISCV_ISA_SVNAPOT
> +static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
> +			    pte_t *ptep, pte_t pteval, unsigned int nr)
> +{
> +	if (unlikely(pte_valid_napot(pteval))) {
> +		unsigned int order = ilog2(nr);
> +
> +		if (!is_napot_order(order)) {
> +			/*
> +			 * Something's weird, we are given a NAPOT pte but the

No, nothing is weird. This can happen under a lot of different
circumstances. For example, one might mmap() part of a file and the
folio containing the data is only partially mapped. The filesystem /
page cache might choose to use a folio order that isn't one of your
magic hardware orders.

> +			 * size of the mapping is not a known NAPOT mapping
> +			 * size, so clear the NAPOT bit and map this without
> +			 * NAPOT support: core mm only manipulates pte with the
> +			 * real pfn so we know the pte is valid without the N
> +			 * bit.
> +			 */
> +			pr_err("Incorrect NAPOT mapping, resetting.\n");
> +			pteval = pte_clear_napot(pteval);
> +		} else {
> +			/*
> +			 * NAPOT ptes that arrive here only have the N bit set
> +			 * and their pfn does not contain the mapping size, so
> +			 * set that here.
> +			 */
> +			pteval = pte_mknapot(pteval, order);

You're assuming that pteval is aligned to the order that you've
calculated, and again that's not true. For example, the user may have
called mmap() on range 0x21000-0x40000 of a file which is covered by
a 128kB folio. You'll be called with a pteval pointing to 0x21000 and
nr = 31, so ilog2(nr) = 4 and you'll calculate that you can put a 64kB
entry there, even though 0x21000 isn't 64kB-aligned ... no.

I'd suggest you do some testing with fstests and xfs as your underlying
filesystem. It should catch these kinds of mistakes.
More information about the linux-riscv mailing list