[PATCH] mm: Remove PFN_MAP, PFN_SPECIAL, PFN_SG_CHAIN and PFN_SG_LAST
David Hildenbrand
david at redhat.com
Wed Jun 11 01:23:20 PDT 2025
On 11.06.25 10:03, Marek Szyprowski wrote:
> On 11.06.2025 04:38, Alistair Popple wrote:
>> On Tue, Jun 10, 2025 at 06:18:09PM +0200, Marek Szyprowski wrote:
>>> On 04.06.2025 05:21, Alistair Popple wrote:
>>>> The PFN_MAP flag is no longer used for anything, so remove it.
>>>> The PFN_SG_CHAIN and PFN_SG_LAST flags never appear to have been
>>>> used so also remove them. The last user of PFN_SPECIAL was removed
>>>> by 653d7825c149 ("dcssblk: mark DAX broken, remove FS_DAX_LIMITED
>>>> support").
>>>>
>>>> Signed-off-by: Alistair Popple <apopple at nvidia.com>
>>>> Acked-by: David Hildenbrand <david at redhat.com>
>>>> Reviewed-by: Christoph Hellwig <hch at lst.de>
>>>> Reviewed-by: Jason Gunthorpe <jgg at nvidia.com>
>>>> Cc: gerald.schaefer at linux.ibm.com
>>>> Cc: dan.j.williams at intel.com
>>>> Cc: jgg at ziepe.ca
>>>> Cc: willy at infradead.org
>>>> Cc: david at redhat.com
>>>> Cc: linux-kernel at vger.kernel.org
>>>> Cc: nvdimm at lists.linux.dev
>>>> Cc: jhubbard at nvidia.com
>>>> Cc: hch at lst.de
>>>> Cc: zhang.lyra at gmail.com
>>>> Cc: debug at rivosinc.com
>>>> Cc: bjorn at kernel.org
>>>> Cc: balbirs at nvidia.com
>>>> Cc: lorenzo.stoakes at oracle.com
>>>> Cc: John at Groves.net
>>>>
>>>> ---
>>>>
>>>> Splitting this off from the rest of my series[1] as a separate clean-up
>>>> for consideration for the v6.16 merge window as suggested by Christoph.
>>>>
>>>> [1] - https://lore.kernel.org/linux-mm/cover.541c2702181b7461b84f1a6967a3f0e823023fcc.1748500293.git-series.apopple@nvidia.com/
>>>> ---
>>>> include/linux/pfn_t.h | 31 +++----------------------------
>>>> mm/memory.c | 2 --
>>>> tools/testing/nvdimm/test/iomap.c | 4 ----
>>>> 3 files changed, 3 insertions(+), 34 deletions(-)
>>> This patch landed in today's linux-next as commit 28be5676b4a3 ("mm:
>>> remove PFN_MAP, PFN_SPECIAL, PFN_SG_CHAIN and PFN_SG_LAST"). In my tests
>>> I've noticed that it breaks operation of all RISC-V 64bit boards on my
>>> test farm (VisionFive2, BananaPiF3 as well as QEMU's Virt machine). I've
>>> isolated the changes responsible for this issue, see the inline comments
>>> in the patch below. Here is an example of the issues observed in the
>>> logs from those machines:
>> Thanks for the report. I'm really confused by this because this change should
>> just be removal of dead code - nothing sets any of the removed PFN_* flags
>> AFAICT.
>>
>> I don't have access to any RISC-V hardwdare but you say this reproduces under
>> qemu - what do you run on the system to cause the error? Is it just a simple
>> boot and load a module or are you running selftests or something else?
>
> It fails a simple boot test. Here is a detailed instruction how to
> reproduce this issue with the random Debian rootfs image found on the
> internet (tested on Ubuntu 22.04, with next-20250610
> kernel source):
riscv is one of the archs where pte_mkdevmap() will *not* set the pte as special. (I
raised this recently in the original series, it's all a big mess)
So, before this change here, pfn_t_devmap() would have returned "false" if only
PFN_DEV was set, now it would return "true" if only PFN_DEV is set.
Consequently, in insert_pfn() we would have done a pte_mkspecial(), now we do a
pte_mkdevmap() -- again, which does not imply "special" on riscv.
riscv selects CONFIG_ARCH_HAS_PTE_SPECIAL, so if !pte_special(), it's considered as
normal.
Would the following fix your issue?
diff --git a/mm/memory.c b/mm/memory.c
index 8eba595056fe3..0e972c3493692 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -589,6 +589,10 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
{
unsigned long pfn = pte_pfn(pte);
+ /* TODO: remove this crap and set pte_special() instead. */
+ if (pte_devmap(pte))
+ return NULL;
+
if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL)) {
if (likely(!pte_special(pte)))
goto check_pfn;
@@ -598,16 +602,6 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
return NULL;
if (is_zero_pfn(pfn))
return NULL;
- if (pte_devmap(pte))
- /*
- * NOTE: New users of ZONE_DEVICE will not set pte_devmap()
- * and will have refcounts incremented on their struct pages
- * when they are inserted into PTEs, thus they are safe to
- * return here. Legacy ZONE_DEVICE pages that set pte_devmap()
- * do not have refcounts. Example of legacy ZONE_DEVICE is
- * MEMORY_DEVICE_FS_DAX type in pmem or virtio_fs drivers.
- */
- return NULL;
print_bad_pte(vma, addr, pte, NULL);
return NULL;
But, I would have thought the later patches in Alistairs series would sort that out
(where we remove pte_devmap() ... )
--
Cheers,
David / dhildenb
More information about the linux-riscv
mailing list