[PATCH v8 36/43] arm64: mm: Add support for folding PUDs at runtime

Ryan Roberts ryan.roberts at arm.com
Fri Mar 1 02:22:47 PST 2024


On 01/03/2024 09:47, Ryan Roberts wrote:
> On 01/03/2024 09:37, Ard Biesheuvel wrote:
>> On Fri, 1 Mar 2024 at 10:10, Ard Biesheuvel <ardb at kernel.org> wrote:
>>>
>>> On Fri, 1 Mar 2024 at 09:54, Ryan Roberts <ryan.roberts at arm.com> wrote:
>>>>
>>>> + Matthew
>>>>
>>>>
>>>> On 29/02/2024 23:01, Nathan Chancellor wrote:
>>>>> On Thu, Feb 29, 2024 at 02:17:52PM +0000, Ryan Roberts wrote:
>>>>>> Hi Ard,
>>>>>>
>>>>>> On 14/02/2024 12:29, Ard Biesheuvel wrote:
>>>>>>> From: Ard Biesheuvel <ardb at kernel.org>
>>>>>>>
>>>>>>> In order to support LPA2 on 16k pages in a way that permits non-LPA2
>>>>>>> systems to run the same kernel image, we have to be able to fall back to
>>>>>>> at most 48 bits of virtual addressing.
>>>>>>>
>>>>>>> Falling back to 48 bits would result in a level 0 with only 2 entries,
>>>>>>> which is suboptimal in terms of TLB utilization. So instead, let's fall
>>>>>>> back to 47 bits in that case. This means we need to be able to fold PUDs
>>>>>>> dynamically, similar to how we fold P4Ds for 48 bit virtual addressing
>>>>>>> on LPA2 with 4k pages.
>>>>>>
>>>>>> I'm seeing a panic during boot in today's linux-next (20240229) and bisect seems pretty confident that this commit is the offender. That said, its the merge commit that shows up as the problem commit:
>>>>>>
>>>>>> 26843fe8fa72 Merge branch 'for-next/core' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
>>>>>>
>>>>>> but when testing the arm64's for-next/core, the problem doesn't exist. So I rebased the branch into linux-next and bisected again. That time, it fingers this patch. So I guess there is some interaction between this and other changes in next?
>>>>> <...>
>>>>>> [    0.161062] debug_vm_pgtable: [debug_vm_pgtable         ]: Validating architecture page table helpers
>>>>>> [    0.161416] BUG: Bad page state in process swapper/0  pfn:18a65d
>>>>>> [    0.161634] page does not match folio
>>>>>> [    0.161753] page: refcount:0 mapcount:-512 mapping:0000000000000000 index:0x0 pfn:0x18a65d
>>>>>> [    0.162046] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
>>>>>> [    0.162332] Mem abort info:
>>>>>> [    0.162427]   ESR = 0x0000000096000004
>>>>>> [    0.162559]   EC = 0x25: DABT (current EL), IL = 32 bits
>>>>>> [    0.162723]   SET = 0, FnV = 0
>>>>>> [    0.162827]   EA = 0, S1PTW = 0
>>>>>> [    0.162933]   FSC = 0x04: level 0 translation fault
>>>>>> [    0.163089] Data abort info:
>>>>>> [    0.163189]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>>>>>> [    0.163370]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>>>>>> [    0.163539]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>>>>>> [    0.163719] [0000000000000008] user address but active_mm is swapper
>>>>>> [    0.163934] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
>>>>>> [    0.164143] Modules linked in:
>>>>>> [    0.164251] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.8.0-rc6-00966-gde701dc1f7f8 #25
>>>>>> [    0.164516] Hardware name: linux,dummy-virt (DT)
>>>>>> [    0.164704] pstate: 21400005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
>>>>>> [    0.165052] pc : get_pfnblock_flags_mask+0x3c/0x68
>>>>>> [    0.165281] lr : __dump_page+0x1a0/0x408
>>>>>> [    0.165504] sp : ffff80008007b8f0
>>>>>> [    0.165715] x29: ffff80008007b8f0 x28: 0000000000ffffc0 x27: 0000000000000000
>>>>>> [    0.166047] x26: ffff80008007b950 x25: 0000000000000000 x24: 00000000fffffdff
>>>>>> [    0.166358] x23: ffffba8a417ba000 x22: 000000000018a65d x21: ffffba8a41601bf8
>>>>>> [    0.166701] x20: ffff80008007b950 x19: ffff80008007b950 x18: 0000000000000006
>>>>>> [    0.167036] x17: 78303a7865646e69 x16: 2030303030303030 x15: 0720072007200720
>>>>>> [    0.167365] x14: 0720072007200720 x13: 0720072007200720 x12: 0720072007200720
>>>>>> [    0.167693] x11: 0720072007200720 x10: ffffba8a4269c038 x9 : ffffba8a3fb0d0b8
>>>>>> [    0.168017] x8 : 00000000ffffefff x7 : ffffba8a4269c038 x6 : 80000000fffff000
>>>>>> [    0.168346] x5 : 000003fffff81de4 x4 : 0001fffffc0ef230 x3 : 0000000000000000
>>>>>> [    0.168699] x2 : 0000000000000007 x1 : fffffe0779181ee5 x0 : 00000000001fffff
>>>>>> [    0.169041] Call trace:
>>>>>> [    0.169164]  get_pfnblock_flags_mask+0x3c/0x68
>>>>>> [    0.169413]  dump_page+0x2c/0x70
>>>>>> [    0.169565]  bad_page+0x84/0x130
>>>>>> [    0.169734]  free_page_is_bad_report+0xa0/0xb8
>>>>>> [    0.169958]  free_unref_page_prepare+0x350/0x428
>>>>>> [    0.170132]  free_unref_page+0x50/0x1f0
>>>>>> [    0.170278]  __free_pages+0x11c/0x160
>>>>>> [    0.170417]  free_pages.part.0+0x6c/0x88
>>>>>> [    0.170576]  free_pages+0x1c/0x38
>>>>>> [    0.170703]  destroy_args+0x1c8/0x330
>>>>>> [    0.170890]  debug_vm_pgtable+0xae8/0x10f8
>>>>>> [    0.171059]  do_one_initcall+0x60/0x2c0
>>>>>> [    0.171222]  kernel_init_freeable+0x1ec/0x3d8
>>>>>> [    0.171406]  kernel_init+0x28/0x1f0
>>>>>> [    0.171557]  ret_from_fork+0x10/0x20
>>>>>> [    0.171712] Code: d37b1884 f100007f 8b040064 9a831083 (f9400460)
>>>>>> [    0.171963] ---[ end trace 0000000000000000 ]---
>>>>>> [    0.172156] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
>>>>>> [    0.172383] SMP: stopping secondary CPUs
>>>>>> [    0.172649] Kernel Offset: 0x3a89bf800000 from 0xffff800080000000
>>>>>> [    0.173923] PHYS_OFFSET: 0xfffff76180000000
>>>>>> [    0.174585] CPU features: 0x0,00000000,2004454a,13867723
>>>>>> [    0.175707] Memory Limit: none
>>>>>> [    0.176261] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
>>>>>
>>>>> I did a second bisection by merging https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/stage1-lpa2
>>>>> on top of the merges before for-next/core and eventually landed on:
>>>>>
>>>>> d67cd9f23139ddfd7e0ef1e18474c16445188433 is the first bad commit
>>>>> commit d67cd9f23139ddfd7e0ef1e18474c16445188433
>>>>> Author: Matthew Wilcox (Oracle) <willy at infradead.org>
>>>>> Date:   Tue Feb 27 19:23:31 2024 +0000
>>>>>
>>>>>     mm: add __dump_folio()
>>>>>
>>>>>     Turn __dump_page() into a wrapper around __dump_folio().  Snapshot the
>>>>>     page & folio into a stack variable so we don't hit BUG_ON() if an
>>>>>     allocation is freed under us and what was a folio pointer becomes a
>>>>>     pointer to a tail page.
>>>>>
>>>>>     Link: https://lkml.kernel.org/r/20240227192337.757313-5-willy@infradead.org
>>>>>     Signed-off-by: Matthew Wilcox (Oracle) <willy at infradead.org>
>>>>>     Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
>>>>
>>>> So is that suggesting that Ard's patch is doing something that the old
>>>> __dump_page() was ok with but the new version doesn't like? I don't think so,
>>>> because the bad page detection has already happened before we get to __dump_page().
>>>>
>>>
>>> Yes, there are clearly two different issues at play here. The NULL
>>> dereference might be an issue in the __dump_page() patch, but going
>>> down that code path in the first place seems like it might be a
>>> problem with mine.
>>>
>>> The mapcount of -512 looks interesting as well.
>>>
>>>> So I'm not really sure how this patch is involved? I'm hoping that either Ard or
>>>> Matthew may be able to take a look and advise.
>>>>
>>>
>>
>> The crash does not reproduce for me, but the warning can be fixed by
>>
>> diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
>> index aeba2cf15a25..78f30b782889 100644
>> --- a/arch/arm64/include/asm/pgalloc.h
>> +++ b/arch/arm64/include/asm/pgalloc.h
>> @@ -61,7 +61,7 @@ static inline void pud_free(struct mm_struct *mm, pud_t *pud)
>>         if (!pgtable_l4_enabled())
>>                 return;
>>         BUG_ON((unsigned long)pud & (PAGE_SIZE-1));
>> -       free_page((unsigned long)pud);
>> +       __pud_free(mm, pud);
>>  }
>>  #else
>>  static inline void __p4d_populate(p4d_t *p4dp, phys_addr_t pudp, p4dval_t prot)
>>
>> I'll send this out as a patch shortly.
> 
> Great thanks! I'm pretty sure I've found the bug in Matthew's patch - it is
> copying the struct page to the stack to avoid a potential race, but later some
> macros are hiding a page_to_pfn(). Since the page's address is on the stack, I
> reckon that's giving a bogus pfn. Just confirming and writing it up and will
> send against the original patch shortly.
> 

OK confirmed. When I fix Matthew's patch, the panic gets converted to a warning,
and when I add the fix for your patch, there is no warning at all.

See write up for Matthew's bug here:
https://lore.kernel.org/linux-mm/6de0d026-cd8d-4152-97ca-d33d2a4e2e84@arm.com/

Thanks,
Ryan






More information about the linux-arm-kernel mailing list