[PATCH v1 3/5] mm: ptdump: Provide page size to notepage()

Christophe Leroy christophe.leroy at csgroup.eu
Fri Apr 16 15:40:52 BST 2021



On 16/04/2021 at 15:00, Steven Price wrote:
> On 16/04/2021 12:08, Christophe Leroy wrote:
>>
>>
>> On 16/04/2021 at 12:51, Steven Price wrote:
>>> On 16/04/2021 11:38, Christophe Leroy wrote:
>>>>
>>>>
>>>> On 16/04/2021 at 11:28, Steven Price wrote:
>>>>> On 15/04/2021 18:18, Christophe Leroy wrote:
>>>>>> In order to support large pages on powerpc, notepage()
>>>>>> needs to know the page size of the page.
>>>>>>
>>>>>> Add a page_size argument to notepage().
>>>>>>
>>>>>> Signed-off-by: Christophe Leroy <christophe.leroy at csgroup.eu>
>>>>>> ---
>>>>>>   arch/arm64/mm/ptdump.c         |  2 +-
>>>>>>   arch/riscv/mm/ptdump.c         |  2 +-
>>>>>>   arch/s390/mm/dump_pagetables.c |  3 ++-
>>>>>>   arch/x86/mm/dump_pagetables.c  |  2 +-
>>>>>>   include/linux/ptdump.h         |  2 +-
>>>>>>   mm/ptdump.c                    | 16 ++++++++--------
>>>>>>   6 files changed, 14 insertions(+), 13 deletions(-)
>>>>>>
>>>>> [...]
>>>>>> diff --git a/mm/ptdump.c b/mm/ptdump.c
>>>>>> index da751448d0e4..61cd16afb1c8 100644
>>>>>> --- a/mm/ptdump.c
>>>>>> +++ b/mm/ptdump.c
>>>>>> @@ -17,7 +17,7 @@ static inline int note_kasan_page_table(struct mm_walk *walk,
>>>>>>   {
>>>>>>       struct ptdump_state *st = walk->private;
>>>>>> -    st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0]));
>>>>>> +    st->note_page(st, addr, 4, pte_val(kasan_early_shadow_pte[0]), PAGE_SIZE);
>>>>>
>>>>> I'm not completely sure what the page_size is going to be used for, but note that KASAN 
>>>>> presents an interesting case here. We short-cut by detecting it's a KASAN region at a high 
>>>>> level (PGD/P4D/PUD/PMD) and instead of walking the tree down just call note_page() *once* but 
>>>>> with level==4 because we know KASAN sets up the page table like that.
>>>>>
>>>>> However the one call actually covers a much larger region - so while PAGE_SIZE matches the 
>>>>> level it doesn't match the region covered. AFAICT this will lead to odd results if you enable 
>>>>> KASAN on powerpc.
>>>>
>>>> Hum .... I successfully tested it with KASAN, I now realise that I tested it with 
>>>> CONFIG_KASAN_VMALLOC selected. In this situation, since 
>>>> https://github.com/torvalds/linux/commit/af3d0a686 we don't have any common shadow page table 
>>>> anymore.
>>>>
>>>> I'll test again without CONFIG_KASAN_VMALLOC.
>>>>
>>>>>
>>>>> To be honest I don't fully understand why powerpc requires the page_size - it appears to be 
>>>>> using it purely to find "holes" in the calls to note_page(), but I haven't worked out why such 
>>>>> holes would occur.
>>>>
>>>> It was indeed introduced for KASAN. We have a first commit 
>>>> https://github.com/torvalds/linux/commit/cabe8138 which uses the page size to detect whether it is 
>>>> KASAN-like stuff.
>>>>
>>>> Then came https://github.com/torvalds/linux/commit/b00ff6d8c as a fix. I can't remember what the 
>>>> problem was exactly, something around the use of hugepages for kernel memory, came as part of 
>>>> the series 
>>>> https://patchwork.ozlabs.org/project/linuxppc-dev/cover/cover.1589866984.git.christophe.leroy@csgroup.eu/ 
>>>
>>> Ah, that's useful context. So it looks like powerpc took a different route from x86 to reduce the 
>>> KASAN output.
>>>
>>> Given the generic ptdump code has handling for KASAN already it should be possible to drop that 
>>> from the powerpc arch code, which I think means we don't actually need to provide page size to 
>>> notepage(). Hopefully that means more code to delete ;)
>>>
>>
>> Yes ... and no.
>>
>> It looks like the generic ptdump handles the case where several pgdir entries point to the same 
>> kasan_early_shadow_pte. But it doesn't take into account the powerpc case where we have regular 
>> page tables in which several (if not all) PTEs point to the kasan_early_shadow_page.
> 
> I'm not sure I follow quite how powerpc is different here. But could you have a similar check for 
> PTEs against kasan_early_shadow_pte as the other levels already have?
> 
> I'm just worried that page_size isn't well defined in this interface and it's going to cause 
> problems in the future.
> 

I'm trying. I reverted the two commits b00ff6d8c and cabe8138.

At the moment, I don't get exactly what I expect: for linear memory I get one line for each 8M page, 
whereas before reverting the patches I got one 16M line and one 112M line.

And for the KASAN shadow area I get two lines for the 2x 8M pages shadowing linear mem, then I get one 
4M line for each PGDIR entry pointing to kasan_early_shadow_pte.

0xf8000000-0xf87fffff 0x07000000         8M   huge        rw       present
0xf8800000-0xf8ffffff 0x07800000         8M   huge        rw       present
0xf9000000-0xf93fffff 0x01430000         4M               r        present
0xf9400000-0xf97fffff 0x01430000         4M               r        present
0xf9800000-0xf9bfffff 0x01430000         4M               r        present
0xf9c00000-0xf9ffffff 0x01430000         4M               r        present
0xfa000000-0xfa3fffff 0x01430000         4M               r        present
0xfa400000-0xfa7fffff 0x01430000         4M               r        present
0xfa800000-0xfabfffff 0x01430000         4M               r        present
0xfac00000-0xfaffffff 0x01430000         4M               r        present
0xfb000000-0xfb3fffff 0x01430000         4M               r        present
0xfb400000-0xfb7fffff 0x01430000         4M               r        present
0xfb800000-0xfbbfffff 0x01430000         4M               r        present
0xfbc00000-0xfbffffff 0x01430000         4M               r        present
0xfc000000-0xfc3fffff 0x01430000         4M               r        present
0xfc400000-0xfc7fffff 0x01430000         4M               r        present
0xfc800000-0xfcbfffff 0x01430000         4M               r        present
0xfcc00000-0xfcffffff 0x01430000         4M               r        present
0xfd000000-0xfd3fffff 0x01430000         4M               r        present
0xfd400000-0xfd7fffff 0x01430000         4M               r        present
0xfd800000-0xfdbfffff 0x01430000         4M               r        present
0xfdc00000-0xfdffffff 0x01430000         4M               r        present
0xfe000000-0xfe3fffff 0x01430000         4M               r        present
0xfe400000-0xfe7fffff 0x01430000         4M               r        present
0xfe800000-0xfebfffff 0x01430000         4M               r        present
0xfec00000-0xfeffffff 0x01430000         4M               r        present
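
To make the numbers concrete: the 24 lines from 0xf9000000 to 0xfeffffff all show the same physical 
address 0x01430000, so together they cover 96M. Below is a minimal user-space sketch of one way such a 
run could be collapsed into a single range. It is not kernel code: struct dump_entry and 
merge_entries() are hypothetical names used only for illustration, and it assumes entries are merged 
only when they are virtually contiguous, have identical flags and point at the same physical address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one dumped line; not a kernel structure. */
struct dump_entry {
	uint32_t start;	/* first virtual address covered */
	uint32_t end;	/* last virtual address covered (inclusive) */
	uint32_t pa;	/* physical address shown for the entry */
	uint32_t flags;	/* opaque attribute bits (rw, huge, ...) */
};

/*
 * Merge runs of entries that are virtually contiguous, carry the same flags
 * and all point at the same physical address (the shared-target case, e.g.
 * many PGDIR entries pointing at one shadow page table).
 * Entries whose physical address differs are left as separate lines.
 * Returns the number of entries written to out; out needs room for n entries.
 */
size_t merge_entries(const struct dump_entry *in, size_t n,
		     struct dump_entry *out)
{
	size_t count = 0;

	assert(in != NULL && out != NULL);

	for (size_t i = 0; i < n; i++) {
		if (count > 0) {
			struct dump_entry *cur = &out[count - 1];

			if (in[i].start == cur->end + 1 &&
			    in[i].flags == cur->flags &&
			    in[i].pa == cur->pa) {
				/* Same target, same attributes: extend the range. */
				cur->end = in[i].end;
				continue;
			}
		}
		/* Otherwise start a new output line. */
		out[count++] = in[i];
	}
	return count;
}
```

Fed with the 26 lines above, this sketch would keep the two 8M entries separate (their physical 
addresses differ) and fold the 24 4M entries into one 0xf9000000-0xfeffffff range.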

Any ideas?

Christophe


