[PATCH] arm64, vmcoreinfo : Append 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' to vmcoreinfo

Bhupesh Sharma bhsharma at redhat.com
Thu Feb 14 11:30:09 PST 2019


Hi James,

On 02/13/2019 11:52 PM, James Morse wrote:
> Hi guys,
> 
> On 13/02/2019 11:15, Dave Young wrote:
>> On 02/12/19 at 11:03pm, Kazuhito Hagio wrote:
>>> On 2/12/2019 2:59 PM, Bhupesh Sharma wrote:
>>>> BTW, in the makedumpfile enablement patch thread for ARMv8.2 LVA
>>>> (which I sent out for 52-bit User space VA enablement) (see [0]), Kazu
>>>> mentioned that the changes look necessary.
>>>>
>>>> [0]. http://lists.infradead.org/pipermail/kexec/2019-February/022431.html
>>>
>>>>>> The increased 'PTRS_PER_PGD' value for such cases needs to be then
>>>>>> calculated as is done by the underlying kernel
> 
> Aha! Nothing to do with which-bits-are-pfn in the tables...
> 
> You need to know if the top level PGD is 512bytes or bigger. As we use a
> kmem-cache the adjacent data could be some else's page tables.
> 
> Is this really a problem though? You can't pull the user-space pgd pointers out
> of no-where, you must have walked some task_struct and struct_mm's to find them.
> In which case you would have the VMAs on hand to tell you if its in the mapped
> user range.
> 
> It would be good to avoid putting something arch-specific in here if we can at
> all help it.
> 
> 
>>>>>> (see
>>>>>> 'arch/arm64/include/asm/pgtable-hwdef.h' for details):
>>>>>>
>>>>>> #define PTRS_PER_PGD          (1 << (MAX_USER_VA_BITS - PGDIR_SHIFT))
>>>
>>> Yes, this is the reason why makedumpfile needs the MAX_USER_VA_BITS.
>>> It is used for pgd_index() also in makedumpfile to walk page tables.
>>>
>>> /* to find an entry in a page-table-directory */
>>> #define pgd_index(addr)         (((addr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))
>>
>> Since Dave mentioned crash tool does not need it, but crash should also
>> travel the pg tables.
>>
>> If this is really necessary it would be good to describe what will
>> happen without the patch, eg. some user visible error from an actual test etc.
> 
> Yes please, it would really help if there was a specific example we could discuss.

Sure. Here are two reported use-cases/regressions which I have been
able to reproduce. Note that I tested them both on a CPU which does not
support ARMv8.2-LPA/LVA and on an ARMv8 FVP model (which supports the
ARMv8.2 extensions).

Environment:
------------
Latest Upstream kernel: sha-id: 1f947a7a011fcceb14cb912f5481a53b18f1879a 
("Merge branch 'akpm' (patches from Andrew)")

Latest makedumpfile code: (git://git.code.sf.net/p/makedumpfile/code , 
branch: devel)

crash-utility code: (https://github.com/crash-utility/crash.git, sha-id: 
e082c372c7f1a782b058ec359dfbbbee0f0b6aad)

Note that Dave A. has since fixed crash-utility by using a hardcoded
value of 'MAX_PHYSMEM_BITS' (via sha id:
ac5a7889d31bb37aa0687110ecea08837f8a66a8) and by determining the
'vabits_user' value from vmlinux (via sha id:
8618ddd817621c40c1f44f0ab6df7c7805234416).

(1). Regression Case 1 (ARMv8.2-LPA enabled kernel):

- Upstream makedumpfile and crash-utility (with sha-id
e082c372c7f1a782b058ec359dfbbbee0f0b6aad) are broken on the following
kinds of platforms:

a. Upstream Kernel 5.0.0-rc6+ with the following kernel configuration:

CONFIG_ARM64_64K_PAGES=y
# CONFIG_ARM64_VA_BITS_42 is not set
CONFIG_ARM64_VA_BITS_48=y
# CONFIG_ARM64_USER_VA_BITS_52 is not set
CONFIG_ARM64_VA_BITS=48
# CONFIG_ARM64_PA_BITS_48 is not set
CONFIG_ARM64_PA_BITS_52=y
CONFIG_ARM64_PA_BITS=52

b. Both on CPUs which don't support the ARMv8.2 LPA extension and on an
ARMv8 FVP model with the ARMv8.2 LPA extension.

- Error message from makedumpfile:

$ makedumpfile -f --mem-usage /proc/kcore -D
max_mapnr    : a00000
kimage_voffset   : fffeffff80000000
max_physmem_bits : 30
section_size_bits: 1e
vaddr_to_paddr_arm64: pgda=90d70000, pudv.pgd=ffffff80ffffffd0
vaddr_to_paddr_arm64: puda=90d70000, pudv.pgd=9fffff0003
vaddr_to_paddr_arm64: pmda=9fffff0000, pmdv.pud=9ffffe0003
vaddr_to_paddr_arm64: paddr=911a962c
va_bits      : 48
page_offset  : ffff800000000000
vaddr_to_paddr_arm64: pgda=90d70000, pudv.pgd=41a474
vaddr_to_paddr_arm64: puda=90d70000, pudv.pgd=9fffff0003
vaddr_to_paddr_arm64: pmda=9fffff0000, pmdv.pud=9ffffe0003
vaddr_to_paddr_arm64: paddr=911a1320
num of NODEs : 1
Memory type  : SPARSEMEM

vaddr_to_paddr_arm64: pgda=90d70100, pudv.pgd=a657470206461
vaddr_to_paddr_arm64: puda=90d70100, pudv.pgd=9ffffd0003
vaddr_to_paddr_arm64: pmda=9ffffd27d8, pmdv.pud=9ffeee0003
vaddr_to_paddr_arm64: paddr=9ffeedc600
vaddr_to_paddr_arm64: pgda=90d70100, pudv.pgd=ffff97d98224
vaddr_to_paddr_arm64: puda=90d70100, pudv.pgd=9ffffd0003
vaddr_to_paddr_arm64: pmda=9ffffd27d8, pmdv.pud=9ffeee0003
vaddr_to_paddr_arm64: paddr=9ffeedc600
get_mm_sparsemem: Can't get the address of mem_section.

c. Root Cause Analysis -

- After the PA_BITS changes in the arm64 kernel, we set:
#define MAX_PHYSMEM_BITS	CONFIG_ARM64_PA_BITS

- For SPARSEMEM, this value is used to calculate the number of bits
required to store a section number:
#define SECTIONS_SHIFT	(MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)
#define NR_MEM_SECTIONS		(1UL << SECTIONS_SHIFT)

- User-space tools use a similar mechanism to determine the SPARSEMEM 
type (extreme or not) using the 'NR_MEM_SECTIONS' value (an example from 
makedumpfile code):

int
is_sparsemem_extreme(void)
{
	if ((ARRAY_LENGTH(mem_section)
	     == divideup(NR_MEM_SECTIONS(), _SECTIONS_PER_ROOT_EXTREME()))
	    || (ARRAY_LENGTH(mem_section) == NOT_FOUND_STRUCTURE))
		return TRUE;
	else
		return FALSE;
}

- Since MAX_PHYSMEM_BITS is 48 for the normal case and 52 for the
extended PA address space, a tool that still assumes 48 incorrectly
detects the memory type as SPARSEMEM rather than SPARSEMEM_EX in the
above case.

- Exporting the correct 'MAX_PHYSMEM_BITS' via vmcoreinfo for the
52-bit PA case fixes the above-mentioned issue:

$ makedumpfile -f --mem-usage /proc/kcore -D

<..snip..>
  NUMBER(MAX_PHYSMEM_BITS)=52

<..snip..>
max_mapnr    : a00000
kimage_voffset   : fffeffff80000000
max_physmem_bits : 30
section_size_bits: 1e
vaddr_to_paddr_arm64: pgda=90d70000, pudv.pgd=ffffff80ffffffd0
vaddr_to_paddr_arm64: puda=90d70000, pudv.pgd=9fffff0003
vaddr_to_paddr_arm64: pmda=9fffff0000, pmdv.pud=9ffffe0003
vaddr_to_paddr_arm64: paddr=911a962c
va_bits      : 48
page_offset  : ffff800000000000
vaddr_to_paddr_arm64: pgda=90d70000, pudv.pgd=41a474
vaddr_to_paddr_arm64: puda=90d70000, pudv.pgd=9fffff0003
vaddr_to_paddr_arm64: pmda=9fffff0000, pmdv.pud=9ffffe0003
vaddr_to_paddr_arm64: paddr=911a1320
num of NODEs : 1
Memory type  : SPARSEMEM_EX
<..snip..>

TYPE		PAGES			EXCLUDABLE	DESCRIPTION
----------------------------------------------------------------------
ZERO		2626            	yes		Pages filled with zero
NON_PRI_CACHE	569             	yes		Cache pages without private flag
PRI_CACHE	5446            	yes		Cache pages with private flag
USER		3213            	yes		User process pages
FREE		2048971         	yes		Free pages
KERN_DATA	19034           	no		Dumpable kernel data

page size:		65536
Total pages on system:	2079859
Total size on system:	136305639424     Byte

(2). Regression Case 2 (ARMv8.2-LPA + LVA [52-bit user-space VA] enabled 
kernel):

- Upstream makedumpfile and crash-utility (with sha-id
e082c372c7f1a782b058ec359dfbbbee0f0b6aad) are broken on the following
kinds of platforms:

a. Upstream Kernel 5.0.0-rc6+ with the following kernel configuration:

CONFIG_ARM64_64K_PAGES=y
# CONFIG_ARM64_VA_BITS_42 is not set
# CONFIG_ARM64_VA_BITS_48 is not set
CONFIG_ARM64_USER_VA_BITS_52=y
CONFIG_ARM64_VA_BITS=48
# CONFIG_ARM64_PA_BITS_48 is not set
CONFIG_ARM64_PA_BITS_52=y
CONFIG_ARM64_PA_BITS=52

b. Both on CPUs which don't support the ARMv8.2 extensions and on an
ARMv8 FVP model with the ARMv8.2 extensions.

- Error message from makedumpfile:

$ makedumpfile -f --mem-usage /proc/kcore -D
max_mapnr    : a00000
kimage_voffset   : fffeffff78000000
max_physmem_bits : 30
section_size_bits: 1e
vaddr_to_paddr_arm64: pgda=90f30000, pudv.pgd=ffffff80ffffffd0
vaddr_to_paddr_arm64: puda=90f30000, pudv.pgd=0
readpage_elf: Attempt to read non-existent page at 0x0.
readmem: type_addr: 1, addr:0, size:8
vaddr_to_paddr_arm64: Can't read pmd
readmem: Can't convert a virtual address(ffff0000093c576c) to physical 
address.
readmem: type_addr: 0, addr:ffff0000093c576c, size:390
check_release: Can't get the address of system_utsname.

c. Root Cause Analysis -

- After the 52-bit user-space VA_BITS changes in the arm64 kernel, we set:
#define PTRS_PER_PGD          (1 << (MAX_USER_VA_BITS - PGDIR_SHIFT))

- User-space tools like makedumpfile and crash use the 'PTRS_PER_PGD' 
value to calculate the 'pgd_index()' of a vaddr:

#define pgd_index(vaddr) 		(((vaddr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))

- Since the kernel now defines 'MAX_USER_VA_BITS' as:
#ifdef CONFIG_ARM64_USER_VA_BITS_52
#define MAX_USER_VA_BITS	52
#else
#define MAX_USER_VA_BITS	VA_BITS
#endif

so user-space also needs this value to calculate 'PTRS_PER_PGD', and
hence 'pgd_index()', correctly.

- Exporting the correct 'MAX_USER_VA_BITS' via vmcoreinfo for the above
case fixes the above-mentioned issue:

$ makedumpfile -f --mem-usage /proc/kcore -D

<..snip..>
max_mapnr    : a00000
pa_bits    : 52
va_bits    : 48 (vmcoreinfo)
max_user_va_bits : 52 (vmcoreinfo)
kimage_voffset   : fffeffff78000000
max_physmem_bits : 52
section_size_bits: 30
vaddr_to_paddr_arm64: pgda=90f31e00, pudv.pgd=ffffff80ffffffd0
vaddr_to_paddr_arm64: puda=90f31e00, pudv.pgd=9fffff0803
vaddr_to_paddr_arm64: pmda=9fffff0000, pmdv.pud=9ffffe0803
vaddr_to_paddr_arm64: paddr=913c576c
page_offset  : ffff800000000000
vaddr_to_paddr_arm64: pgda=90f31e00, pudv.pgd=16e28e8bed294900
vaddr_to_paddr_arm64: puda=90f31e00, pudv.pgd=9fffff0803
vaddr_to_paddr_arm64: pmda=9fffff0000, pmdv.pud=9ffffe0803
vaddr_to_paddr_arm64: paddr=913bd2f8
num of NODEs : 1
Memory type  : SPARSEMEM_EX
<..snip..>

Other important notes
---------------------

1. I have quoted only one makedumpfile use-case failure above (i.e. 
calculating --mem-usage on the primary kernel). Other use-cases like 
creating a dumpfile using /proc/vmcore or post-processing a vmcore are 
also broken similarly and get fixed when a kernel which exports 
'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' to vmcoreinfo is used along 
with a modified user-space which can read this information from the 
vmcoreinfo.

2. I was also going through some of the suggestions on earlier threads
about the PTE calculations for the 52-bit LPA case and discussed them
with some partner arm64 SoC engineers.

The suggestion to convert a page table entry to a physical address
without awareness of 52-bit PA support (assuming a 64K page size) can
be risky.

With 64K pages and older non-52-bit kernels, bits [15:12] appear to be
zero in the current checks, so one could move them to bits [51:48]
(the zeros don't affect the overall PA) to generate the overall 52-bit
PA. However, this can cause IMPLEMENTATION SPECIFIC issues on different
platforms while generating a PA and IPA.

Let's see what the ARMv8 Architecture Reference Manual says about
bits [15:12] for a 64KB page size:

"Bits [15:12] of each valid translation table descriptor hold Bits 
[51:48] of the output address, or of the address
of the translation table to be used for the initial lookup at the next 
level of translation. If the implementation
does not support 52-bit physical addresses, then it is IMPLEMENTATION 
DEFINED whether non-zero values for
these bits generate an Address size fault. In this case, not generating 
an Address Size Fault is deprecated."

As per the vendors, we should not assume that hardware which does not
support 52-bit physical addresses would generate an Address size fault
for non-zero values of bits [15:12], so always extending them to bits
[51:48] can produce a PA which might cause UNDEFINED behavior on some
SoCs.

Hope the above text clarifies the problem and what I am trying to fix 
via this patch. Please let me know if something is missing here.

Thanks,
Bhupesh