[PATCH v3 1/3] arm64, vmcoreinfo : Append 'PTRS_PER_PGD' to vmcoreinfo

Bhupesh Sharma bhsharma at redhat.com
Sat May 4 05:53:56 PDT 2019


On 04/03/2019 11:24 PM, Bhupesh Sharma wrote:
> Hi James,
> 
> On 04/02/2019 10:56 PM, James Morse wrote:
>> Hi Bhupesh,
>>
>> On 28/03/2019 11:42, Bhupesh Sharma wrote:
>>> On 03/26/2019 10:06 PM, James Morse wrote:
>>>> On 20/03/2019 05:09, Bhupesh Sharma wrote:
>>>>> With ARMv8.2-LVA architecture extension availability, arm64 hardware
>>>>> which supports this extension can support a virtual address-space upto
>>>>> 52-bits.
>>>>>
>>>>> Since at the moment we enable the support of this extension in kernel
>>>>> via CONFIG flags, e.g.
>>>>>    - User-space 52-bit LVA via CONFIG_ARM64_USER_VA_BITS_52
>>>>>
>>>>> so, there is no clear mechanism in the user-space right now to
>>>>> determine these CONFIG flag values and hence determine the maximum
>>>>> virtual address space supported by the underlying kernel.
>>>>>
>>>>> User-space tools like 'makedumpfile' therefore are broken currently
>>>>> as they have no proper method to calculate the 'PTRS_PER_PGD' value
>>>>> which is required to perform a page table walk to determine the
>>>>> physical address of a corresponding virtual address found in
>>>>> kcore/vmcoreinfo.
>>>>>
>>>>> If one appends 'PTRS_PER_PGD' number to vmcoreinfo for arm64,
>>>>> it can be used in user-space to determine the maximum virtual address
>>>>> supported by underlying kernel.
>>>>
>>>> I don't think this really solves the problem, it feels fragile.
>>>>
>>>> I can see how vmcoreinfo tells you VA_BITS==48, PAGE_SIZE==64K and 
>>>> PTRS_PER_PGD=1024.
>>>> You can use this to work out that the top level page table size 
>>>> isn't consistent with a
>>>> 48bit VA, so 52bit VA must be in use...
>>>>
>>>> But wasn't your problem walking the kernel page tables? In 
>>>> particular the offset that we
>>>> apply because the tables were based on a 48bit VA shifted up in 
>>>> swapper_pg_dir.
>>>>
>>>> Where does the TTBR1_EL1 offset come from with this property? I 
>>>> assume makedumpfile
>>>> hard-codes it when it sees 52bit is in use ... somewhere.
>>>> We haven't solved the problem!
>>
>>> But isn't the TTBR1_EL1 offset already appended by the kernel via 
>>> e842dfb5a2d3 ("arm64:
>>> mm: Offset TTBR1 to allow 52-bit PTRS_PER_PGD")
>>> in case of kernel configuration where 52-bit userspace VAs are possible.
>>
>>> Accordingly we have the following assembler helper in 
>>> 'arch/arm64/include/asm/assembler.h':
>>>
>>>         .macro  offset_ttbr1, ttbr
>>> #ifdef CONFIG_ARM64_52BIT_VA
>>>         orr     \ttbr, \ttbr, #TTBR1_BADDR_4852_OFFSET
>>> #endif
>>>         .endm
>>>
>>> where:
>>> #ifdef CONFIG_ARM64_52BIT_VA
>>> /* Must be at least 64-byte aligned to prevent corruption of the TTBR */
>>> #define TTBR1_BADDR_4852_OFFSET        (((UL(1) << (52 - 
>>> PGDIR_SHIFT)) - \
>>>                                  (UL(1) << (48 - PGDIR_SHIFT))) * 8)
>>> #endif
>>
>> Sure, and all this would work today, because there is only one weird 
>> combination. But once
>> we support another combination of 52bit-va, you'd either need another 
>> value, or to start
>> using PTRS_PER_PGD as a flag for v5.1_FUNNY_BEHAVIOUR_ONE.
> 
> I completed my user-space experimentation with 52-bit kernel VA changes 
> from Steve today and have shared a detailed review on his patchset (See 
> <http://lists.infradead.org/pipermail/kexec/2019-April/022750.html>).
> 
> But first let me share some opinion on how we are adding the 52-bit 
> address space changes for arm64 in the kernel.
> 
> I think we have ended up adding just a bit _too many_ CONFIG and MACRO 
> values for the increased address space changes. For e.g. after the 
> 52-bit kernel VA changes we have at-least 4 macros which explain the VA 
> address range with CONFIG_ARM64_USER_KERNEL_VA_BITS_52=y:
> 
> VA_BITS = 52,
> VA_BITS_ACTUAL = vabits_actual = 48,
> VA_BITS_MIN = min (48, VA_BITS) = 48.
> PTRS_PER_PGD = 64 (48-bit) or 1024 (52-bit)
> 
> Of these, VA_BITS, VA_BITS_ACTUAL and PTRS_PER_PGD are definitely of 
> interest in the userspace as they define:
> 
> 1.
>   /*
>      * VMEMMAP_SIZE - allows the whole linear region to be covered by
>      *                a struct page array
>      */
>     #define VMEMMAP_SIZE (UL(1) << (VA_BITS - PAGE_SHIFT - 1 + 
> STRUCT_PAGE_MAX_SHIFT))
> 
> 2. #define __is_lm_address(addr)    (!((addr) & BIT(VA_BITS_ACTUAL - 1)))
> 
> We have discussed the usage of PTRS_PER_PGD in userspace already at 
> quite some length, so I will focus on the other two below (VA_BITS and 
> VA_BITS_ACUAL).
> 
> Both are critical for determining VMEMMAP_SIZE and whether a virtual 
> address lies in the linear map range respectively.
> 
> I don't see any standard mechanism other than the following to achieve a 
> working user-space with these changes:
> - a sysfs node (may be a 
> '/sys/devices/system/cpu/addressing-capabilities' node?) or HWCAP 
> capability export for user-space utilities which perform a live analysis 
> and use the above variables.
> - exporting these variables in vmcoreinfo (for analysis of crash dump).
> 
> VA_BITS is already exported in vmcoreinfo, whereas I have proposed 
> exporting PTRS_PER_PGD to vmcoreinfo via this patch.
> 
> For 52-bit kernel VA changes, VA_BITS_ACTUAL will also be needed in 
> vmcoreinfo (See 
> <http://lists.infradead.org/pipermail/kexec/2019-April/022750.html> for 
> details).
> 
>>> Note that the above computation holds true both for PTRS_PER_PGD = 64 
>>> (48-bit kernel with
>>> 48-bit User VA) and 1024 (48-bit with 52-bit User VA) cases. And 
>>> these are the
>>> configurations for which we are trying to fix the user-space 
>>> regressions reported (on
>>> arm64) recently.
>>
>> ... and revisit it when there is another combination?
> 
> See above.
> 
>>>> Today __cpu_setup() sets T0SZ and T1SZ differently for 52bit VA, but 
>>>> in the future it
>>>> could set them the same, or different the other-way-round.
>>>>
>>>> Will makedumpfile using this value keep working once T1SZ is 52bit 
>>>> VA too? In this case
>>>> there would be no ttbr offset.
>>>>
>>>> If you need another vmcoreinfo flag once that happens, we've done 
>>>> something wrong here.
>>>
>>> I am currently experimenting with Steve's patches for 52-bit kernel VA
>>> (<https://lwn.net/Articles/780093/>) and will comment more on the 
>>> same when I am able to
>>> get the user-space utilities like makedumpfile and kexec-tools to 
>>> work with the same on
>>> both ARMv8 Fast Simulator model and older CPUs which don't support 
>>> ARMv8.2 extensions.
>>
>>> However, I think we should not hold up fixes for regressions already 
>>> reported, because the
>>> 52-bit kernel VA changes probably still need some more rework.
>>
>> Chucking things into vmcoreinfo isn't free: we need to keep them there 
>> forever, otherwise
>> yesterdays version of the tools breaks. Can we take the time to get 
>> this right for the
>> cases we know about?
> 
> Sure, but exporting variable(s) in vmcoreinfo in directly related to the 
> information variable(s) we add in the kernel side without which the 
> user-space would break.
> 
> I have added the requirements for 52-bit kernel VA above (i.e we need an 
> additional VA_BITS_ACTUAL variable export'ed rather than any tinkering 
> with already proposed PTRS_PER_PGD).
> 
> May be this is a good time to also talk about minimizing the kernel 
> interfaces we are proposing to hold and indicate normal (48-bit) and 
> extended (52-bit) address spaces on arm64.
> 
> Ideally, we would want to simplify it further to be on similar lines as 
> x86:
> CONFIG_X86_5LEVEL=y
> vmcoreinfo_append_str("NUMBER(pgtable_l5_enabled)=%d\n",
>              pgtable_l5_enabled());
> 
> which seems much cleaner..
> 
> I am open to any suggestions on the same.
> 
>> Yes the kernel code is going to move around, this is why the 
>> information we expose via
>> vmcoreinfo needs to be thought through: something we would always 
>> need, regardless of how
>> the kernel implements it.
>>
>>
>>>> (Not to mention what happens if the TTBR1_EL1 uses 52bit va, but 
>>>> TTBR0_EL1 doesn't)
>>>
>>> I am wondering if there are any real users of the above combination.
>>
>> Heh! Is there any hardware that supports this?
>>
>> Pointer-auth changes all this again, as we may prefer to use the bits 
>> for pointer-auth in
>> one TTB or the other. PTRS_PER_PGD may show the 52bit value in this 
>> case, but neither TTBR
>> is mapping 52bits of VA.
>>
>>
>>> So far, I have generally come across discussions where the following 
>>> variations of the
>>> address spaces have been proposed/requested:
>>> - 48bit kernel VA + 48-bit User VA,
>>> - 48-bit kernel VA + 52-bit User VA,
>>
>> + 52bit kernel, because there is excessive quantities of memory, and 
>> the kernel maps it
>> all, but 48-bit user, because it never maps all the memory, and we 
>> prefer the bits for
>> pointer-auth.
>>
>>> - 52-bit kernel VA + 52-bit User VA.
>>
>> And...  all four may happen with the same built image. I don't see how 
>> you can tell these
>> cases apart with the one (build-time-constant!) PTRS_PER_PGD value.
>>
>> I'm sure some of these cases are hypothetical, but by considering it 
>> all now, we can avoid
>> three more kernel:vmcoreinfo updates, and three more 
>> fix-user-space-to-use-the-new-value.
> 
> Agree.
> 
>> I think you probably do need PTRS_PER_PGD, as this is the one value 
>> the mm is using to
>> generate page tables. I'm pretty sure you also need T0SZ and T1SZ to 
>> know if that's
>> actually in use, or the kernel is bodging round it with an offset.
> 
> Sure, I am open to suggestions (as I realize that we need an additional 
> VA_BITS_ACTUAL variable export'ed for 52-bit kernel VA changes).
> 
> Also how do we standardize reading T0SZ and T1SZ in user-space. Do you 
> propose I make an enhancement in the cpu-feature-registers interface 
> (see [1]) or the HWCAPS interface (see [2]) for the same?
> 
> [1]. 
> https://www.kernel.org/doc/Documentation/arm64/cpu-feature-registers.txt
> [2]. https://www.kernel.org/doc/Documentation/arm64/elf_hwcaps.txt
> 
> Thanks,
> Bhupesh

Ping.

Hi James, Steve,

Any comments on the above points? At the moment we have to carry these 
fixes in the distribution kernels and I would like to have these fixed 
in upstream kernel itself.

Thanks,
Bhupesh



More information about the kexec mailing list