[PATCH v3 1/3] arm64, vmcoreinfo : Append 'PTRS_PER_PGD' to vmcoreinfo
bhsharma at redhat.com
Wed Apr 3 10:54:12 PDT 2019
On 04/02/2019 10:56 PM, James Morse wrote:
> Hi Bhupesh,
> On 28/03/2019 11:42, Bhupesh Sharma wrote:
>> On 03/26/2019 10:06 PM, James Morse wrote:
>>> On 20/03/2019 05:09, Bhupesh Sharma wrote:
>>>> With ARMv8.2-LVA architecture extension availability, arm64 hardware
>>>> which supports this extension can support a virtual address-space up to 52 bits.
>>>> Since at the moment we enable the support of this extension in kernel
>>>> via CONFIG flags, e.g.
>>>> - User-space 52-bit LVA via CONFIG_ARM64_USER_VA_BITS_52
>>>> so, there is no clear mechanism in the user-space right now to
>>>> determine these CONFIG flag values and hence determine the maximum
>>>> virtual address space supported by the underlying kernel.
>>>> User-space tools like 'makedumpfile' therefore are broken currently
>>>> as they have no proper method to calculate the 'PTRS_PER_PGD' value
>>>> which is required to perform a page table walk to determine the
>>>> physical address of a corresponding virtual address found in
>>>> If one appends 'PTRS_PER_PGD' number to vmcoreinfo for arm64,
>>>> it can be used in user-space to determine the maximum virtual address
>>>> supported by underlying kernel.
>>> I don't think this really solves the problem, it feels fragile.
>>> I can see how vmcoreinfo tells you VA_BITS==48, PAGE_SIZE==64K and PTRS_PER_PGD=1024.
>>> You can use this to work out that the top level page table size isn't consistent with a
>>> 48bit VA, so 52bit VA must be in use...
>>> But wasn't your problem walking the kernel page tables? In particular the offset that we
>>> apply because the tables were based on a 48bit VA shifted up in swapper_pg_dir.
>>> Where does the TTBR1_EL1 offset come from with this property? I assume makedumpfile
>>> hard-codes it when it sees 52bit is in use ... somewhere.
>>> We haven't solved the problem!
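The inference described above (VA_BITS==48 with 64K pages, yet PTRS_PER_PGD=1024, so 52-bit VA must be in use) can be sketched as follows. This is only an illustration, not makedumpfile code: the helper names are hypothetical, and it assumes 8-byte descriptors and 3 page-table levels for 64K pages.

```c
#include <assert.h>
#include <stdint.h>

/* Shift of the top-level (PGD) index. Assumes each table holds
 * 2^(page_shift - 3) eight-byte entries, as on arm64. */
static unsigned int pgdir_shift(unsigned int page_shift, unsigned int levels)
{
	assert(page_shift > 3 && levels >= 2 && levels <= 4);
	return (page_shift - 3) * levels + 3;
}

/* Number of top-level entries needed to cover va_bits of address space. */
static uint64_t ptrs_per_pgd(unsigned int va_bits, unsigned int page_shift,
			     unsigned int levels)
{
	unsigned int shift = pgdir_shift(page_shift, levels);

	assert(va_bits > shift && va_bits - shift < 64);
	return UINT64_C(1) << (va_bits - shift);
}

/* VA width implied by a reported PTRS_PER_PGD, or 0 if the value is
 * not a power of two. */
static unsigned int va_bits_from_pgd(uint64_t reported, unsigned int page_shift,
				     unsigned int levels)
{
	unsigned int bits = 0;

	if (reported == 0 || (reported & (reported - 1)))
		return 0;
	while ((reported >> bits) > 1)
		bits++;
	return pgdir_shift(page_shift, levels) + bits;
}
```

With page_shift = 16 and levels = 3, a reported value of 64 maps back to 48 bits and 1024 maps back to 52 bits. As the reply points out, this only tells you how large the top-level table is, not which TTBR actually uses the extra bits.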
>> But isn't the TTBR1_EL1 offset already appended by the kernel via e842dfb5a2d3 ("arm64:
>> mm: Offset TTBR1 to allow 52-bit PTRS_PER_PGD")
>> for kernel configurations where 52-bit userspace VAs are possible?
>> Accordingly we have the following assembler helper in 'arch/arm64/include/asm/assembler.h':
>> .macro offset_ttbr1, ttbr
>> #ifdef CONFIG_ARM64_52BIT_VA
>> orr \ttbr, \ttbr, #TTBR1_BADDR_4852_OFFSET
>> #endif
>> .endm
>>
>> #ifdef CONFIG_ARM64_52BIT_VA
>> /* Must be at least 64-byte aligned to prevent corruption of the TTBR */
>> #define TTBR1_BADDR_4852_OFFSET (((UL(1) << (52 - PGDIR_SHIFT)) - \
>> (UL(1) << (48 - PGDIR_SHIFT))) * 8)
>> #endif
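As a worked example of the TTBR1_BADDR_4852_OFFSET expression quoted above, here is a small user-space sketch. The function name is hypothetical, and PGDIR_SHIFT is passed in explicitly rather than taken from the kernel configuration; the value 42 used below assumes 64K pages with 3 levels of page tables.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the quoted macro: the byte distance between the start of a
 * 52-bit sized PGD and the point where a 48-bit sized PGD would begin
 * if both are aligned at their ends. */
static uint64_t ttbr1_baddr_4852_offset(unsigned int pgdir_shift)
{
	uint64_t entries_52, entries_48;

	assert(pgdir_shift < 48);
	entries_52 = UINT64_C(1) << (52 - pgdir_shift);
	entries_48 = UINT64_C(1) << (48 - pgdir_shift);
	return (entries_52 - entries_48) * 8;	/* 8 bytes per descriptor */
}
```

For pgdir_shift = 42 this gives (1024 - 64) * 8 = 7680 bytes (0x1e00), which is a multiple of 64 and so satisfies the alignment comment in the quoted header.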
> Sure, and all this would work today, because there is only one weird combination. But once
> we support another combination of 52bit-va, you'd either need another value, or to start
> using PTRS_PER_PGD as a flag for v5.1_FUNNY_BEHAVIOUR_ONE.
I completed my user-space experimentation with 52-bit kernel VA changes
from Steve today and have shared a detailed review on his patchset (See
But first let me share some opinion on how we are adding the 52-bit
address space changes for arm64 in the kernel.
I think we have ended up adding just a bit _too many_ CONFIG and MACRO
values for the increased address space changes. For example, after the
52-bit kernel VA changes we have at least 4 macros which describe the VA
address range with CONFIG_ARM64_USER_KERNEL_VA_BITS_52=y:
VA_BITS = 52,
VA_BITS_ACTUAL = vabits_actual = 48,
VA_BITS_MIN = min (48, VA_BITS) = 48.
PTRS_PER_PGD = 64 (48-bit) or 1024 (52-bit)
Of these, VA_BITS, VA_BITS_ACTUAL and PTRS_PER_PGD are definitely of
interest in the userspace as they define:
1. /*
    * VMEMMAP_SIZE - allows the whole linear region to be covered by
    * a struct page array
    */
   #define VMEMMAP_SIZE (UL(1) << (VA_BITS - PAGE_SHIFT - 1 +
2. #define __is_lm_address(addr) (!((addr) & BIT(VA_BITS_ACTUAL - 1)))
We have discussed the usage of PTRS_PER_PGD in userspace already at
quite some length, so I will focus on the other two below (VA_BITS and
VA_BITS_ACTUAL).
Both are critical: VA_BITS for determining VMEMMAP_SIZE, and
VA_BITS_ACTUAL for deciding whether a virtual address lies in the
linear map range.
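To illustrate why a user-space tool needs VA_BITS_ACTUAL and not just VA_BITS, here is a minimal user-space rendering of the __is_lm_address() test shown above. The function name and the runtime parameter are illustrative assumptions; in the kernel the width comes from VA_BITS_ACTUAL itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* An address is treated as a linear-map address when bit
 * (va_bits_actual - 1) is clear, mirroring
 * !((addr) & BIT(VA_BITS_ACTUAL - 1)). */
static bool is_lm_address(uint64_t addr, unsigned int va_bits_actual)
{
	assert(va_bits_actual >= 1 && va_bits_actual <= 64);
	return !(addr & (UINT64_C(1) << (va_bits_actual - 1)));
}
```

The same address can be classified differently depending on the width: 0xffff000000000000 has bit 47 clear, so it passes the check with va_bits_actual = 48, but it has bit 51 set, so it fails with va_bits_actual = 52. A tool that guesses the wrong width will therefore misclassify addresses.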
I don't see any standard mechanism other than the following to achieve a
working user-space with these changes:
- a sysfs node (maybe a
'/sys/devices/system/cpu/addressing-capabilities' node?) or HWCAP
capability export for user-space utilities which perform a live analysis
and use the above variables.
- exporting these variables in vmcoreinfo (for analysis of crash dump).
VA_BITS is already exported in vmcoreinfo, whereas I have proposed
exporting PTRS_PER_PGD to vmcoreinfo via this patch.
For 52-bit kernel VA changes, VA_BITS_ACTUAL will also be needed in
vmcoreinfo.
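For the crash-dump case, a user-space consumer would read these values from the vmcoreinfo text. The sketch below assumes the usual "NUMBER(name)=value" line format; the helper name is hypothetical, and the PTRS_PER_PGD and VA_BITS_ACTUAL keys are assumptions that only exist if the exports discussed here are actually added.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Look up "NUMBER(<name>)=<value>" in a vmcoreinfo text blob.
 * Returns 0 and stores the value on success, -1 if the key is absent
 * or has no numeric value. */
static int vmcoreinfo_number(const char *info, const char *name, long *out)
{
	char key[64];
	const char *line = info;
	int n = snprintf(key, sizeof(key), "NUMBER(%s)=", name);

	if (n < 0 || (size_t)n >= sizeof(key))
		return -1;

	while (line && *line) {
		if (strncmp(line, key, (size_t)n) == 0) {
			char *end;
			long v = strtol(line + n, &end, 10);

			if (end == line + n)
				return -1;	/* no digits after '=' */
			*out = v;
			return 0;
		}
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

A tool could call this once per key and fall back to its existing behaviour when a key such as VA_BITS_ACTUAL is missing, which is also how it would keep working against older kernels.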
>> Note that the above computation holds true both for PTRS_PER_PGD = 64 (48-bit kernel with
>> 48-bit User VA) and 1024 (48-bit with 52-bit User VA) cases. And these are the
>> configurations for which we are trying to fix the user-space regressions reported (on
>> arm64) recently.
> ... and revisit it when there is another combination?
>>> Today __cpu_setup() sets T0SZ and T1SZ differently for 52bit VA, but in the future it
>>> could set them the same, or different the other-way-round.
>>> Will makedumpfile using this value keep working once T1SZ is 52bit VA too? In this case
>>> there would be no ttbr offset.
>>> If you need another vmcoreinfo flag once that happens, we've done something wrong here.
>> I am currently experimenting with Steve's patches for 52-bit kernel VA
>> (<https://lwn.net/Articles/780093/>) and will comment more on the same when I am able to
>> get the user-space utilities like makedumpfile and kexec-tools to work with the same on
>> both ARMv8 Fast Simulator model and older CPUs which don't support ARMv8.2 extensions.
>> However, I think we should not hold up fixes for regressions already reported, because the
>> 52-bit kernel VA changes probably still need some more rework.
> Chucking things into vmcoreinfo isn't free: we need to keep them there forever, otherwise
> yesterday's version of the tools breaks. Can we take the time to get this right for the
> cases we know about?
Sure, but exporting variable(s) in vmcoreinfo is directly related to the
information variable(s) we add on the kernel side, without which
user-space would break.
I have added the requirements for 52-bit kernel VA above (i.e. we need an
additional VA_BITS_ACTUAL variable exported, rather than any tinkering
with the already proposed PTRS_PER_PGD).
Maybe this is a good time to also talk about minimizing the kernel
interfaces we are proposing to hold and indicate normal (48-bit) and
extended (52-bit) address spaces on arm64.
Ideally, we would want to simplify it further to be on similar lines as x86:
which seems much cleaner..
I am open to any suggestions on the same.
> Yes the kernel code is going to move around, this is why the information we expose via
> vmcoreinfo needs to be thought through: something we would always need, regardless of how
> the kernel implements it.
>>> (Not to mention what happens if the TTBR1_EL1 uses 52bit va, but TTBR0_EL1 doesn't)
>> I am wondering if there are any real users of the above combination.
> Heh! Is there any hardware that supports this?
> Pointer-auth changes all this again, as we may prefer to use the bits for pointer-auth in
> one TTB or the other. PTRS_PER_PGD may show the 52bit value in this case, but neither TTBR
> is mapping 52bits of VA.
>> So far, I have generally come across discussions where the following variations of the
>> address spaces have been proposed/requested:
>> - 48bit kernel VA + 48-bit User VA,
>> - 48-bit kernel VA + 52-bit User VA,
> + 52bit kernel, because there are excessive quantities of memory, and the kernel maps it
> all, but 48-bit user, because it never maps all the memory, and we prefer the bits for
>> - 52-bit kernel VA + 52-bit User VA.
> And... all four may happen with the same built image. I don't see how you can tell these
> cases apart with the one (build-time-constant!) PTRS_PER_PGD value.
> I'm sure some of these cases are hypothetical, but by considering it all now, we can avoid
> three more kernel:vmcoreinfo updates, and three more fix-user-space-to-use-the-new-value.
> I think you probably do need PTRS_PER_PGD, as this is the one value the mm is using to
> generate page tables. I'm pretty sure you also need T0SZ and T1SZ to know if that's
> actually in use, or the kernel is bodging round it with an offset.
Sure, I am open to suggestions (as I realize that we need an additional
VA_BITS_ACTUAL variable exported for the 52-bit kernel VA changes).
Also, how do we standardize reading T0SZ and T1SZ in user-space? Do you
propose I make an enhancement in the cpu-feature-registers interface
(see ) or the HWCAPS interface (see ) for the same?