[PATCH v4] arm64: mm: fix linear mem mapping access performance degradation
guanghui.fgh
guanghuifeng at linux.alibaba.com
Mon Jul 4 03:58:20 PDT 2022
On 2022/7/4 18:35, Will Deacon wrote:
> On Sat, Jul 02, 2022 at 11:57:53PM +0800, Guanghui Feng wrote:
>> arm64 can build 2M/1G block/section mappings. When using the DMA/DMA32 zone
>> (crashkernel enabled, rodata full disabled, kfence disabled), the mem_map will
>> use non-block/section mappings (because crashkernel requires shrinking the
>> region at page granularity). But this degrades performance when doing large
>> contiguous memory accesses in the kernel (memcpy/memmove, etc.).
>
> Hmm. It seems a bit silly to me that we take special care to unmap the
> crashkernel from the linear map even when can_set_direct_map() is false, as
> we won't be protecting the main kernel at all!
>
> Why don't we just leave the crashkernel mapped if !can_set_direct_map()
> and then this problem just goes away?
>
> Will
This question was asked last week.
1. Quoted comment from arch/arm64/mm/init.c:
"Memory reservation for crash kernel either done early or deferred
depending on DMA memory zones configs (ZONE_DMA) --
In absence of ZONE_DMA configs arm64_dma_phys_limit initialized
here instead of max_zone_phys(). This lets early reservation of
crash kernel memory which has a dependency on arm64_dma_phys_limit.
Reserving memory early for crash kernel allows linear creation of block
mappings (greater than page-granularity) for all the memory bank rangs.
In this scheme a comparatively quicker boot is observed.
If ZONE_DMA configs are defined, crash kernel memory reservation
is delayed until DMA zone memory range size initialization performed in
zone_sizes_init(). The defer is necessary to steer clear of DMA zone
memory range to avoid overlap allocation.
[[[
So crash kernel memory boundaries are not known when mapping all bank
memory ranges, which otherwise means not possible to exclude crash
kernel range from creating block mappings so page-granularity mappings
are created for the entire memory range.
]]]"
In other words, the init order is: memblock init ---> linear mem mapping (4k
mapping, because crashkernel requires page-granularity changes) ---> zone dma
limit ---> reserve crashkernel.
So when ZONE_DMA is enabled and crashkernel is used, the linear mem mapping
uses 4k mappings.
2. As mentioned above, when the linear mem simply uses 4k mappings, there is
a high dTLB miss rate (which degrades performance).
This patch uses block/section mappings as far as possible, which improves
performance.
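
As a rough illustration of why the mapping granule matters for dTLB pressure,
the sketch below counts how many translations are needed to cover a region at
each granule. It is only arithmetic, not kernel code: entries_needed() and the
SZ_* macros are defined here for illustration, and it assumes a 4K-page
configuration in which the TLB can hold one block mapping per entry (which is
implementation dependent).

```c
#include <stdint.h>

/* Mapping granules with 4K base pages: page, 2M block, 1G block. */
#define SZ_4K (UINT64_C(1) << 12)
#define SZ_2M (UINT64_C(1) << 21)
#define SZ_1G (UINT64_C(1) << 30)

/*
 * Number of translations needed to cover region_size bytes when each
 * translation maps `granule` bytes. Rounds up for partial granules.
 * Hypothetical helper for illustration only.
 */
static uint64_t entries_needed(uint64_t region_size, uint64_t granule)
{
	return (region_size + granule - 1) / granule;
}
```

For 1G of linear memory this gives 262144 translations with 4k mappings, 512
with 2M blocks, and 1 with a 1G block.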
3. This patch reserves crashkernel the same way as before (the ZONE_DMA &
crashkernel reserving order is unchanged), and only changes the linear mem
mapping to block/section mapping.
Init order: memblock init ---> linear mem mapping (block/section mapping
for linear mem) ---> zone dma limit ---> reserve
crashkernel ---> [[[only]]] rebuild 4k pagesize mapping for crashkernel mem.
With this method, block/section mappings are used as far as possible.
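
To make the effect of the new init order concrete, here is a simplified
user-space model of it. This is not the patch's code: struct map_count and
split_for_crashkernel() are hypothetical names, the model only considers 2M
blocks (no 1G blocks, no contiguous hint), and it assumes that any 2M block
touched by the crashkernel range is rebuilt entirely as 4k pages.

```c
#include <stdint.h>

#define SZ_4K (UINT64_C(1) << 12)
#define SZ_2M (UINT64_C(1) << 21)

/* Mapping counts left after the crashkernel range is rebuilt with 4k pages. */
struct map_count {
	uint64_t blocks_2m;
	uint64_t pages_4k;
};

/*
 * Model: [0, mem_size) is first mapped with 2M blocks (mem_size must be a
 * multiple of 2M). Then only the blocks overlapping the crashkernel range
 * [ck_start, ck_end) are split into 4k pages; everything else stays 2M.
 */
static struct map_count split_for_crashkernel(uint64_t mem_size,
					      uint64_t ck_start,
					      uint64_t ck_end)
{
	/* Widen the range to 2M boundaries: partial blocks split fully. */
	uint64_t first = ck_start / SZ_2M * SZ_2M;
	uint64_t last = (ck_end + SZ_2M - 1) / SZ_2M * SZ_2M;
	uint64_t split_bytes = last - first;
	struct map_count c;

	c.blocks_2m = (mem_size - split_bytes) / SZ_2M;
	c.pages_4k = split_bytes / SZ_4K;
	return c;
}
```

In this model, 4G of memory with a 2M-aligned 256M crashkernel region keeps
1920 of its 2048 blocks as 2M mappings, and only the crashkernel region is
mapped with 4k pages (65536 of them).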