[RFC] arm64: mm: update max_pfn after memory hotplug
David Hildenbrand
david at redhat.com
Fri Sep 24 01:17:46 PDT 2021
On 24.09.21 04:47, Florian Fainelli wrote:
>
>
> On 9/23/2021 3:54 PM, Chris Goldsworthy wrote:
>> From: Sudarshan Rajagopalan <quic_sudaraja at quicinc.com>
>>
>> After new memory blocks have been hotplugged, max_pfn and max_low_pfn
>> needs updating to reflect on new PFNs being hot added to system.
>>
>> Signed-off-by: Sudarshan Rajagopalan <quic_sudaraja at quicinc.com>
>> Signed-off-by: Chris Goldsworthy <quic_cgoldswo at quicinc.com>
>> ---
>> arch/arm64/mm/mmu.c | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index cfd9deb..fd85b51 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -1499,6 +1499,11 @@ int arch_add_memory(int nid, u64 start, u64 size,
>> if (ret)
>> __remove_pgd_mapping(swapper_pg_dir,
>> __phys_to_virt(start), size);
>> + else {
>> + max_pfn = PFN_UP(start + size);
>> + max_low_pfn = max_pfn;
>> + }
>
> This is a drive by review, but it got me thinking about your changes a bit:
>
> - if you raise max_pfn when you hotplug memory, don't you need to lower
> it when you hot unplug memory as well?
The issue with lowering is that you actually have to do some search to
figure out the actual value -- and it's not really worth the trouble.
Raising the limit is easy.
With memory hotunplug, anybody wanting to take a look at a "struct page"
via a pfn has to do a pfn_to_online_page() either way. That will fail if
there isn't actually a memmap anymore because the memory has been
unplugged. So "max_pfn" is actually rather a hint what maximum pfn to
look at, and it can be bigger than it actually is.
The a look at the example usage in fs/proc/page.c:kpageflags_read()
pfn_to_online_page() will simply fail and stable_page_flags() will
indicate a KPF_NOPAGE.
Just like we would have a big memory hole now at the end of memory.
>
> - suppose that you have a platform which maps physical memory into the
> CPU's address space at 0x00_4000_0000 (1GB offset) and the kernel boots
> with 2GB of DRAM plugged by default. At that point we have not
> registered a swiotlb because we have less than 4GB of addressable
> physical memory, there is no IOMMU in that system, it's a happy world.
> Now assume that we plug an additional 2GB of DRAM into that system
> adjacent to the previous 2GB, from 0x00_C0000_0000 through
> 0x14_0000_0000, now we have physical addresses above 4GB, but we still
> don't have a swiotlb, some of our DMA_BIT_MASK(32) peripherals are going
> to be unable to DMA from that hot plugged memory, but they could if we
> had a swiotlb.
That's why platforms that hotplug memory should indicate the maximum
possible PFN via some mechanism during boot. On x86-64 (and IIRC also
arm64 now), this is done via the ACPI SRAT.
And that's where "max_possible_pfn" and "max_pfn" differ. See
drivers/acpi/numa/srat.c:acpi_numa_memory_affinity_init():
max_possible_pfn = max(max_possible_pfn, PFN_UP(end - 1));$
Using max_possible_pfn, the OS can properly setup the swiotlb, even
thought it wouldn't currently be required when just looking at max_pfn.
I documented that for virtio-mem in
https://virtio-mem.gitlab.io/user-guide/user-guide-linux.html
"swiotlb and DMA memory".
>
> - now let's go even further but this is very contrived. Assume that the
> firmware has somewhat created a reserved memory region with a 'no-map'
> attribute thus indicating it does not want a struct page to be created
> for a specific PFN range, is it valid to "blindly" raise max_pfn if that
> region were to be at the end of the just hot-plugged memory?
no-map means that no direct mapping is to be created, right? We would
still have a memmap IIRC, and the pages are PG_reserved.
Again, I think this is very similar to just having no-map regions like
random memory holes within the existing memory layout.
What Chris proposes here is very similar to
arch/x86/mm/init_64.c:update_end_of_memory_vars() called during
arch_add_memory()->add_pages() on x86-64.
--
Thanks,
David / dhildenb
More information about the linux-arm-kernel
mailing list