[PATCH -next] crash: Fix riscv64 crash memory reserve dead loop
Petr Tesařík
petr at tesarici.cz
Tue Aug 13 06:33:52 PDT 2024

On Tue, 13 Aug 2024 13:04:31 +0100
Catalin Marinas <catalin.marinas at arm.com> wrote:
> Hi Petr,
>
> On Tue, Aug 13, 2024 at 10:40:06AM +0200, Petr Tesařík wrote:
> > On Tue, 6 Aug 2024 20:34:42 +0100
> > Catalin Marinas <catalin.marinas at arm.com> wrote:
> > > I haven't tried but it's possible that this patch also breaks those
> > > arm64 platforms with all RAM above 4GB when CRASH_ADDR_LOW_MAX is
> > > memblock_end_of_DRAM(). Here all memory would be low and, in the
> > > absence of a fallback, it fails to allocate.
> >
> > I'm afraid you've just opened Pandora's box... ;-)
>
> Not that bad ;) but, yeah, this patch was dropped in favour of this:
>
> https://lore.kernel.org/r/20240812062017.2674441-1-ruanjinjie@huawei.com/

Yes, I have noticed. That one simply preserves the status quo and a
fuzzy definition of "low".

> > Another (unrelated) patch series made us aware of a platform where RAM
> > starts at 32G, but IIUC the host bridge maps 32G-33G to bus addresses
> > 0-1G, and there is a device on that bus which can produce only 30-bit
> > addresses.
> >
> > Now, what was the idea behind allocating some crash memory "low"?
> > Right, it should allow the crash kernel to access devices with
> > addressing constraints. So, on the above-mentioned platform, allocating
> > "low" would in fact mean allocating between 32G and 33G (in host address
> > domain).
>
> Indeed. If that's not available, the crash kernel won't be able to boot
> (unless the corresponding device is removed from DT or ACPI tables).

Then it may be able to boot, but it won't be able to save a crash dump
on disk or send it over the network, rendering the panic kernel
environment a bit less useful.
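To make the 32G example concrete, here is a minimal user-space sketch in
C (not kernel code). The struct bus_window type, the example_window
instance and range_dma_reachable() are hypothetical names made up for
illustration; only the numbers (RAM at 32G, bus addresses 0-1G, a 30-bit
device) come from the platform described above. It assumes a single
linear window between CPU physical and bus addresses.

```c
#include <stdbool.h>
#include <stdint.h>

#define GiB (1ULL << 30)

/* Hypothetical description of one host bridge window (not a kernel API). */
struct bus_window {
	uint64_t cpu_start;	/* start in CPU physical address space */
	uint64_t bus_start;	/* corresponding bus (DMA) address */
	uint64_t size;		/* length of the window in bytes */
};

/* The platform from the example: CPU 32G-33G maps to bus 0-1G. */
static const struct bus_window example_window = {
	.cpu_start = 32 * GiB,
	.bus_start = 0,
	.size = 1 * GiB,
};

/*
 * Return true if the CPU physical range [start, start + len) lies
 * entirely inside window w and every translated bus address fits
 * into addr_bits bits.
 */
static bool range_dma_reachable(const struct bus_window *w,
				uint64_t start, uint64_t len,
				unsigned int addr_bits)
{
	uint64_t bus_first, bus_last, limit;

	if (len == 0 || len > w->size)
		return false;
	if (start < w->cpu_start || start - w->cpu_start > w->size - len)
		return false;

	bus_first = start - w->cpu_start + w->bus_start;
	bus_last = bus_first + (len - 1);
	limit = addr_bits >= 64 ? UINT64_MAX : (1ULL << addr_bits) - 1;

	return bus_last <= limit;
}
```

With these assumptions, a 256M reservation at CPU address 32G is
reachable by the 30-bit device, while the same reservation at CPU
address 1G is not, even though 1G looks "low" from the CPU's point of
view.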

> > Should we rethink the whole concept of high/low?
>
> Yeah, it would be good to revisit those at some point. For the time
> being, 'low' in this context on arm64 means ZONE_DMA memory, basically
> the common denominator address range that supports all devices on an
> SoC. For others like x86_32, this means the memory that the kernel can
> actually map (not necessarily device/DMA related).

Ah, right. I forgot that there are also constraints on the placement of
the kernel identity mapping in CPU physical address space.

> So, it's not always about the DMA capabilities but also what the crash
> kernel can map (so somewhat different from the zone allocator case we've
> been discussing in other threads).

It seems to me that a good panic kernel environment requires:

1. memory where kernel text/data can be mapped (even at early init)
2. memory that is accessible to I/O devices
3. memory that can be allocated to user space (e.g. makedumpfile)

The first two blocks may require special placement in bus/CPU physical
address space; the third does not, but it needs to be big enough for
the workload.
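As a rough illustration of the three requirements listed above, they
could be modelled as below. This is only a user-space sketch in C, not a
proposal for the kernel: enum crash_mem_kind, struct crash_mem_req and
placement_ok() are hypothetical names, and treating each placement
constraint as a single [min_addr, max_addr) range is a simplifying
assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define MiB (1ULL << 20)
#define GiB (1ULL << 30)

/* Hypothetical names for the three kinds of memory listed above. */
enum crash_mem_kind {
	CRASH_MEM_KERNEL,	/* 1. kernel text/data can be mapped here */
	CRASH_MEM_IO,		/* 2. accessible to I/O devices */
	CRASH_MEM_USER,		/* 3. allocatable to user space */
};

/* One reservation request; the layout is an assumption of this sketch. */
struct crash_mem_req {
	enum crash_mem_kind kind;
	uint64_t size;		/* bytes needed */
	bool constrained;	/* true if placement is restricted */
	uint64_t min_addr;	/* inclusive lower bound when constrained */
	uint64_t max_addr;	/* exclusive upper bound when constrained */
};

/*
 * Return true if placing the request at 'start' satisfies it:
 * unconstrained requests only need a non-overflowing range, while
 * constrained ones must fit inside [min_addr, max_addr).
 */
static bool placement_ok(const struct crash_mem_req *req, uint64_t start)
{
	if (req->size == 0 || start > UINT64_MAX - req->size)
		return false;
	if (!req->constrained)
		return true;

	return start >= req->min_addr &&
	       start + req->size <= req->max_addr;
}
```

In this model the first two kinds would set 'constrained' with bounds
derived from the CPU mapping limits or the device addressing limits,
while the third kind leaves it unset and only cares about 'size'.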

I'll try to transform this knowledge into something actionable or even
reviewable.

For now, I agree there's nothing more to discuss.

Thanks
Petr T