[RFC][nvdimm][crash] pmem memmap dump support
Baoquan He
bhe at redhat.com
Wed Mar 1 00:17:25 PST 2023
On 03/01/23 at 06:27am, lizhijian at fujitsu.com wrote:
......
> Hi Baoquan
>
> Greatly appreciate your feedback.
>
>
> > 1) On the kernel side, export info about pmem meta data;
> > 2) On the makedumpfile side, add an option to specify if we want to
> > dump pmem meta data; an option, or part of the dump level?
>
> Yes, I'm working on these 2 steps.
>
> > 3) In glue script, detect and warn if pmem data is in pmem and wanted,
> > and dump target is the same pmem.
> >
>
> Does the 'glue script' mean a script like '/usr/bin/kdump.sh' in the 2nd kernel? That would be an option.
> Shall we abort this dump if "pmem data is in pmem and wanted, and dump target is the same pmem"?
I guess you mean the scripts in RHEL/CentOS/Fedora, and yes, if I guess
right. Other distros could have different scripts. For kdump, we need to
load the kdump kernel/initramfs in advance, then wait to capture any
crash. When we load, we can detect and check whether the environment and
setup are as expected. If not, we can warn or print an error message to
users. We don't need to wait until a crash is triggered to do the
check and only then decide whether to abort the dump.
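
A rough sketch of what such a load-time check could look like, just to
illustrate the idea. Everything here is hypothetical: the function names,
the assumption that pmem block devices are named like pmem0, pmem0.1 or
pmem0s with an optional pN partition suffix, and the assumption that the
script already knows which device holds the pmem meta data. It is not
taken from any existing kdump script.

```python
import os
import re

# Assumption: pmem block devices are named pmem0, pmem0.1 or pmem0s,
# optionally followed by a partition suffix such as p1.
_PMEM_RE = re.compile(r"^(pmem\d+(?:\.\d+)?s?)(?:p\d+)?$")


def pmem_base(dev_path):
    """Return the pmem device name without its partition suffix.

    Returns None if the path does not look like a pmem block device.
    """
    m = _PMEM_RE.match(os.path.basename(dev_path))
    return m.group(1) if m else None


def dump_target_conflicts(metadata_dev, dump_wanted, target_dev):
    """Return True if the setup should be warned about at load time.

    metadata_dev: device holding the pmem meta data, or None if the
                  meta data lives in normal memory (hypothetical input).
    dump_wanted:  True if the user asked to dump the pmem meta data.
    target_dev:   block device backing the dump target.
    """
    if not dump_wanted or metadata_dev is None:
        return False
    meta = pmem_base(metadata_dev)
    return meta is not None and meta == pmem_base(target_dev)
```

The load step could call dump_target_conflicts() once and warn or refuse
to load, instead of finding the problem after a crash.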
> > Does this work for you?
> >
> > Not sure if the above items are all doable. As for parking the pmem
> > device until we are in the kdump kernel, I believe Intel pmem experts
> > know how to achieve that. If there's no way to park pmem during kdump
> > jumping, case D) is a daydream.
>
> What's "kdump jumping" timing here ?
> A. 1st kernel crashed and jumping to 2nd kernel or
> B. 2nd/kdump kernel do the dump operation.
>
> In my understanding, the dumping application (makedumpfile) in the kdump kernel will do the dump operation
> after modules are loaded. Does "parking pmem" mean postponing pmem module loading until the dump
> operation has finished? If so, I think it has the same effect as disabling the pmem device in the kdump kernel.
I used the word 'parking', which was probably wrong. When a crash
happens, we currently only shut down the unrelated CPUs and the
interrupt controller, but keep other devices in flight. This is why we
can preserve the content of the crashed kernel's memory. For a normal
memory device, we reserve a small part as crashkernel to run the kdump
kernel and do the dumping, and keep the 1st kernel's memory untouched.
For pmem, we may need to do something similar to keep its content
untouched. I am not sure if disabling the pmem device is the thing we
need to do in the kdump kernel. What we want is:
1) do not shut down pmem in the 1st kernel when it crashes
2) do not re-initialize pmem, or at least do not remove its content
1) is already the case with the current handling. Do we need to do
something to guarantee 2)? I don't know pmem well, this is just my
personal thought.