[RFC][nvdimm][crash] pmem memmap dump support
Baoquan He
bhe at redhat.com
Tue Feb 28 06:03:49 PST 2023
On 02/23/23 at 06:24am, lizhijian at fujitsu.com wrote:
> Hello folks,
>
> This mail raises a pmem memmap dump requirement and possible solutions, but they are all still premature.
> I really hope you can provide some feedback.
>
> pmem memmap can also be called pmem metadata here.
>
> ### Background and motivation overview ###
> ---
> Crash dump is an important feature for troubleshooting the kernel. It is the last resort for finding
> out what happened at a kernel panic, slowdown, and so on. It is the most important tool for customer
> support. However, part of the data on pmem is not included in the crash dump, which can make it
> difficult to analyze problems around pmem (especially Filesystem-DAX).
>
>
> A pmem namespace in "fsdax" or "devdax" mode requires allocation of per-page metadata[1]. The allocation
> can be drawn from either mem (system memory) or dev (pmem device), see `ndctl help create-namespace` for
> more details. In fsdax, the struct page array becomes very important: it is one of the key pieces of
> data for finding the status of the reverse map.
>
> So, when the metadata is stored in pmem, even pmem's per-page metadata will not be dumped. That means
> troubleshooters are unable to check more details about pmem from the dumpfile.
>
> ### Adding pmem memmap dump support ###
> ---
> Our goal is that, whether the metadata is stored in mem or in pmem, it can be dumped so that the
> crash utilities can read more details about the pmem. Of course, this feature can be enabled/disabled.
>
> First, based on our previous investigation, according to the location of metadata and the scope of
> dump, we can divide it into the following four cases: A, B, C, D.
> Note that although we mention cases A and B below, we do not want these two cases to be
> part of this feature, because dumping the entire pmem will consume a lot of space and, more importantly,
> it may contain sensitive user data.
>
> +-------------+-------------------+
> |             | metadata location |
> | dump scope  +---------+---------+
> |             |   mem   |  PMEM   |
> +-------------+---------+---------+
> | entire pmem |    A    |    B    |
> +-------------+---------+---------+
> | metadata    |    C    |    D    |
> +-------------+---------+---------+
>
> Case A&B: unsupported
> - Only the regions listed in PT_LOAD in vmcore are dumpable. This can be resolved by adding the pmem
> region into vmcore's PT_LOADs in kexec-tools.
> - makedumpfile assumes that all page objects of the entire region described in the PT_LOADs
> are readable, and then skips/excludes specific pages according to their attributes. But in the case
> of pmem, the 1st kernel only allocates page objects for the pmem namespaces, so makedumpfile will throw
> errors[2] when specific -d options are specified.
> Accordingly, we should make makedumpfile ignore these errors if it is a pmem region.
>
> Because the above cases are not part of our goal, we must consider how to prevent the data part of pmem
> from being read by the dump application (makedumpfile).
>
> Case C: natively supported
> The metadata is stored in mem, and the entire mem/ram is dumpable.
>
> Case D: unsupported && need your input
> To support this situation, makedumpfile needs to know the location of the metadata for each pmem
> namespace, and the address and size of the metadata in the pmem [start, end).
>
> We have thought of a few possible options:
>
> 1) In the 2nd kernel, with the help of the information from /sys/bus/nd/devices/{namespaceX.Y, daxX.Y, pfnX.Y}
> exported by the pmem drivers, makedumpfile is able to calculate the address and size of the metadata.
> 2) In the 1st kernel, add a new symbol to the vmcore. The symbol is associated with the layout of
> each namespace. makedumpfile reads the symbol and figures out the address and size of the metadata.
> 3) Others?
>
> But then we found that we have always ignored one use case: the user could save the dumpfile
> to the pmem. Neither of these two options can solve this problem, because the pmem drivers will
> re-initialize the metadata while they are being loaded, so the metadata we dump would be
> inconsistent with the metadata at the moment of the crash.
> Put simply, can we just disable the pmem directly in the 2nd kernel so that the previous metadata
> will not be destroyed? But this brings the inconvenience that the 2nd kernel does not allow the user
> to store the dumpfile on a filesystem/partition based on pmem.
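Option 1) in the quoted list could be sketched roughly as follows. This is only an illustration under assumptions that would need to be checked against the pmem driver: that the `resource` attribute of namespaceX.Y reports the start of the namespace, that the `resource` attribute of pfnX.Y reports where the data area begins, and that there is no start padding, so the reserved metadata area is everything in between. The helper names and the example addresses are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    start: int  # inclusive physical address
    end: int    # exclusive physical address

    @property
    def size(self) -> int:
        return self.end - self.start


def parse_sysfs_hex(text: str) -> int:
    """Parse a sysfs value such as '0x140000000\\n' into an int."""
    return int(text.strip(), 0)


def metadata_range(ns_resource: str, pfn_resource: str) -> Range:
    """Return the assumed metadata area [start, end) of one namespace.

    ns_resource:  contents of .../namespaceX.Y/resource (assumed: namespace start)
    pfn_resource: contents of .../pfnX.Y/resource (assumed: start of the data area)
    """
    ns_start = parse_sysfs_hex(ns_resource)
    data_start = parse_sysfs_hex(pfn_resource)
    if data_start < ns_start:
        raise ValueError("data area starts before the namespace")
    return Range(ns_start, data_start)


# Illustrative values only, not taken from a real machine.
example = metadata_range("0x140000000\n", "0x142000000\n")
print(hex(example.start), hex(example.end), example.size)
```

If the metadata were kept in mem instead, the two addresses would be expected to be much closer together, so a tool would still need the namespace mode to tell the cases apart.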
1) On the kernel side, export info about the pmem metadata;
2) On the makedumpfile side, add an option to specify whether we want to dump
pmem metadata. Should it be a separate option or part of the dump level?
3) In the glue script, detect and warn if the pmem metadata is in pmem and wanted,
and the dump target is the same pmem.
Does this work for you?
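
The check in item 3) could look roughly like the sketch below. It is not taken from any existing kdump script: the function names are hypothetical, the mount table is passed in as text in /proc/mounts format, and as a simplification it warns when the dump target sits on any pmem block device (named pmemN) instead of matching the exact namespace.

```python
import os
import re

# pmem block devices show up as /dev/pmem0, /dev/pmem0.1, /dev/pmem0p1, ...
PMEM_DEV = re.compile(r"^pmem\d+")


def device_for_path(path, mounts_text):
    """Return the device of the longest mount point containing path."""
    path = os.path.normpath(path)
    best_dev, best_len = None, -1
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        dev, mnt = fields[0], fields[1].rstrip("/") or "/"
        inside = path == mnt or path.startswith(mnt.rstrip("/") + "/")
        # ">=" lets a later mount on the same mount point win.
        if inside and len(mnt) >= best_len:
            best_dev, best_len = dev, len(mnt)
    return best_dev


def is_pmem_device(dev):
    return dev is not None and bool(PMEM_DEV.match(os.path.basename(dev)))


def should_warn(target_path, mounts_text, want_metadata, metadata_on_pmem):
    """Warn only if metadata lives on pmem, is wanted, and the target is pmem."""
    if not (want_metadata and metadata_on_pmem):
        return False
    return is_pmem_device(device_for_path(target_path, mounts_text))


# Hypothetical mount table in /proc/mounts format.
mounts = (
    "/dev/sda1 / ext4 rw 0 0\n"
    "/dev/pmem0 /var/crash xfs rw,dax 0 0\n"
)
if should_warn("/var/crash/vmcore", mounts, True, True):
    print("warning: dump target is on pmem, metadata may be overwritten")
```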
I am not sure whether the above items are all doable. As for parking the pmem device
until we are in the kdump kernel, I believe the Intel pmem experts know how to achieve
that. If there is no way to park pmem during the kdump jump, case D) is
a daydream.
Thanks
Baoquan