[PATCH] makedumpfile: readpage_elf: handle 0-pages not stored in the ELF file
Atsushi Kumagai
ats-kumagai at wm.jp.nec.com
Sun Jan 31 22:48:13 PST 2016
Hello Petr,
> handle 0-pages not stored in the ELF file
>
>Resent to the list.
>
>The list does not like attachments, it seems.
>
>Petr T
>
>On Wed, 27 Jan 2016 08:58:21 +0100
>Petr Tesarik <ptesarik at suse.cz> wrote:
>
>> Hello Atsushi-san,
>>
>> On Wed, 27 Jan 2016 02:21:57 +0000
>> Atsushi Kumagai <ats-kumagai at wm.jp.nec.com> wrote:
>>
>> > Hello Ivan,
>> >
>> > >Handle pages filled with zeros that are not stored in the ELF file
>> > >directly, but have addresses between p_filesz and p_memsz of a segment.
>> >
>> > This fix looks reasonable, thanks for your work.
>> >
>> > >This allows makedumpfile to dump-dmesg from a previously makedumpfile-ed
>> > >vmcore where all 0-pages were excluded (dump_level & 1) for example.
>> >
>> > I have some comments below:
>>
>> Actually, I have already hit another issue with filtered ELF files.
>> When refiltering an ELF file that had already been filtered, the memory
>> size vs. file size can cause wrong max_mapnr calculation for a
>> compressed file. In turn, if some pages were removed at the end of
>> physical memory, those pages are not marked as filtered in the output
>> file, but instead the total RAM appears to be smaller by that amount.
>>
>> I don't think my patch addresses the same issue that Ivan found, but it
>> definitely touches the same area. Please have a look.
Good catch, thanks for your report.
>> Petr Tesarik
>
>From: Petr Tesarik <ptesarik at suse.cz>
>Date: Thu Aug 20 10:43:22 2015 +0200
>Subject: Keep segment memory size when re-filtering ELF dumps
>
>Originally, makedumpfile was designed to read from /proc/vmcore, where
>each segment's p_memsz is equal to its p_filesz. However, makedumpfile
>can also be used to re-filter an already filtered ELF dump file, where
>memory size may be larger than file size. In that case the memory size
>should be used as the size of the segment. This affects:
Does this problem occur only if makedumpfile has done filtering ?
According to the man 5 elf, even the original ELF file can have
"unstored zero pages".
PT_LOAD The array element specifies a loadable segment, described by p_filesz and p_memsz.
The bytes from the file are mapped to the beginning of the memory segment. If the
segment's memory size p_memsz is larger than the file size p_filesz, the "extra" bytes
are defined to hold the value 0 and to follow the segment's initialized area.
If unstored pages will be made only by makedumpfile, what I said
Below has no meaning.
>1. max_mapnr
> This value is computed as the highest phys_end. If the last segment
> was filtered, max_mapnr may be too small, and the crash utility will
> refuse to read that memory (even with --zero_excluded).
Yes, should be computed based on p_memsz.
>2. p_memsz field in ELF dumps
> The resulting ELF segment p_memsz will be capped to original file's
> p_filesz, ignoring the original p_memsz.
Yes, should be fixed.
>3. memory holes in KDUMP dumps
> Pages excluded in the original ELF dump will be appear as memory
> holes in the resulting KDUMP file's first bitmap.
a. If an unstored page is a just zero page, it is neither on a memory hole
nor a filtered page.
b. If an unstored page is the result of makedumpfile filtering, it should be
handled as a filtered page.
However, I think it's impossible to distinguish whether former or latter
after filtering.
>4. vtop translation
> Virtual addresses that were filtered out in the original ELF file
> cannot be translated to physical addresses using the generic vtop
> translation.
Exactly, should refer to p_memsz for it as you did.
>My fix uses p_memsz to set physical and virtual extents of ELF segments,
>because this is the actual size. However, file size is important when
>accessing page data, so it must be stored separately and checked where
>appropriate. Last but not least, pages that were filtered in the original
>dump file must also be filtered in the destination file, i.e. pages
>between p_filesz and p_memsz must have their corresponding bit cleared in
>the 2nd bitmap.
As I said above, I suspect not all of unstored pages are filtered pages,
I'm not sure exclude_nodata_pages() does right things.
As Ivan's patch does, I guess reading them as zero pages fits ELF's format
specification.
Thanks,
Atsushi Kumagai
>Signed-off-by: Petr Tesarik <ptesarik at suse.com>
>
>---
> elf_info.c | 37 ++++++++++++++++++++++++++++++-------
> elf_info.h | 4 ++++
> makedumpfile.c | 26 ++++++++++++++++++++++++++
> 3 files changed, 60 insertions(+), 7 deletions(-)
>
>--- a/elf_info.c
>+++ b/elf_info.c
>@@ -38,6 +38,7 @@
>
> struct pt_load_segment {
> off_t file_offset;
>+ off_t file_size;
> unsigned long long phys_start;
> unsigned long long phys_end;
> unsigned long long virt_start;
>@@ -162,10 +163,11 @@ dump_Elf_load(Elf64_Phdr *prog, int num_
>
> pls = &pt_loads[num_load];
> pls->phys_start = prog->p_paddr;
>- pls->phys_end = pls->phys_start + prog->p_filesz;
>+ pls->phys_end = pls->phys_start + prog->p_memsz;
> pls->virt_start = prog->p_vaddr;
>- pls->virt_end = pls->virt_start + prog->p_filesz;
>+ pls->virt_end = pls->virt_start + prog->p_memsz;
> pls->file_offset = prog->p_offset;
>+ pls->file_size = prog->p_filesz;
>
> DEBUG_MSG("LOAD (%d)\n", num_load);
> DEBUG_MSG(" phys_start : %llx\n", pls->phys_start);
>@@ -462,7 +464,7 @@ paddr_to_offset(unsigned long long paddr
> for (i = offset = 0; i < num_pt_loads; i++) {
> pls = &pt_loads[i];
> if ((paddr >= pls->phys_start)
>- && (paddr < pls->phys_end)) {
>+ && (paddr < pls->phys_start + pls->file_size)) {
> offset = (off_t)(paddr - pls->phys_start) +
> pls->file_offset;
> break;
>@@ -480,16 +482,14 @@ paddr_to_offset2(unsigned long long padd
> {
> int i;
> off_t offset;
>- unsigned long long len;
> struct pt_load_segment *pls;
>
> for (i = offset = 0; i < num_pt_loads; i++) {
> pls = &pt_loads[i];
>- len = pls->phys_end - pls->phys_start;
> if ((paddr >= pls->phys_start)
>- && (paddr < pls->phys_end)
>+ && (paddr < pls->phys_start + pls->file_size)
> && (hint >= pls->file_offset)
>- && (hint < pls->file_offset + len)) {
>+ && (hint < pls->file_offset + pls->file_size)) {
> offset = (off_t)(paddr - pls->phys_start) +
> pls->file_offset;
> break;
>@@ -1095,6 +1095,29 @@ get_pt_load(int idx,
>
> return TRUE;
> }
>+
>+int
>+get_pt_extents(int idx,
>+ unsigned long long *phys_start,
>+ unsigned long long *phys_end,
>+ off_t *file_size)
>+{
>+ struct pt_load_segment *pls;
>+
>+ if (num_pt_loads <= idx)
>+ return FALSE;
>+
>+ pls = &pt_loads[idx];
>+
>+ if (phys_start)
>+ *phys_start = pls->phys_start;
>+ if (phys_end)
>+ *phys_end = pls->phys_end;
>+ if (file_size)
>+ *file_size = pls->file_size;
>+
>+ return TRUE;
>+}
>
> unsigned int
> get_num_pt_loads(void)
>--- a/elf_info.h
>+++ b/elf_info.h
>@@ -61,6 +61,10 @@ int get_pt_load(int idx,
> unsigned long long *phys_end,
> unsigned long long *virt_start,
> unsigned long long *virt_end);
>+int get_pt_extents(int idx,
>+ unsigned long long *phys_start,
>+ unsigned long long *phys_end,
>+ off_t *file_size);
> unsigned int get_num_pt_loads(void);
>
> void set_nr_cpus(int num);
>--- a/makedumpfile.c
>+++ b/makedumpfile.c
>@@ -4395,6 +4395,26 @@ read_start_flat_header(void)
> return TRUE;
> }
>
>+static void
>+exclude_nodata_pages(void)
>+{
>+ int i;
>+ unsigned long long phys_start, phys_end;
>+ off_t file_size;
>+
>+ i = 0;
>+ while (get_pt_extents(i, &phys_start, &phys_end, &file_size)) {
>+ unsigned long long pfn, pfn_end;
>+
>+ pfn = paddr_to_pfn(phys_start + file_size);
>+ pfn_end = paddr_to_pfn(phys_end);
>+ while (pfn <= pfn_end)
>+ clear_bit_on_2nd_bitmap(pfn);
>+ ++pfn;
>+ ++i;
>+ }
>+}
>+
> int
> read_flat_data_header(struct makedumpfile_data_header *fdh)
> {
>@@ -6087,6 +6107,12 @@ create_2nd_bitmap(struct cycle *cycle)
> }
>
> /*
>+ * If re-filtering ELF dump, exclude pages that were already
>+ * excluded in the original file.
>+ */
>+ exclude_nodata_pages();
>+
>+ /*
> * Exclude cache pages, cache private pages, user data pages,
> * and hwpoison pages.
> */
>
>_______________________________________________
>kexec mailing list
>kexec at lists.infradead.org
>http://lists.infradead.org/mailman/listinfo/kexec
More information about the kexec
mailing list