[PATCH] makedumpfile: readpage_elf: handle 0-pages not stored in the ELF file

Atsushi Kumagai ats-kumagai at wm.jp.nec.com
Sun Jan 31 22:48:13 PST 2016


Hello Petr,

> handle 0-pages not stored in the ELF file
>
>Resent to the list.
>
>The list does not like attachments, it seems.
>
>Petr T
>
>On Wed, 27 Jan 2016 08:58:21 +0100
>Petr Tesarik <ptesarik at suse.cz> wrote:
>
>> Hello Atsushi-san,
>>
>> On Wed, 27 Jan 2016 02:21:57 +0000
>> Atsushi Kumagai <ats-kumagai at wm.jp.nec.com> wrote:
>>
>> > Hello Ivan,
>> >
>> > >Handle pages filled with zeros that are not stored in the ELF file
>> > >directly, but have addresses between p_filesz and p_memsz of a segment.
>> >
>> > This fix looks reasonable, thanks for your work.
>> >
>> > >This allows makedumpfile to dump-dmesg from a previously makedumpfile-ed
>> > >vmcore where all 0-pages were excluded (dump_level & 1) for example.
>> >
>> > I have some comments below:
>>
>> Actually, I have already hit another issue with filtered ELF files.
>> When refiltering an ELF file that had already been filtered, the memory
>> size vs. file size can cause wrong max_mapnr calculation for a
>> compressed file. In turn, if some pages were removed at the end of
>> physical memory, those pages are not marked as filtered in the output
>> file, but instead the total RAM appears to be smaller by that amount.
>>
>> I don't think my patch addresses the same issue that Ivan found, but it
>> definitely touches the same area. Please have a look.

Good catch, thanks for your report.

>> Petr Tesarik
>
>From: Petr Tesarik <ptesarik at suse.cz>
>Date: Thu Aug 20 10:43:22 2015 +0200
>Subject: Keep segment memory size when re-filtering ELF dumps
>
>Originally, makedumpfile was designed to read from /proc/vmcore, where
>each segment's p_memsz is equal to its p_filesz. However, makedumpfile
>can also be used to re-filter an already filtered ELF dump file, where
>memory size may be larger than file size. In that case the memory size
>should be used as the size of the segment. This affects:

Does this problem occur only if makedumpfile has done filtering ?
According to the man 5 elf, even the original ELF file can have
"unstored zero pages".

   PT_LOAD     The array element specifies a loadable segment, described  by  p_filesz  and  p_memsz.
               The  bytes  from  the  file are mapped to the beginning of the memory segment.  If the
               segment's memory size p_memsz is larger than the file size p_filesz, the "extra" bytes
               are  defined  to  hold  the value 0 and to follow the segment's initialized area.

If unstored pages will be made only by makedumpfile, what I said
Below has no meaning.

>1. max_mapnr
>   This value is computed as the highest phys_end. If the last segment
>   was filtered, max_mapnr may be too small, and the crash utility will
>   refuse to read that memory (even with --zero_excluded).

Yes, should be computed based on p_memsz.

>2. p_memsz field in ELF dumps
>   The resulting ELF segment p_memsz will be capped to original file's
>   p_filesz, ignoring the original p_memsz.

Yes, should be fixed.

>3. memory holes in KDUMP dumps
>   Pages excluded in the original ELF dump will be appear as memory
>   holes in the resulting KDUMP file's first bitmap.

 a. If an unstored page is a just zero page, it is neither on a memory hole
    nor a filtered page. 
 b. If an unstored page is the result of makedumpfile filtering, it should be
    handled as a filtered page. 

However, I think it's impossible to distinguish whether former or latter
after filtering.

>4. vtop translation
>   Virtual addresses that were filtered out in the original ELF file
>   cannot be translated to physical addresses using the generic vtop
>   translation.

Exactly, should refer to p_memsz for it as you did.

>My fix uses p_memsz to set physical and virtual extents of ELF segments,
>because this is the actual size. However, file size is important when
>accessing page data, so it must be stored separately and checked where
>appropriate. Last but not least, pages that were filtered in the original
>dump file must also be filtered in the destination file, i.e. pages
>between p_filesz and p_memsz must have their corresponding bit cleared in
>the 2nd bitmap.

As I said above, I suspect not all of unstored pages are filtered pages,
I'm not sure exclude_nodata_pages() does right things.
As Ivan's patch does, I guess reading them as zero pages fits ELF's format
specification.


Thanks,
Atsushi Kumagai

>Signed-off-by: Petr Tesarik <ptesarik at suse.com>
>
>---
> elf_info.c     |   37 ++++++++++++++++++++++++++++++-------
> elf_info.h     |    4 ++++
> makedumpfile.c |   26 ++++++++++++++++++++++++++
> 3 files changed, 60 insertions(+), 7 deletions(-)
>
>--- a/elf_info.c
>+++ b/elf_info.c
>@@ -38,6 +38,7 @@
>
> struct pt_load_segment {
> 	off_t			file_offset;
>+	off_t			file_size;
> 	unsigned long long	phys_start;
> 	unsigned long long	phys_end;
> 	unsigned long long	virt_start;
>@@ -162,10 +163,11 @@ dump_Elf_load(Elf64_Phdr *prog, int num_
>
> 	pls = &pt_loads[num_load];
> 	pls->phys_start  = prog->p_paddr;
>-	pls->phys_end    = pls->phys_start + prog->p_filesz;
>+	pls->phys_end    = pls->phys_start + prog->p_memsz;
> 	pls->virt_start  = prog->p_vaddr;
>-	pls->virt_end    = pls->virt_start + prog->p_filesz;
>+	pls->virt_end    = pls->virt_start + prog->p_memsz;
> 	pls->file_offset = prog->p_offset;
>+	pls->file_size   = prog->p_filesz;
>
> 	DEBUG_MSG("LOAD (%d)\n", num_load);
> 	DEBUG_MSG("  phys_start : %llx\n", pls->phys_start);
>@@ -462,7 +464,7 @@ paddr_to_offset(unsigned long long paddr
> 	for (i = offset = 0; i < num_pt_loads; i++) {
> 		pls = &pt_loads[i];
> 		if ((paddr >= pls->phys_start)
>-		    && (paddr < pls->phys_end)) {
>+		    && (paddr < pls->phys_start + pls->file_size)) {
> 			offset = (off_t)(paddr - pls->phys_start) +
> 				pls->file_offset;
> 			break;
>@@ -480,16 +482,14 @@ paddr_to_offset2(unsigned long long padd
> {
> 	int i;
> 	off_t offset;
>-	unsigned long long len;
> 	struct pt_load_segment *pls;
>
> 	for (i = offset = 0; i < num_pt_loads; i++) {
> 		pls = &pt_loads[i];
>-		len = pls->phys_end - pls->phys_start;
> 		if ((paddr >= pls->phys_start)
>-		    && (paddr < pls->phys_end)
>+		    && (paddr < pls->phys_start + pls->file_size)
> 		    && (hint >= pls->file_offset)
>-		    && (hint < pls->file_offset + len)) {
>+		    && (hint < pls->file_offset + pls->file_size)) {
> 			offset = (off_t)(paddr - pls->phys_start) +
> 				pls->file_offset;
> 			break;
>@@ -1095,6 +1095,29 @@ get_pt_load(int idx,
>
> 	return TRUE;
> }
>+
>+int
>+get_pt_extents(int idx,
>+	unsigned long long *phys_start,
>+	unsigned long long *phys_end,
>+	off_t *file_size)
>+{
>+	struct pt_load_segment *pls;
>+
>+	if (num_pt_loads <= idx)
>+		return FALSE;
>+
>+	pls = &pt_loads[idx];
>+
>+	if (phys_start)
>+		*phys_start = pls->phys_start;
>+	if (phys_end)
>+		*phys_end   = pls->phys_end;
>+	if (file_size)
>+		*file_size = pls->file_size;
>+
>+	return TRUE;
>+}
>
> unsigned int
> get_num_pt_loads(void)
>--- a/elf_info.h
>+++ b/elf_info.h
>@@ -61,6 +61,10 @@ int get_pt_load(int idx,
> 	unsigned long long *phys_end,
> 	unsigned long long *virt_start,
> 	unsigned long long *virt_end);
>+int get_pt_extents(int idx,
>+	unsigned long long *phys_start,
>+	unsigned long long *phys_end,
>+	off_t *file_size);
> unsigned int get_num_pt_loads(void);
>
> void set_nr_cpus(int num);
>--- a/makedumpfile.c
>+++ b/makedumpfile.c
>@@ -4395,6 +4395,26 @@ read_start_flat_header(void)
> 	return TRUE;
> }
>
>+static void
>+exclude_nodata_pages(void)
>+{
>+	int i;
>+	unsigned long long phys_start, phys_end;
>+	off_t file_size;
>+
>+	i = 0;
>+	while (get_pt_extents(i, &phys_start, &phys_end, &file_size)) {
>+		unsigned long long pfn, pfn_end;
>+
>+		pfn = paddr_to_pfn(phys_start + file_size);
>+		pfn_end = paddr_to_pfn(phys_end);
>+		while (pfn <= pfn_end)
>+			clear_bit_on_2nd_bitmap(pfn);
>+			++pfn;
>+		++i;
>+	}
>+}
>+
> int
> read_flat_data_header(struct makedumpfile_data_header *fdh)
> {
>@@ -6087,6 +6107,12 @@ create_2nd_bitmap(struct cycle *cycle)
> 	}
>
> 	/*
>+	 * If re-filtering ELF dump, exclude pages that were already
>+	 * excluded in the original file.
>+	 */
>+	exclude_nodata_pages();
>+
>+	/*
> 	 * Exclude cache pages, cache private pages, user data pages,
> 	 * and hwpoison pages.
> 	 */
>
>_______________________________________________
>kexec mailing list
>kexec at lists.infradead.org
>http://lists.infradead.org/mailman/listinfo/kexec



More information about the kexec mailing list