[PATCH v2 2/3] Generic handling of multi-page exclusions

Atsushi Kumagai kumagai-atsushi at mxc.nes.nec.co.jp
Tue Apr 8 02:00:19 PDT 2014


>From: Petr Tesarik <ptesarik at suse.cz>
>Subject: Re: [PATCH v2 2/3] Generic handling of multi-page exclusions
>Date: Tue, 8 Apr 2014 08:54:36 +0200
>
>> On Tue, 08 Apr 2014 10:49:07 +0900 (JST)
>> HATAYAMA Daisuke <d.hatayama at jp.fujitsu.com> wrote:
>>
>>> From: Petr Tesarik <ptesarik at suse.cz>
>>> Subject: [PATCH v2 2/3] Generic handling of multi-page exclusions
>>> Date: Fri, 4 Apr 2014 19:25:08 +0200
>>>
>>> > When multiple pages are excluded from the dump, store the extents in the
>>> > mem_map_data structure, and check if anything is still pending on the
>>> > next invocation of __exclude_unnecessary_pages for the same mem_map.
>>> >
>>> > The start PFN of the excluded extent is set to the end of the current
>>> > cycle (which is equal to the start of the next cycle, see update_cycle),
>>> > so only the part of the excluded region which falls beyond current cycle
>>> > buffer is valid. If the excluded region is completely processed in the
>>> > current cycle, the start PFN is even bigger than the end PFN. That
>>> > causes nothing to be done at the beginning of the next cycle.
>>> >
>>> > There is no check whether the adjusted pfn_start is still within the
>>> > current cycle. Nothing bad happens if it isn't, because exclude_range()
>>> > is used again to exclude the remaining part, so even if the excluded
>>> > region happens to span more than two cycles, the code will still work
>>> > correctly.
>>> >
>>> > Note that clear_bit_on_2nd_bitmap_for_kernel() accepts PFNs outside the
>>> > current cyclic range. It willreturn FALSE, so such PFNs are not counted.
>>> >
>>> > Signed-off-by: Petr Tesarik <ptesarik at suse.cz>
>>> > ---
>>> >  makedumpfile.c | 47 +++++++++++++++++++++++++++++++++--------------
>>> >  makedumpfile.h |  7 +++++++
>>> >  2 files changed, 40 insertions(+), 14 deletions(-)
>>> >
>>> > diff --git a/makedumpfile.c b/makedumpfile.c
>>> > index 81c687b..9ffb901 100644
>>> > --- a/makedumpfile.c
>>> > +++ b/makedumpfile.c
>>> > @@ -2385,6 +2385,9 @@ dump_mem_map(unsigned long long pfn_start,
>>> >  	mmd->pfn_end   = pfn_end;
>>> >  	mmd->mem_map   = mem_map;
>>> >
>>> > +	mmd->exclude_pfn_start = 0ULL;
>>> > +	mmd->exclude_pfn_end   = 0ULL;
>>> > +
>>> >  	DEBUG_MSG("mem_map (%d)\n", num_mm);
>>> >  	DEBUG_MSG("  mem_map    : %lx\n", mem_map);
>>> >  	DEBUG_MSG("  pfn_start  : %llx\n", pfn_start);
>>> > @@ -4657,6 +4660,21 @@ initialize_2nd_bitmap_cyclic(struct cycle *cycle)
>>> >  	return TRUE;
>>> >  }
>>> >
>>> > +static void
>>> > +exclude_range(unsigned long long *counter, struct mem_map_data *mmd,
>>> > +    unsigned long long pfn, unsigned long long endpfn, struct cycle *cycle)
>>> > +{
>>> > +	while (pfn < endpfn) {
>>> > +		if (clear_bit_on_2nd_bitmap_for_kernel(pfn, cycle))
>>> > +			(*counter)++;
>>> > +		++pfn;
>>> > +	}
>>>
>>> Here endpfn is pfn + nr_pages in __exclude_unnecessary_pages(), and
>>> the pfn could be cycle->end_pfn <= pfn < endpfn.
>>>
>>> while (pfn < MIN(endpfn, cycle->end_pfn) {
>>> 	if (clear_bit_on_2nd_bitmap_for_kernel(pfn, cycle))
>>> 		(*counter)++;
>>> 	++pfn;
>>> }
>>
>> This is a non-issue: clear_bitmap_cyclic() checks the extents, and I
>> even mentioned it in the commit message. All right, we can save some
>> loop iterations by moving the check out of the loop body...
>>
>
>Ah, OK, I interpreted somehow your code like:
>
>+	mmd->exclude_pfn_start = cycle ? endpfn: ULONGLONG_MAX;
>
>So there's no logically wrong point.
>
>BTW, hmm, the behaviour of clear_bit_on_2nd_bitmap_for_kernel() is not
>what I expect, that is, I don't expect sanity check in
>set_bitmap_cyclic(), which should have been removed by clean up we did
>some times ago.

At that time, I chose the current code since it was simpler and safer.
http://www.mail-archive.com/kexec%40lists.infradead.org/msg10207.html

Don't you like this ?


Thanks
Atsushi Kumagai

>int
>set_bitmap_cyclic(char *bitmap, unsigned long long pfn, int val, struct cycle *cycle)
>{
>        int byte, bit;
>
>        if (pfn < cycle->start_pfn || cycle->end_pfn <= pfn) <-- this
>                return FALSE;
>
>The code some time ago changes region of cycles in many places and it
>was difficult to figure out. So, in the clean up, we introduced struct
>cycle and for_each_cycle() to obtain invariance that cycle is not
>changed under for_each_cycle().
>
>So, could you fix also this if possible?
>
>>> > +
>>> > +	mmd->exclude_pfn_start = cycle ? cycle->end_pfn : ULONGLONG_MAX;
>>>
>>> When does cycle become NULL?
>>
>> When __exclude_unnecessary_pages() is called from
>> exclude_unnecessary_pages, i.e. non-cyclic.
>>
>>> Along with the above point,
>>>
>>> mmd->exclude_pfn_start = MIN(endpfn, cycle->end_pfn);
>>>
>>> and then we can continue the processing in the next cycle.
>>
>> Again, this is a non-issue. These stored extents are validated before
>> use in __exclude_unnecessary_pages. Why should I check them twice?
>> And by the way, this is also mentioned in the commit message.
>>
>>> > +	mmd->exclude_pfn_end = endpfn;
>>> > +	mmd->exclude_pfn_counter = counter;
>>> > +}
>>> > +
>>> >  int
>>> >  __exclude_unnecessary_pages(struct mem_map_data *mmd, struct cycle *cycle)
>>> >  {
>>> > @@ -4671,6 +4689,18 @@ __exclude_unnecessary_pages(struct mem_map_data *mmd, struct cycle *cycle)
>>> >  	unsigned long flags, mapping, private = 0;
>>> >
>>> >  	/*
>>> > +	 * If a multi-page exclusion is pending, do it first
>>> > +	 */
>>> > +	if (mmd->exclude_pfn_start < mmd->exclude_pfn_end) {
>>> > +		exclude_range(mmd->exclude_pfn_counter, mmd,
>>> > +			mmd->exclude_pfn_start, mmd->exclude_pfn_end,
>>> > +			cycle);
>>> > +
>>> > +		mem_map += (mmd->exclude_pfn_end - pfn_start) * SIZE(page);
>>> > +		pfn_start = mmd->exclude_pfn_end;
>>> > +	}
>>> > +
>>> > +	/*
>>> >  	 * Refresh the buffer of struct page, when changing mem_map.
>>> >  	 */
>>> >  	pfn_read_start = ULONGLONG_MAX;
>>> > @@ -4734,21 +4764,10 @@ __exclude_unnecessary_pages(struct mem_map_data *mmd, struct cycle *cycle)
>>> >  		if ((info->dump_level & DL_EXCLUDE_FREE)
>>> >  		    && info->page_is_buddy
>>> >  		    && info->page_is_buddy(flags, _mapcount, private, _count)) {
>>> > -			int i, nr_pages = 1 << private;
>>> > +			int nr_pages = 1 << private;
>>> > +
>>> > +			exclude_range(&pfn_free, mmd, pfn, pfn + nr_pages, cycle);
>>> >
>>> > -			for (i = 0; i < nr_pages; ++i) {
>>> > -				/*
>>> > -				 * According to combination of
>>> > -				 * MAX_ORDER and size of cyclic
>>> > -				 * buffer, this clearing bit operation
>>> > -				 * can overrun the cyclic buffer.
>>> > -				 *
>>> > -				 * See check_cyclic_buffer_overrun()
>>> > -				 * for the detail.
>>> > -				 */
>>> > -				if (clear_bit_on_2nd_bitmap_for_kernel((pfn + i), cycle))
>>> > -					pfn_free++;
>>> > -			}
>>> >  			pfn += nr_pages - 1;
>>> >  			mem_map += (nr_pages - 1) * SIZE(page);
>>> >  		}
>>> > diff --git a/makedumpfile.h b/makedumpfile.h
>>> > index 951ed1b..dfad569 100644
>>> > --- a/makedumpfile.h
>>> > +++ b/makedumpfile.h
>>> > @@ -816,6 +816,13 @@ struct mem_map_data {
>>> >  	unsigned long long	pfn_start;
>>> >  	unsigned long long	pfn_end;
>>> >  	unsigned long	mem_map;
>>> > +
>>> > +	/*
>>> > +	 * for excluding multi-page regions
>>> > +	 */
>>> > +	unsigned long		exclude_pfn_start;
>>> > +	unsigned long		exclude_pfn_end;
>>>
>>> unsigned long long		exclude_pfn_start;
>>> unsigned long long		exclude_pfn_end;
>>>
>>> The integers representing page frame numbers need to be defined as
>>> unsigned long long for architectures where physical address can have
>>> 64-bit length but unsigned long has 32-bit only, such as x86 PAE.
>>
>> Ouch. My mistake. I thought I covered all places, but somehow I
>> missed this one. I'm going to post a fixed series.
>>
>>> Kumagai-san, I saw this sometimes in the past. How about introducing
>>> specific abstract type for page frame number like below?
>>>
>>> typedef unsigned long long pfn_t;
>>>
>>> maybe with some prefix. I think this also helps code readability
>>> because unsigned long long is too long.
>>>
>>> > +	unsigned long long	*exclude_pfn_counter;
>>> >  };
>>>
>>> Also, it seems to me better to introduce a new type for this
>>> processing rather than extending existing code. struct mem_map_data is
>>> not specific for the excluding processing.
>>
>> Kind of agreed. OTOH it will most likely be embedded in struct
>> mem_map_data anyway, because exactly one such object per mm is needed.
>>
>> Petr T
>
>I don't understand well. It seems to me a single object is enough. Is
>it possible to nr_pages cover multiple mm's?
>
>Thanks.
>HATAYAMA, Daisuke




More information about the kexec mailing list