[PATCH] makedumpfile: Petr Tesarik's hugepage filtering

HATAYAMA Daisuke d.hatayama at jp.fujitsu.com
Thu May 23 22:24:23 EDT 2013


(2013/05/24 10:11), Atsushi Kumagai wrote:
> On Wed, 15 May 2013 13:43:59 -0500
> Cliff Wickman <cpw at sgi.com> wrote:

>>   	/*
>>   	 * The bitmap buffer is not dirty, and it is not necessary
>>   	 * to write out it.
>> @@ -4345,7 +4386,7 @@ __exclude_unnecessary_pages(unsigned lon
>>   		 * Exclude the data page of the user process.
>>   		 */
>>   		else if ((info->dump_level & DL_EXCLUDE_USER_DATA)
>> -		    && isAnon(mapping)) {
>> +		    && (isAnon(mapping) || isHugePage(flags))) {
>>   			if (clear_bit_on_2nd_bitmap_for_kernel(pfn))
>>   				pfn_user++;
>
> This code will filter out both transparent huge pages and hugetlbfs pages.
> But I think hugetlbfs pages shouldn't be filtered out as anonymous pages,
> because anonymous pages and hugetlbfs pages are certainly different kinds
> of pages.
>
> Therefore, I plan to add a new dump level to filter out hugetlbfs pages.
> My basic idea is as follows:
>
>
>   		 * Exclude the data page of the user process.
>   		 */
>                  else if ((info->dump_level & DL_EXCLUDE_USER_DATA)
>                      && isAnon(mapping)) {
>                          if (clear_bit_on_2nd_bitmap_for_kernel(pfn))
>                                  pfn_user++;
> +
> +                       if (isPageHead(flags)) {
> +                               int i, nr_pages = 1 << compound_order;
> +
> +                               for (i = 1; i < nr_pages; ++i) {
> +                                       if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i))
> +                                               pfn_user++;
> +                               }
> +                               pfn += nr_pages - 1;
> +                               mem_map += (nr_pages - 1) * SIZE(page);
> +                       }

This seems correct. We don't know when and where page tables get remapped transparently
by huge pages. If ordinary user-space pages and transparent huge pages were filtered at
separate dump levels, data larger than the huge page size (2MiB or 1GiB on x86) could be
corrupted in the dump. Consider the case where 2MiB + 4KiB of data is allocated on a 2MiB
boundary and only the first 2MiB area is remapped by THP: filtering the two parts at
different levels would capture the buffer only partially. We should filter them together.
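
To make that scenario concrete, here is a tiny illustration in plain C (assuming
x86 with 4KiB base pages and a 2MiB THP; the numbers are purely illustrative and
not tied to makedumpfile's code):

#include <stdio.h>

int main(void)
{
        unsigned long base_page = 4UL << 10;             /* 4KiB base page */
        unsigned long thp_size  = 2UL << 20;             /* 2MiB THP */
        unsigned long buf_size  = thp_size + base_page;  /* 2MiB + 4KiB buffer */

        unsigned long thp_pages    = thp_size / base_page;              /* 512 */
        unsigned long normal_pages = (buf_size - thp_size) / base_page; /* 1 */

        /*
         * If THP-backed pages and ordinary anonymous pages were excluded by
         * different dump level bits, a dump taken with only one of the two
         * bits set would drop 512 of these 513 pages and keep the rest (or
         * vice versa), i.e. the buffer would be captured only partially.
         */
        printf("THP part: %lu pages, normal part: %lu page\n",
               thp_pages, normal_pages);
        return 0;
}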

> +               }
> +               /*
> +                * Exclude the hugetlbfs pages.
> +                */
> +               else if ((info->dump_level & DL_EXCLUDE_HUGETLBFS_DATA)
> +                        && (!isAnon(mapping) && isPageHead(flags))) {
> +                       int i, nr_pages = 1 << compound_order;
> +
> +                       for (i = 0; i < nr_pages; ++i) {
> +                               if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i))
> +                                       pfn_hugetlb++;
> +                       }
> +                       pfn += nr_pages - 1;
> +                       mem_map += (nr_pages - 1) * SIZE(page);
>                  }
>                  /*
>                   * Exclude the hwpoison page.
>

hugetlbfs pages have no such risk (a hugetlbfs mapping is always explicitly backed by
huge pages), so this part seems basically OK to me.
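
For reference, here is a minimal sketch of what the compound-page walk in the
snippets above could look like once the order is actually read from memory.
read_compound_order() is a hypothetical stand-in for the real struct page
accessor (in Linux, the order of a compound page is stored in its first tail
page, so the real code has to look one struct page past the head), while
clear_bit_on_2nd_bitmap_for_kernel() and SIZE(page) are the existing
makedumpfile helpers already used above:

/*
 * Sketch only: exclude one compound page (THP or hugetlbfs) whose head is
 * at 'pfn', and return the pfn the caller's loop should continue from.
 */
static unsigned long long
exclude_compound_page(unsigned long long pfn, unsigned long *mem_map,
                      unsigned long long *counter)
{
        unsigned long order    = read_compound_order(*mem_map); /* hypothetical */
        unsigned long nr_pages = 1UL << order;
        unsigned long i;

        for (i = 0; i < nr_pages; i++) {
                if (clear_bit_on_2nd_bitmap_for_kernel(pfn + i))
                        (*counter)++;
        }

        /*
         * The caller's loop still does "pfn++" and "mem_map += SIZE(page)"
         * at the end of the current iteration, so advance past only
         * nr_pages - 1 entries here.
         */
        *mem_map += (nr_pages - 1) * SIZE(page);
        return pfn + nr_pages - 1;
}

The caller would then do "pfn = exclude_compound_page(pfn, &mem_map, &pfn_user);"
(or &pfn_hugetlb in the hugetlbfs branch).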

Also, hugetlbfs pages are "user data", so they should be filtered out whenever the
user data dump level is specified. How about the following:

       Dump  |  zero   without  with     user    free   hugetlbfs
       Level |  page   private  private  data    page   page
      -------+-----------------------------------------------------
          0  |
          1  |   X
          2  |           X
          4  |           X        X
          8  |                            X              X
         16  |                                    X
         32  |                                           X
         63  |   X       X        X       X       X      X

That is, at dump level 8, hugetlbfs pages also get filtered out.
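
In code, the table could map to the dump level bits roughly as follows. The five
existing DL_EXCLUDE_* macros and their values are makedumpfile's current flags for
levels 1-16; DL_EXCLUDE_HUGETLBFS_DATA is the new bit for level 32 from the proposal
quoted above, so its exact name and value should be treated as tentative, and the
condition is only a sketch of the idea that level 8 covers hugetlbfs pages as well:

/* Existing dump level bits in makedumpfile (levels 1, 2, 4, 8 and 16 above). */
#define DL_EXCLUDE_ZERO           (0x001) /* Exclude pages filled with zero */
#define DL_EXCLUDE_CACHE          (0x002) /* Exclude cache pages without private */
#define DL_EXCLUDE_CACHE_PRI      (0x004) /* Exclude cache pages with private */
#define DL_EXCLUDE_USER_DATA      (0x008) /* Exclude user process data pages */
#define DL_EXCLUDE_FREE           (0x010) /* Exclude free pages */

/* New bit for the proposed dump level 32 (tentative). */
#define DL_EXCLUDE_HUGETLBFS_DATA (0x020) /* Exclude hugetlbfs data pages */

/*
 * Hugetlbfs pages are then excluded either by their own bit (level 32) or by
 * the user data bit (level 8), since they are user data too:
 */
else if ((info->dump_level & (DL_EXCLUDE_USER_DATA | DL_EXCLUDE_HUGETLBFS_DATA))
         && !isAnon(mapping) && isPageHead(flags)) {
        /* exclude the whole hugetlbfs compound page, as in the snippet above */
}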

>
> But before that, I would like to hear opinions about the new dump level,
> to make sure whether it is worthwhile or not.
>



-- 
Thanks.
HATAYAMA, Daisuke



