/proc/vmcore mmap() failure issue

HATAYAMA Daisuke d.hatayama at jp.fujitsu.com
Thu Nov 14 05:31:37 EST 2013


(2013/11/14 5:41), Vivek Goyal wrote:
> Hi Hatayama,
>
> We are facing some /proc/vmcore mmap() failures, after which makedumpfile
> exits without saving the dump and the system reboots.
>
> I tried latest makedumpfile (devel branch) with 3.12 kernel.
>
> I think this issue happens only on some machines. It looks like it
> happens when the end of a System RAM chunk in the first kernel is not page
> aligned. For example, I have one machine where I noticed it, and this is
> what its System RAM looks like.
>
> 00100000-dafa57ff : System RAM
>    01000000-015892fa : Kernel code
>    015892fb-0195c9ff : Kernel data
>    01ae6000-01d31fff : Kernel bss
>    24000000-33ffffff : Crash kernel
> dafa5800-dbffffff : reserved
>
> Notice that dafa57ff does not end at page boundary and next reserved
> range does not start at page boundary. I think that next reserved
> range is referenced through some ACPI data. More on this later.
>
> So we put in some printk() messages to get more info. In a nutshell,
> remap_pfn_range() fails when we try to map the last section of system
> RAM, which does not end on a page boundary.
>
> remap_pfn_range()
>     track_pfn_remap() {
>          /*
>           * For anything smaller than the vma size we set prot based on the
>           * lookup.
>           */
>          flags = lookup_memtype(paddr);
>
>          /* Check memtype for the remaining pages */
>          while (size > PAGE_SIZE) {
>                  size -= PAGE_SIZE;
>                  paddr += PAGE_SIZE;
>                  if (flags != lookup_memtype(paddr))
>                          return -EINVAL; <---------------- Failure.
>          }
> 	
>     }
>
>
> So we pass in a range to track_pfn_remap(), say pfn=0xdad62 size=0x244000.
> Now we call lookup_memtype() on every page in the range and make sure
> they are all the same, otherwise we fail. Guess what, they are all the same
> except the last page (which does not end at a page boundary).
>
> I dived deeper into lookup_memtype() and noticed that none of the regular
> ranges are registered anywhere, and their flags are _PAGE_CACHE_UC_MINUS.
> But the last unaligned page/range is registered in the memtype rb tree and
> has the attribute _PAGE_CACHE_WB.
>
> Then I hooked into reserve_memtype() to figure out who is registering
> page 0xdafa5000 and it is acpi_init() which does it.
>
> [    0.721655] Hardware name: <edited>
> [    0.730590]  ffff8800340f3830 ffff8800340f37c0 ffffffff81575509 00000000dafa5000
> [    0.738010]  ffff8800340f3800 ffffffff810566cc 00000000000dafa5 00000000dafa5000
> [    0.745428]  00000000dafa6000 00000000dafa5000 0000000000000000 0000000000001000
> [    0.752845] Call Trace:
> [    0.755288]  [<ffffffff81575509>] dump_stack+0x45/0x56
> [    0.760414]  [<ffffffff810566cc>] reserve_memtype+0x31c/0x3f0
> [    0.766144]  [<ffffffff810537ef>] __ioremap_caller+0x12f/0x360
> [    0.771963]  [<ffffffff8130ad56>] ? acpi_os_release_object+0xe/0x12
> [    0.778217]  [<ffffffff815686ba>] ? acpi_os_map_memory+0xf6/0x14e
> [    0.784295]  [<ffffffff81053a54>] ioremap_cache+0x14/0x20
> [    0.789679]  [<ffffffff815686ba>] acpi_os_map_memory+0xf6/0x14e
> [    0.795582]  [<ffffffff81322ac9>] acpi_ex_system_memory_space_handler+0xdd/0x1ca
> [    0.802961]  [<ffffffff8131ca48>] acpi_ev_address_space_dispatch+0x1b0/0x208
> [    0.809993]  [<ffffffff8131fd49>] acpi_ex_access_region+0x20e/0x2a2
> [    0.816244]  [<ffffffff81149464>] ? __alloc_pages_nodemask+0x134/0x300
> [    0.822754]  [<ffffffff813200e4>] acpi_ex_field_datum_io+0xf6/0x171
> [    0.829004]  [<ffffffff81320301>] acpi_ex_extract_from_field+0xd7/0x20a
> [    0.835602]  [<ffffffff81331d80>] ? acpi_ut_create_internal_object_dbg+0x23/0x8a
> [    0.842981]  [<ffffffff8131f8e7>] acpi_ex_read_data_from_field+0x10f/0x14b
> [    0.849838]  [<ffffffff81322e16>] acpi_ex_resolve_node_to_value+0x18e/0x21c
> [    0.856780]  [<ffffffff813230a6>] acpi_ex_resolve_to_value+0x202/0x209
> [    0.863291]  [<ffffffff81319486>] acpi_ds_evaluate_name_path+0x7b/0xf5
> [    0.869803]  [<ffffffff81319834>] acpi_ds_exec_end_op+0x98/0x3e8
> [    0.875793]  [<ffffffff8132aca4>] acpi_ps_parse_loop+0x514/0x560
> [    0.881784]  [<ffffffff8132b738>] acpi_ps_parse_aml+0x98/0x28c
> [    0.887601]  [<ffffffff8132bf8d>] acpi_ps_execute_method+0x1c1/0x26c
> [    0.893939]  [<ffffffff813269c5>] acpi_ns_evaluate+0x1c1/0x258
> [    0.899755]  [<ffffffff8131cb98>] acpi_ev_execute_reg_method+0xca/0x112
> [    0.906353]  [<ffffffff8131cd6e>] acpi_ev_reg_run+0x48/0x52
> [    0.911910]  [<ffffffff81328fad>] acpi_ns_walk_namespace+0xc8/0x17f
> [    0.918160]  [<ffffffff8131cd26>] ? acpi_ev_detach_region+0x146/0x146
> [    0.924585]  [<ffffffff8131cdbc>] acpi_ev_execute_reg_methods+0x44/0xf7
> [    0.931184]  [<ffffffff819b2324>] ? acpi_sleep_proc_init+0x2a/0x2a
> [    0.937349]  [<ffffffff8130ac66>] ? acpi_os_wait_semaphore+0x43/0x57
> [    0.943686]  [<ffffffff81331a3f>] ? acpi_ut_acquire_mutex+0x48/0x88
> [    0.949938]  [<ffffffff8131ceb8>] acpi_ev_initialize_op_regions+0x49/0x71
> [    0.956709]  [<ffffffff819b2324>] ? acpi_sleep_proc_init+0x2a/0x2a
> [    0.962873]  [<ffffffff81333310>] acpi_initialize_objects+0x23/0x4f
> [    0.969125]  [<ffffffff819b23b4>] acpi_init+0x90/0x268
>
> So basically, this split page seems to be the problem. Some other code
> thinks that it has access to the full page and goes ahead and registers
> it with the PAT rb tree, and this causes problems in the mmap() code.
>
> I suspect we might have to go back to the idea of copying the first and
> last non-page-aligned ranges into the new kernel's memory and reading them
> from there to solve this issue. Do you have other ideas?
>

Sorry for the delayed response, although it looks like you have already
found a way to fix this issue.

BTW, I previously found a part of makedumpfile that truncated the first and
last pages if they were not page aligned. According to Kumagai-san, this
truncation was being performed on some ia64 systems, and he found valid data
in the truncated area, so the latest makedumpfile no longer performs such
truncation.
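
For reference, the failing check you describe can be modelled in user space.
This is only a rough sketch, not kernel code: model_lookup_memtype() and
model_track_pfn_remap() are made-up names, and the memtype lookup is reduced
to the assumption that exactly one page (0xdafa5000, the one registered via
acpi_init() in your trace) is WB and everything else is UC_MINUS.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000ULL

/* Stand-ins for the two memory types mentioned in the report. */
enum memtype { UC_MINUS, WB };

/* Page registered in the PAT rb tree according to the quoted trace. */
#define ACPI_MAPPED_PAGE 0xdafa5000ULL

/*
 * Hypothetical model of lookup_memtype(): untracked RAM is UC_MINUS,
 * the single registered page is WB.
 */
static enum memtype model_lookup_memtype(uint64_t paddr)
{
    return (paddr & ~(PAGE_SIZE - 1)) == ACPI_MAPPED_PAGE ? WB : UC_MINUS;
}

/* Same loop shape as the track_pfn_remap() excerpt quoted above. */
static int model_track_pfn_remap(uint64_t paddr, uint64_t size)
{
    enum memtype flags;

    /* This model only handles page-aligned, non-empty ranges. */
    assert(size >= PAGE_SIZE && (paddr % PAGE_SIZE) == 0);
    flags = model_lookup_memtype(paddr);

    /* Every following page must have the same memtype as the first. */
    while (size > PAGE_SIZE) {
        size -= PAGE_SIZE;
        paddr += PAGE_SIZE;
        if (flags != model_lookup_memtype(paddr))
            return -EINVAL;
    }
    return 0;
}
```

With pfn=0xdad62 and size=0x244000 the range ends at 0xdafa6000, so the last
page checked is 0xdafa5000 and the model returns -EINVAL. Shrinking the range
by one page (size=0x243000) avoids the split page and the model returns 0.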

The commit is:

commit f854b37adba223d5b4801accbedd17b447266d51
Author: Atsushi Kumagai <kumagai-atsushi at mxc.nes.nec.co.jp>
Date:   Fri Jun 21 15:25:31 2013 +0900

     [PATCH 2/2] Fix the handling of the pages correspond to border of PT_LOAD.

     The pages correspond to border of PT_LOAD were removed as holes.
     For example, pfn:N showed below was removed but we know even
     odd region like [0x40ffda7000 - 0x40ffda8000] can include valid
     dates, so we shouldn't remove it as holes.

                                phys_start
                                = 0x40ffda7000
              |<-- frac_head -->|------------- PT_LOAD -------------
          ----+-----------------------+---------------------+----
              |         pfn:N         |       pfn:N+1       | ...
          ----+-----------------------+---------------------+----
              |
          pfn_to_paddr(pfn:N)               # page size = 16k
          = 0x40ffda4000

     This patch handles such odd regions correctly. Then read pfn:N
     and write it to disk, the ranges not covered by any PT_LOAD
     entries will be filled with 0.

     Signed-off-by: Atsushi Kumagai <kumagai-atsushi at mxc.nes.nec.co.jp>

The log on the web is:

http://lists.infradead.org/pipermail/kexec/2013-May/008875.html
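
To make the diagram in the commit message concrete, here is a small sketch of
the arithmetic behind frac_head. The helper names page_floor() and frac_head()
are made up for illustration and are not makedumpfile functions; the page size
is passed in explicitly because the commit's example uses 16k pages.

```c
#include <assert.h>
#include <stdint.h>

/* Round paddr down to the start of its page; page_size must be a power of two. */
static uint64_t page_floor(uint64_t paddr, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return paddr & ~(page_size - 1);
}

/* Bytes between the start of the page and phys_start (frac_head in the diagram). */
static uint64_t frac_head(uint64_t phys_start, uint64_t page_size)
{
    return phys_start - page_floor(phys_start, page_size);
}
```

For phys_start = 0x40ffda7000 and a 16k page, page_floor() gives 0x40ffda4000,
which matches pfn_to_paddr(pfn:N) in the diagram, and frac_head() gives 0x3000.
Those 0x3000 bytes are the part of pfn:N that no PT_LOAD entry covers. Applied
to the reserved range starting at 0xdafa5800 in your report, with 4k pages, the
same calculation gives 0x800.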

So, without this change, you would not have seen this issue. The truncation
may originally have been implemented because of issues similar to this one.

Next, I think we need to consider whether to revert the above commit, since
makedumpfile fails on some systems, as you reported.

-- 
Thanks.
HATAYAMA, Daisuke



