RE: [This email may be risky] Re: Re: [PATCH] arm64: update PHYS_OFFSET to conform to kernel

Jin, Yanjiang yanjiang.jin at hxt-semitech.com
Sun Jun 3 19:12:53 PDT 2018


Hi Bhupesh,

Thanks for your reply. I have also subscribed to linux-arm-kernel; let's wait for the update.

Regards!
Yanjiang

> -----Original Message-----
> From: Bhupesh Sharma [mailto:bhsharma at redhat.com]
> Sent: June 2, 2018 5:50
> To: Jin, Yanjiang <yanjiang.jin at hxt-semitech.com>
> Cc: Pratyush Anand <pratyush.anand at gmail.com>; kexec at lists.infradead.org;
> jinyanjiang at gmail.com; horms at verge.net.au; Zheng, Joey <yu.zheng at hxt-semitech.com>
> Subject: [This email may be risky] Re: Re: [PATCH] arm64: update PHYS_OFFSET to
> conform to kernel
>
> Hi Yanjiang,
>
> Thanks, the description of the issue is clearer now.
>
> I also managed to get my Qualcomm board working to reproduce this issue.
> Please see more comments inline:
>
> On Thu, May 31, 2018 at 11:01 AM, Jin, Yanjiang <yanjiang.jin at hxt-semitech.com>
> wrote:
> > Hi Bhupesh,
> >
> > 1.  To be clearer, I listed my memory layout again here:
> >
> > In the first kernel, run the command below to get the end of the virtual memory range:
> >
> > # dmesg | grep memory
> > ..........
> > memory  : 0xffff800000200000 - 0xffff801800000000
> >
> > Then use readelf to get the last program header from vmcore:
> >
> > # readelf -l vmcore
> >
> > ELF Header:
> > ........................
> >
> > Program Headers:
> >   Type  Offset             VirtAddr           PhysAddr           FileSiz            MemSiz             Flags  Align
> >   ...
> >   LOAD  0x0000000076d40000 0xffff80017fe00000 0x0000000180000000 0x0000001680000000 0x0000001680000000 RWE    0
> >
> > Do a simple calculation:
> >
> > (VirtAddr + MemSiz) = 0xffff80017fe00000 + 0x0000001680000000 = 0xffff8017ffe00000 != 0xffff801800000000.
> >
> > The end of the virtual memory range does not match between vmlinux and vmcore: the last LOAD segment ends 2M (0x200000) short. If you repeat these three steps, I think you will get the same result as mine.
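As a quick check of the calculation quoted above, here is a minimal C sketch. The constants are copied from the dmesg and readelf output in this thread; the macro and function names are made up for illustration and do not exist in kexec-tools.

```c
#include <stdint.h>

/* Values copied from the dmesg and readelf output quoted in this thread. */
#define KERNEL_LINEAR_END 0xffff801800000000ULL /* end of "memory" in dmesg */
#define LAST_LOAD_VADDR   0xffff80017fe00000ULL /* VirtAddr of the last LOAD */
#define LAST_LOAD_MEMSZ   0x0000001680000000ULL /* MemSiz of the last LOAD */

/* End (exclusive) of the virtual range covered by the last LOAD segment. */
static uint64_t last_load_end(void)
{
	return LAST_LOAD_VADDR + LAST_LOAD_MEMSZ;
}

/* Bytes at the top of the kernel's linear range that no LOAD segment covers. */
static uint64_t uncovered_tail(void)
{
	return KERNEL_LINEAR_END - last_load_end();
}
```

With these inputs the uncovered tail is exactly 2M, which is the window in which the log_buf address discussed in this thread falls.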
> >
> >
> > 2. But why can't you reproduce my issue? The reason is that the address of my symbol “log_buf” lies in the last 2M.
> > I guess it isn't in the last 2M on your environment, so we get different vmcore-dmesg results.
> > You can check log_buf's address through crash as below:
> >
> > crash> print log_buf
> > $1 = 0xffff8017ffe90000 ""
> >
> > In vmcore-dmesg.c, the function dump_dmesg_structured() gets the log_buf offset through the code below:
> >
> > log_buf = read_file_pointer(fd, vaddr_to_offset(log_buf_vaddr));
> > log_buf_offset = vaddr_to_offset(log_buf);
> >
> > The error happens in vaddr_to_offset(), which reports the following on my board:
> > “No program header covering vaddr 0xffff8017ffe90000 found kexec bug?”
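To illustrate why the lookup fails, here is a simplified sketch of a program-header search. It is not the actual vaddr_to_offset() from vmcore-dmesg.c; the struct and function names are hypothetical, and only the covering-range check is modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an ELF64 program header. */
struct segment {
	uint64_t vaddr;   /* p_vaddr  */
	uint64_t memsz;   /* p_memsz  */
	uint64_t offset;  /* p_offset */
};

/*
 * Return 0 and store the file offset in *off when some segment covers
 * vaddr; return -1 when none does (the "No program header covering
 * vaddr" case).
 */
static int lookup_offset(const struct segment *segs, size_t n,
			 uint64_t vaddr, uint64_t *off)
{
	for (size_t i = 0; i < n; i++) {
		if (vaddr >= segs[i].vaddr &&
		    vaddr - segs[i].vaddr < segs[i].memsz) {
			*off = segs[i].offset + (vaddr - segs[i].vaddr);
			return 0;
		}
	}
	return -1;
}
```

Fed the last LOAD header from the readelf output (VirtAddr 0xffff80017fe00000, MemSiz 0x1680000000), a lookup of 0xffff8017ffe90000 falls past the end of the segment and returns -1.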
> >
> > If I adjust my memory layout so that log_buf is not in the last 2M, vmcore-dmesg succeeds. But the issue still exists: the layouts of vmlinux and vmcore do not match.
> >
> > Having log_buf in the last 2M is not common, but it does happen on my board.
> >
> >
> > 3. Now let's go back to the code itself. Whether or not we can reproduce this bug, the issue in the phys_offset code always exists.
> >
> > In kernel:
> > arm64_memblock_init() calls round_down to recalculate memstart_addr:
> >
> > memstart_addr = round_down(memblock_start_of_DRAM(),
> > ARM64_MEMSTART_ALIGN);
> >
> > memblock_start_of_DRAM() is 0x200000, it is the first memblock’s base.
> > ARM64_MEMSTART_ALIGN is 0x40000000 on my board.
> >
> > So memstart_addr is 0, and phys_offset = memstart_addr = 0;
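The rounding step can be reproduced in isolation. The sketch below assumes a power-of-two alignment, as is the case for the 0x40000000 value quoted above; the helper name is made up and is not the kernel's round_down() macro itself.

```c
#include <stdint.h>

/* Round x down to a multiple of align; align must be a power of two. */
static uint64_t round_down_pow2(uint64_t x, uint64_t align)
{
	return x & ~(align - 1);
}
```

For the values in this thread, round_down_pow2(0x200000, 0x40000000) gives 0, because the first memblock starts below the first 1G boundary.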
> >
> > But in kexec-tools:
> > phys_offset is set in the function get_memory_ranges_iomem_cb() :
> >
> > get_memory_ranges_iomem_cb()->set_phys_offset().
> >
> > This function just takes the first memblock's base (the first block of “/proc/iomem”), with no round_down() operation.
> >
> > To align with the kernel, kexec-tools should apply a similar round_down() to this base, but it does not.
> > It's hard to get the kernel's round_down parameters in kexec-tools, but reading memstart_addr's value from DEVMEM is safe; we can always get the correct value regardless of whether KASLR is enabled.
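The effect of the two different phys_offset values can be sketched with the usual linear-map translation. The PAGE_OFFSET value below is an assumption inferred from the dmesg "memory" line quoted in this thread, and the function name is hypothetical.

```c
#include <stdint.h>

/* Assumption: start of the linear map, inferred from the dmesg line. */
#define PAGE_OFFSET_ASSUMED 0xffff800000000000ULL

/* Linear-map translation: virtual = physical - phys_offset + PAGE_OFFSET. */
static uint64_t phys_to_virt_sketch(uint64_t paddr, uint64_t phys_offset)
{
	return paddr - phys_offset + PAGE_OFFSET_ASSUMED;
}
```

For the PhysAddr 0x180000000 of the last LOAD segment, a phys_offset of 0 (the kernel's value) yields 0xffff800180000000, while a phys_offset of 0x200000 (the first /proc/iomem base) yields 0xffff80017fe00000, which is the VirtAddr that readelf shows in vmcore.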
>
> The problem statement is clearer now (thanks for detailing the
> environment in your last email).
>
> I think I understand the issue with 'memstart_addr' being 0. It is one of a few
> KASLR-related queries (some of which are valid for the non-KASLR case as well)
> that I recently asked the arm64 kernel maintainers upstream (please
> see [1] for details).
>
> There I asked the maintainers for their views on two options. Either we
> update the user-space tools that currently obtain PHYS_OFFSET from the memblocks
> listed in '/proc/iomem' (by reading the base of the 1st memblock) so that they
> instead read the value of 'memstart_addr' somehow in user-space, or the change
> is done at the kernel end to calculate 'memstart_addr' as:
>
>         /*
>          * Select a suitable value for the base of physical memory.
>          */
>         memstart_addr = round_down(memblock_start_of_DRAM(),
>                                    ARM64_MEMSTART_ALIGN);
>         if (memstart_addr)
>                 memstart_addr = memblock_start_of_DRAM();
>
> Let's wait for an update from the ARM64 kernel maintainers, because I think this
> change might be needed in other user-space tools (if we decide to make the
> change in the user-space side) e.g. makedumpfile in addition to kexec-tools to
> correctly handle this unique use-case where we have value of
> memblock_start_of_DRAM() < ARM64_MEMSTART_ALIGN
>
> [1] https://www.spinics.net/lists/arm-kernel/msg655933.html
>
> Thanks,
> Bhupesh
>
>
> >
> >> -----Original Message-----
> >> From: Bhupesh Sharma [mailto:bhsharma at redhat.com]
> >> Sent: May 30, 2018 23:56
> >> To: Jin, Yanjiang <yanjiang.jin at hxt-semitech.com>; Pratyush Anand
> >> <pratyush.anand at gmail.com>
> >> Cc: kexec at lists.infradead.org; jinyanjiang at gmail.com;
> >> horms at verge.net.au; Zheng, Joey <yu.zheng at hxt-semitech.com>
> >> Subject: [This email may be risky] Re: [PATCH] arm64: update PHYS_OFFSET to
> >> conform to kernel
> >>
> >> On 05/30/2018 03:50 PM, Jin, Yanjiang wrote:
> >> >
> >> >
> >> >> -----Original Message-----
> >> >> From: Bhupesh Sharma [mailto:bhsharma at redhat.com]
> >> >> Sent: May 30, 2018 16:39
> >> >> To: Jin, Yanjiang <yanjiang.jin at hxt-semitech.com>; Pratyush Anand
> >> >> <pratyush.anand at gmail.com>
> >> >> Cc: kexec at lists.infradead.org; jinyanjiang at gmail.com;
> >> >> horms at verge.net.au; Zheng, Joey <yu.zheng at hxt-semitech.com>
> >> >> Subject: Re: [PATCH] arm64: update PHYS_OFFSET to conform to
> >> >> kernel
> >> >>
> >> >> Hi Yanjiang,
> >> >>
> >> >> On 05/30/2018 01:09 PM, Jin, Yanjiang wrote:
> >> >>>
> >> >>>
> >> >>>> -----Original Message-----
> >> >>>> From: Pratyush Anand [mailto:pratyush.anand at gmail.com]
> >> >>>> Sent: May 30, 2018 12:16
> >> >>>> To: Jin, Yanjiang <yanjiang.jin at hxt-semitech.com>
> >> >>>> Cc: kexec at lists.infradead.org; jinyanjiang at gmail.com;
> >> >>>> horms at verge.net.au
> >> >>>> Subject: Re: [PATCH] arm64: update PHYS_OFFSET to conform to
> >> >>>> kernel
> >> >>>>
> >> >>>> Hi Yanjiang,
> >> >>>>
> >> >>>> On Wed, May 30, 2018 at 8:33 AM, Jin, Yanjiang
> >> >>>> <yanjiang.jin at hxt-semitech.com>
> >> >>>> wrote:
> >> >>>>> Hi Pratyush,
> >> >>>>>
> >> >>>>> Thanks for your help! but please see my reply inline.
> >> >>>>>
> >> >>>>
> >> >>>> [...]
> >> >>>>
> >> >>>>>>> If an application, for example, vmcore-dmesg, wants to access
> >> >>>>>>> the kernel symbol which is located in the last 2M address, it
> >> >>>>>>> would fail with the below error:
> >> >>>>>>>
> >> >>>>>>>     "No program header covering vaddr 0xffff8017ffe90000 found kexec bug?"
> >> >>>>>>
> >> >>>>>> I think the fix might not be correct.
> >> >>>>>>
> >> >>>>>> The problem is in vmcore-dmesg, and that should be fixed, not kexec.
> >> >>>>>> See here
> >> >>>>>> (https://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git/tree/vmcore-dmesg/vmcore-dmesg.c?id=HEAD#n261).
> >> >>>>>
> >> >>>>> Firstly, for my patch, vmcore-dmesg is just an auxiliary
> >> >>>>> application that helps to reproduce this issue. The function that
> >> >>>>> generates vmcore is the root cause.
> >> >>>>
> >> >>>> ...and what generates vmcore is not kexec but rather the
> >> >>>> secondary kernel.
> >> >>>>
> >> >>>>>
> >> >>>>> On the other hand, vmcore-dmesg is part of kexec-tools; it has no
> >> >>>>> standalone git repo. Even if we want to fix vmcore-dmesg, we still
> >> >>>>> need to send the patch to kexec-tools, right?
> >> >>>>
> >> >>>> Sure. I meant the `kexec` application. We have three applications in
> >> >>>> kexec-tools: `kexec`, `vmcore-dmesg` and `kdump`. [I hope kdump is useless
> >> >>>> and we are going to get rid of it very soon.]
> >> >>>>
> >> >>>>>
> >> >>>>> Yanjiang
> >> >>>>>
> >> >>>>>> How symbols are extracted from vmcore.
> >> >>>>>>
> >> >>>>>> You do have "NUMBER(PHYS_OFFSET)=" information in vmcore.
> >> >>>>>>
> >> >>>>>> You can probably see makedumpfile code, that how to extract
> >> >>>>>> information from "NUMBER".
> >> >>>>>
> >> >>>>> I have looked at makedumpfile before; NUMBER(number) just reads a
> >> >>>>> number from vmcore. But as I showed before, the root issue is that
> >> >>>>> vmcore contains a wrong number. My patch fixes the vmcore generation
> >> >>>>> issue; we can't read vmcore at this point since we don't have a
> >> >>>>> vmcore yet.
> >> >>>>
> >> >>>> ...and IIUC, you were able to get all the way to the end of the
> >> >>>> secondary kernel, where you tried vmcore-dmesg and then hit the
> >> >>>> issue, right?
> >> >>>>
> >> >>>> How did you conclude that vmcore contains a wrong number? It's
> >> >>>> unlikely, but if it does, then we have a problem somewhere in the
> >> >>>> Linux kernel, not here.
> >> >>>
> >> >>> Hi Pratyush,
> >> >>>
> >> >>> I think I have found the root cause. In the Linux kernel,
> >> >>> memblock_mark_nomap() reserves some memory ranges for EFI, such as
> >> >>> EFI_RUNTIME_SERVICES_DATA and EFI_BOOT_SERVICES_DATA. On my
> >> >>> environment, the first 2M of memory is EFI_RUNTIME_SERVICES_DATA, so
> >> >>> it can't be seen in the kernel. We also can't mark this EFI memory as
> >> >>> "reserved"; only EFI_ACPI_RECLAIM_MEMORY memory can be marked as
> >> >>> "reserved" and seen in the kernel.
> >> >>> So I don't think this is a kernel issue; we should fix it in kexec-tools.
> >> >>> The kernel's call stack is attached for reference.
> >> >>>
> >> >>> drivers/firmware/efi/arm-init.c
> >> >>>
> >> >>> efi_init()->reserve_regions()->memblock_mark_nomap()
> >> >>>
> >> >>> Hi Bhupesh,
> >> >>>
> >> >>> I guess your environment has no EFI support, or the first
> >> >>> memblock is not reserved for EFI, so you can't reproduce this issue.
> >> >>
> >> >> Perhaps you missed reading my earlier threads on the subject of
> >> >> EFI_ACPI_RECLAIM_MEMORY regions being mapped as NOMAP and how it
> >> >> causes the crashkernel to panic (please go through [1]).
> >> >>
> >> >> As of now we haven't found an acceptable-to-all solution for the
> >> >> issue, and it needs to be fixed in 'kexec-tools' with a minor
> >> >> fix on the kernel side as well.
> >> >>
> >> >> So, coming back to my environment details, it has both EFI support
> >> >> as well as EFI ACPI RECLAIM regions.
> >> >>
> >> >> However we may be hitting a special case in your environment, so
> >> >> before we can discuss your patch further (as both Pratyush
> >> >> and I have concerns with it), I would request you to
> >> >> share the following:
> >> >>
> >> >> - output of kernel dmesg with 'efi=debug' added in the bootargs
> >> >> (which will help us see how the memblocks are marked on your setup;
> >> >> I am specifically interested in the logs after the line
> >> >> 'Processing EFI memory map'),
> >> >
> >> > I investigated further on my board. I believe that the firmware design
> >> > leads to the differences between our environments:
> >> >
> >> > My firmware defines the first two EFI blocks as below:
> >> >
> >> > Region1: 0x000000000000-0x000000200000 [EfiReservedMemType]
> >> > Region2: 0x000000200000-0x00000021ffff [EfiRuntimeServiceData]
> >> >
> >> > But the EFI API won't return the "EfiReservedMemType" memory to the Linux
> >> > kernel for security reasons, so the kernel can't get any info about the
> >> > first mem block; it can only see region2 as below:
> >> >
> >> > efi: Processing EFI memory map:
> >> > efi:   0x000000200000-0x00000021ffff [Runtime Data       |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC]
> >> >
> >> > # head -1 /proc/iomem
> >> > 00200000-0021ffff : reserved
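For reference, taking the base of the first /proc/iomem entry amounts to something like the sketch below. This is only an illustration of parsing one such line; the function name is hypothetical and it is not the code in get_memory_ranges_iomem_cb().

```c
#include <stdio.h>

/*
 * Parse one "/proc/iomem"-style line such as
 * "00200000-0021ffff : reserved". Returns 0 on success, -1 otherwise.
 */
static int parse_iomem_line(const char *line,
			    unsigned long long *start,
			    unsigned long long *end)
{
	if (sscanf(line, "%llx-%llx :", start, end) != 2)
		return -1;
	return 0;
}
```

For the line shown above this yields a start of 0x200000, which is the value that ends up as phys_offset when no rounding is applied.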
> >>
> >> I have the same case on boards at my end:
> >>
> >> # head -1 /proc/iomem
> >> 00200000-0021ffff : reserved
> >>
> >> # dmesg | grep -i "Processing EFI memory map" -A 5
> >> [    0.000000] efi: Processing EFI memory map:
> >> [    0.000000] efi:   0x000000200000-0x00000021ffff [Runtime Data       |RUN|  |  |  |  |  |  |   |WB|WT|WC|UC]
> >> [    0.000000] efi:   0x000000400000-0x0000005fffff [ACPI Memory NVS    |   |  |  |  |  |  |  |   |  |  |  |UC]
> >> [    0.000000] efi:   0x000000800000-0x00000081ffff [ACPI Memory NVS    |   |  |  |  |  |  |  |   |  |  |  |UC]
> >> [    0.000000] efi:   0x000000820000-0x000001600fff [Conventional Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> >> [    0.000000] efi:   0x000001601000-0x0000027fffff [Loader Data        |   |  |  |  |  |  |  |   |WB|WT|WC|UC]
> >>
> >> So, no, your environment is not a special one (as I also use ATF as
> >> the EL3 boot firmware); see more below ..
> >>
> >> > There are many EfiReservedMemType regions in ARM64 firmware if it
> >> > supports TrustZone, but if the firmware doesn't put this type of memory
> >> > region at the start of physical memory, this error wouldn't happen. I
> >> > don't think the firmware is at fault, since it can reserve any memory
> >> > regions; we'd better update kexec-tools.
> >> > Anyway, reading memstart_addr from /dev/mem can always get a correct
> >> > value if DEVMEM is defined.
> >>
> >> .. At my side with the latest upstream kernel (with commit
> >> f56ab9a5b73ca2aee777ccdf2d355ae2dd31db5a reverted to allow
> >> crashkernel to boot while accessing ACPI tables) and latest upstream
> >> kexec-tools, I can boot the crashkernel properly, collect the vmcore
> >> properly and analyze the crash dump via tools like gdb and crash also.
> >>
> >> So, I will also try the vmcore-dmesg tool and see if I get
> >> any issues with it. Till then you can try and see if there are
> >> any other obvious differences in your environment which might be causing
> >> this issue at your end.
> >>
> >> Thanks,
> >> Bhupesh
> >>
> >>
> >> >> - if you are using a public arm64 platform maybe you can share the
> >> >> CONFIG file,
> >> >> - output of 'cat /proc/iomem'
> >> >>
> >> >> [1] https://www.spinics.net/lists/arm-kernel/msg616632.html
> >> >>
> >> >> Thanks,
> >> >> Bhupesh
> >> >>
> >> >>>> Have you tried to extract "PHYS_OFFSET" from vmcore, either in
> >> >>>> vmcore-dmesg or in makedumpfile, and found that it does not match the
> >> >>>> value of "PHYS_OFFSET" from the first kernel?
> >> >>>>
> >> >>>> In my understanding the flow is like this:
> >> >>>>
> >> >>>> - The first kernel has a reserved area for the secondary kernel, as
> >> >>>> well as for elfcore.
> >> >>>> - The first kernel embeds all the vmcore information notes into
> >> >>>> elfcore (see
> >> >>>> crash_save_vmcoreinfo_init() -> arch_crash_save_vmcoreinfo()).
> >> >>>> Therefore, we will have PHYS_OFFSET, kimage_voffset and VA_BITS
> >> >>>> information for the first kernel in vmcore, which is in separate
> >> >>>> memory and can be read by the second kernel.
> >> >>>> - elfcore will also have notes about all the other physical
> >> >>>> memory of the first kernel which needs to be copied by the second kernel.
> >> >>>> - Now when a crash happens, the second kernel should have all the
> >> >>>> required info for reading symbols from the first kernel's physical
> >> >>>> memory, no?
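As a rough illustration of reading a value such as PHYS_OFFSET out of the vmcoreinfo note, here is a minimal sketch that scans a text buffer of "KEY=value" lines. The function name is hypothetical and this is not how makedumpfile or vmcore-dmesg implement it; it only assumes the newline-separated text form of vmcoreinfo.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Find "key" (for example "NUMBER(PHYS_OFFSET)=") at the start of a line in
 * a vmcoreinfo-style text buffer and parse the number that follows it.
 * Returns 0 on success, -1 if the key is absent or no number follows.
 */
static int vmcoreinfo_number(const char *buf, const char *key,
			     unsigned long long *val)
{
	size_t klen = strlen(key);
	const char *p = buf;

	while (p && *p) {
		if (strncmp(p, key, klen) == 0) {
			char *endp;

			/* Base 0 accepts both "0x..." and decimal values. */
			*val = strtoull(p + klen, &endp, 0);
			return endp == p + klen ? -1 : 0;
		}
		/* Skip to the start of the next line, if any. */
		p = strchr(p, '\n');
		if (p)
			p++;
	}
	return -1;
}
```

A tool could compare the value found this way with the phys_offset it derived from /proc/iomem to see whether the two disagree.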
> >> >>>>
> >> >>>>>
> >> >>>>> NUMBER(number) = read_vmcoreinfo_ulong(STR_NUMBER(str_number))
> >> >>>>>
> >> >>>>> Yanjiang
> >> >>>>>
> >> >>>>>>
> >> >>>>>> Once you know the real PHYS_OFFSET (which could have been
> >> >>>>>> random if KASLR is enabled), you can fix the problem you are seeing.
> >> >>>>>
> >> >>>>> I have validated both with and without KASLR; both worked
> >> >>>>> well after applying my patch.
> >> >>>>
> >> >>>> IMHO, even if that works, it does not mean that it's a good fix.
> >> >>>> We should try to find the root cause. Moreover, you might not have
> >> >>>> /dev/mem available for all the configurations where KASLR is enabled.
> >> >>>>
> >> >>>> Regards
> >> >>>> Pratyush
> >> >>>
> >> >>>
> >> >>>
> >> >>> This email is intended only for the named addressee. It may
> >> >>> contain
> >> >> information that is confidential/private, legally privileged, or
> >> >> copyright-protected, and you should handle it accordingly. If you
> >> >> are not the intended recipient, you do not have legal rights to
> >> >> retain, copy, or distribute this email or its contents, and should
> >> >> promptly delete the email and all electronic copies in your
> >> >> system; do not retain copies in any media. If you have received
> >> >> this email in error, please
> >> notify the sender promptly. Thank you.
> >> >>>
> >> >>>
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >
> >
> >
> >
> >
> >


