[PATCH v3 5/6] mm: Add mirror flag back on initrd memory

Ard Biesheuvel ardb at kernel.org
Fri Jun 10 04:06:35 PDT 2022


On Thu, 9 Jun 2022 at 10:16, mawupeng <mawupeng1 at huawei.com> wrote:
>
>
>
> 在 2022/6/8 18:12, Ard Biesheuvel 写道:
> > On Wed, 8 Jun 2022 at 12:08, David Hildenbrand <david at redhat.com> wrote:
> >>
> >> On 08.06.22 12:02, Mike Rapoport wrote:
> >>> On Wed, Jun 08, 2022 at 03:27:09PM +0800, mawupeng wrote:
> >>>>
> >>>> 在 2022/6/7 22:49, Ard Biesheuvel 写道:
> >>>>> On Tue, 7 Jun 2022 at 14:22, David Hildenbrand <david at redhat.com> wrote:
> >>>>>>
> >>>>>> On 07.06.22 11:38, Wupeng Ma wrote:
> >>>>>>> From: Ma Wupeng <mawupeng1 at huawei.com>
> >>>>>>>
> >>>>>>> Initrd memory will be removed and then added in arm64_memblock_init() and this
> >>>>>>> will cause it to lose all of its memblock flags. The lost of MEMBLOCK_MIRROR
> >>>>>>> flag will lead to error log printed by find_zone_movable_pfns_for_nodes if
> >>>>>>> the lower 4G range has some non-mirrored memory.
> >>>>>>>
> >>>>>>> In order to solve this problem, the lost MEMBLOCK_MIRROR flag will be
> >>>>>>> reinstalled if the origin memblock has this flag.
> >>>>>>>
> >>>>>>> Signed-off-by: Ma Wupeng <mawupeng1 at huawei.com>
> >>>>>>> ---
> >>>>>>>    arch/arm64/mm/init.c     |  9 +++++++++
> >>>>>>>    include/linux/memblock.h |  1 +
> >>>>>>>    mm/memblock.c            | 20 ++++++++++++++++++++
> >>>>>>>    3 files changed, 30 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> >>>>>>> index 339ee84e5a61..11641f924d08 100644
> >>>>>>> --- a/arch/arm64/mm/init.c
> >>>>>>> +++ b/arch/arm64/mm/init.c
> >>>>>>> @@ -350,9 +350,18 @@ void __init arm64_memblock_init(void)
> >>>>>>>                         "initrd not fully accessible via the linear mapping -- please check your bootloader ...\n")) {
> >>>>>>>                         phys_initrd_size = 0;
> >>>>>>>                 } else {
> >>>>>>> +                     int flags, ret;
> >>>>>>> +
> >>>>>>> +                     ret = memblock_get_flags(base, &flags);
> >>>>>>> +                     if (ret)
> >>>>>>> +                             flags = 0;
> >>>>>>> +
> >>>>>>>                         memblock_remove(base, size); /* clear MEMBLOCK_ flags */
> >>>>>>>                         memblock_add(base, size);
> >>>>>>>                         memblock_reserve(base, size);
> >>>>>>
> >>>>>> Can you explain why we're removing+re-adding here exactly? Is it just to
> >>>>>> clear flags as the comment indicates?
> >>>>>>
> >>>>>
> >>>>> This should only happen if the placement of the initrd conflicts with
> >>>>> a mem= command line parameter or it is not covered by memblock for
> >>>>> some other reason.
> >>>>>
> >>>>> IOW, this should never happen, and if re-memblock_add'ing this memory
> >>>>> unconditionally is causing problems, we should fix that instead of
> >>>>> working around it.
> >>>>
> >>>> This will happen if we use initrdmem=3G,100M to reserve initrd memory below
> >>>> the 4G limit to test this scenario(just for testing, I have trouble to boot
> >>>> qemu with initrd enabled and memory below 4G are all mirror memory).
> >>>>
> >>>> Re-memblock_add'ing this memory unconditionally seems fine but clear all
> >>>> flags(especially MEMBLOCK_MIRROR) may lead to some error log.
> >>>>
> >>>>>
> >>>>>> If it's really just about clearing flags, I wonder if we rather want to
> >>>>>> have an interface that does exactly that, and hides the way this is
> >>>>>> actually implemented (obtain flags, remove, re-add ...), internally.
> >>>>>>
> >>>>>> But most probably there is more magic in the code and clearing flags
> >>>>>> isn't all it ends up doing.
> >>>>>>
> >>>>>
> >>>>> I don't remember exactly why we needed to clear the flags, but I think
> >>>>> it had to do with some corner case we hit when the initrd was
> >>>>> partially covered.
> >>>> If "mem=" is set in command line, memblock_mem_limit_remove_map() will
> >>>> remove all memory block without MEMBLOCK_NOMAP. Maybe this will bring the
> >>>> memory back if this initrd mem has the MEMBLOCK_NOMAP flag?
> >>>>
> >>>> The rfc version [1] introduce and use memblock_clear_nomap() to clear the
> >>>> MEMBLOCK_NOMAP of this initrd memblock.
> >>>> So maybe the usage of memblock_remove() is just to avoid introducing new
> >>>> function(memblock_clear_nomap)?
> >>>>
> >>>> Since commit 4c546b8a3469 ("memblock: add memblock_clear_nomap()") already
> >>>> introduced memblock_clear_nomap(). Can we use this to remove flag MEMBLOCK_NOMAP
> >>>> to solve this problem rather than bring flag MEMBLOCK_MIRROR back?
> >>>
> >>> AFAICT, there are two corner cases that re-adding initrd memory covers:
> >>> * initrd memory is not a part of the memory reported to memblock, either
> >>> because of firmware weirdness or because it was cut out with mem=
> >>> * initrd memory overlaps a NOMAP region
> >>>
> >>> So to make sure initrd memory is mapped properly and retains
> >>> MEMBLOCK_MIRROR I think the best we can do is
> >>>
> >>>        memblock_add();
> >>>        memblock_clear_nomap();
> >>>        memblock_reserve();
> >>
> >> Would simply detect+rejecting to boot on such setups be an option? The
> >> replies so far indicate to me that this is rather a corner case than a
> >> reasonable use case.
> >>
> >
> > The sad reality is that mem= is known to be used in production for
> > limiting the amount of memory that the kernel takes control of, in
> > order to allow the remainder to be used in platform specific ways.
> >
> > Of course, there are much better ways to achieve that, but given that
> > we currently support it, I don't think we can easily back that out.
> >
> > I do think that there is no need to go out of our way to make this
> > case work seamlessly with mirrored memory, though. So I'd prefer to
> > make the remove+re-add conditional on there actually being a need to
> > do so. That way, we don't break the old use case or mirrored memory,
> > and whatever happens when the two are combined is DONTCARE.
>
> Does that mean that we don't need to care about this scenario with
> mirror memory?
>

We shouldn't, but we do, unfortunately.

So we should probably adopt the sequence suggested by Mike.



More information about the linux-riscv mailing list