About SECTION_SIZE_BITS for Sparsemem

Minchan Kim minchan.kim at gmail.com
Mon Jul 12 22:05:26 EDT 2010


On Tue, Jul 13, 2010 at 9:25 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu at jp.fujitsu.com> wrote:
> On Mon, 12 Jul 2010 19:35:17 +0900
> Minchan Kim <minchan.kim at gmail.com> wrote:
>
>> Hi, Mel and Kame.
>>
>> On Mon, Jul 12, 2010 at 7:13 PM, Kukjin Kim <kgene.kim at samsung.com> wrote:
>> > Minchan Kim wrote:
>> >>
>> >> Hello.
>> >>
>> > Hello :-)
>> >
>> >> On Mon, Jul 12, 2010 at 5:32 PM, Kukjin Kim <kgene.kim at samsung.com> wrote:
>> >> > Russell,
>> >> >
>> >> > Hi,
>> >> >
>> >> > Kukjin Kim wrote:
>> >> >> Russell wrote:
>> >> >> > So, memory starts at 0x20000000 and finishes at 0x25000000.  That's
>> >> > fine.
>> >> >> > That doesn't mean the section size is 16MB.
>> >> >> >
>> >> >> > As I've already said, the section size has _nothing_ what so ever to
>> > do
>> >> >> > with the size of memory, or the granularity of the size of memory.
>> >  By
>> >> >> > way of illustration, it is perfectly legal to have a section size of
>> >> >> > 256MB but only have 1MB in a section and this is perfectly legal.  So
>> >> >> > sections do not have to be completely filled.
>> >> >> >
>> >> >> Actually, as you know, the hole's area of mem_map is freed from bootmem
>> > if
>> >> > a
>> >> >> section has a hole when initializing sparse memory.
>> >> >>
>> >> >> I identified that a section doesn't need to be a contiguous area of
>> >> > physical
>> >> >> memory when reading your comment with the fact that the mem_map of a
>> >> > section
>> >> >> can be smaller than the size of a section.
>> >> >>
>> >> >> I found, however, the kernel panics when modifying min_free_kbytes file
>> > in
>> >> >> the proc filesystem if a section has a hole.
>> >> >>
>> >> >> While processing the change of min_free_kbytes in the kernel, page
>> >> >> descriptors in a hole of an online section is accessed.
>> >> >
>> >> > As I said, following error happens.
>> >> > It would be helpful to me if any opinions or comments.
>> >> >
>> >>
>> >> Could you test below patch?
>> >> Also, you should select ARCH_HAS_HOLES_MEMORYMODEL in your config.
>> >>
>> > Yes, I did it, and no kernel panic happens :-)
>> >
>> > Same test...
>> > [root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes
>> > 2736
>> > [root at Samsung ~]# echo "2730" > /proc/sys/vm/min_free_kbytes
>> > [root at Samsung ~]#
>> > [root at Samsung ~]# cat /proc/sys/vm/min_free_kbytes
>> > 2730
>> >
>> >
>> >> @@ -2824,8 +2825,13 @@ static void setup_zone_migrate_reserve(struct zone
>> >> *zone)
>> >>         for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
>> >>                 if (!pfn_valid(pfn))
>> >>                         continue;
>> >> +
>> >>                 page = pfn_to_page(pfn);
>> >>
>> >> +                /* Watch for unexpected holes punched in the memmap */
>> >> +                if (!memmap_valid_within(pfn, page, zone))
>> >> +                        continue;
>> >> +
>> >>                 /* Watch out for overlapping nodes */
>> >>                 if (page_to_nid(page) != zone_to_nid(zone))
>> >>                         continue;
>> >>
>> >>
>> >>
>> >
>> > ...Could you please explain about this issue?
>>
>> The setup_zone_migrate_reserve doesn't check memmap hole. I think
>> compaction would have the  same problem, too. I don't know there is a
>> problem in elsewhere.
>> Anyway, I think memmap_valid_within calling whenever walking whole pfn
>> range isn't a good solution. We already have pfn_valid. Could we check
>> this in there?
>> For example, mem_section have a valid pfn range and then valid section
>> can test it in pfn_valid.
>>
>> What do you think about it?
>>
>
> please map another "fake" page when you free memmap.
>
> For example, prepare a page filled with (1 << PG_reserved).
> and replace it with unnecessary memmap rather than freeing a page for memmap.

Hmm. I don't got your point.
The problem is that we access struct page by pfn number not address.

You mean let's remain memmap on hole with changing PageReseved instead of free?

>
> Then, you can find a PageReserved() page there. However it doesn't have valid
> nid,zid,secitonID, I don't think it can cause trouble.
> But you may see some BUG_ON() hit. If so, I think ARM guys can add
>
> #define PageHole(page) (PageReserved(page) && PageXXXXX(page))
> for avoiding BUG_ON() hit. (and this will help to skip unnecessary
> memmap_init_zone())

I think it's not a good idea to add new flag. If
Kame. Could you review my RFC patch which makes pfn_valid check more
tightly on sparsemem?

>
> By this, you can finaly remove CONFIG_MEMMAP_HOLES_IN_ZONE, and makes ARM's
> sparsemem much easier for maintainance.

I think above problem isn't only ARM one. It can happen on any
sparsemem which do loose pfn_valid check.

>
> section ID is not important as far as you call page_to_pfn().
> You may able to add WARNING when page_to_pfn() is called against a HOLE page.

Yes. It's a good idea. We can add it optionally. But for it, we need
PageHole or mem_section bank range check. I like second like my
previous posting.


-- 
Kind regards,
Minchan Kim



More information about the linux-arm-kernel mailing list