[Question] race condition in mm/page_alloc.c regarding page->lru?

Mon Apr 5 06:49:44 EDT 2010

Hi, Mel and Arve.

On Mon, Apr 5, 2010 at 7:14 PM, Mel Gorman <mel at csn.ul.ie> wrote:
> On Fri, Apr 02, 2010 at 05:59:00PM -0700, Arve Hjønnevåg wrote:
>> On Fri, Apr 2, 2010 at 2:48 AM, Mel Gorman <mel at csn.ul.ie> wrote:
>> > On Fri, Apr 02, 2010 at 02:03:23PM +0900, KOSAKI Motohiro wrote:
>> >> Cc to Mel,
>> >>
>> >> > 2 patches related to page_alloc.c were applied.
>> >> > Does anyone see a connection between the 2 patches and the panic?
>> >> > NOTE: the full patches are attached.
>> >>
>> >> I think your two attached patches are completely unrelated to your problem.
>> >>
>> >
>> > Agreed. It's unlikely that there is a race as such in the page
>> > allocator. In buffered_rmqueue that you initially talk about, the lists
>> > being manipulated are per-cpu lists. About the only way to corrupt them
>> > is if you had an NMI handler that called the page allocator. I really hope
>> > your platform is not doing anything like that.
>> >
>> > A double free of page->lru is a possibility. You could try reproducing
>> > the problem with CONFIG_DEBUG_LIST enabled to see if anything falls out.
>> >
>> >> "mm: Add min_free_order_shift tunable." seems to make zero sense. I don't think this patch
>> >> needs to be merged.
>> >>
>> >
>> > It makes a marginal amount of sense. Basically what it does is allowing
>> > high-order allocations to go much further below their watermarks than is
>> > currently allowed. If the platform in question is doing a lot of high-order
>> > allocations, this patch could be seen to "fix" the problem but you wouldn't
>> > touch mainline with it with a barge pole. It would be more stable to fix
>> > the drivers to not use high order allocations or use a mempool.
>> >
>>
>> The high order allocation that caused problems was the first level
>> page table for each process.
>
> Out of curiosity, how big is that allocation? Is it specific to
> android? If it is, I guess it can be let slide but if it's common, it

It is specific to ARM. See get_pgd_slow in arch/arm/mm/pgd.c.
It allocates an order-2 block (four contiguous pages) for the pgd.
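
To show where the order 2 comes from, here is a small user-space sketch (not kernel code). It assumes 4 KiB pages and the classic ARM first-level table of 4096 entries of 4 bytes each. The helper names order_for_size and arm_pgd_order are made up for illustration; the first only mimics the idea of the kernel's get_order().

```c
#include <stddef.h>

/* Assumption: 4 KiB pages, as on typical ARM configurations. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((size_t)1 << SKETCH_PAGE_SHIFT)

/*
 * Hypothetical helper: smallest order such that
 * (SKETCH_PAGE_SIZE << order) >= size.
 */
static unsigned int order_for_size(size_t size)
{
	unsigned int order = 0;

	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/*
 * Hypothetical helper: order needed for an ARM first-level page table,
 * assuming 4096 entries of 4 bytes each (16 KiB in total).
 */
static unsigned int arm_pgd_order(void)
{
	size_t pgd_bytes = (size_t)4096 * 4;

	return order_for_size(pgd_bytes);
}
```

Under these assumptions arm_pgd_order() gives 2, so every new process needs four physically contiguous free pages for its pgd.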

> would be worth thinking of an arch-hook that tells the VM that a
> particular high-order is very common. For example, one possibility would
> be to ask kswapd to always reclaim at a given order even if the
> watermarks required are for a lower order.

Just out of curiosity, too.

Normally, embedded systems don't have fork-bomb-like workloads.
But I think Android's case is somewhat different.
That's because Dalvik (JVM) keeps as much memory as it can for itself,
as anon pages holding bytecode.
So the system never has enough free memory.
In addition, most embedded systems don't have swap, which makes things
worse.
So the current reclaimer can't work well.

I am not sure about my assumption.
Arve, is my guess right?
If so, shouldn't Dalvik solve this problem?
For example, AFAIK, the Android kernel has a low memory killer.
If the kernel signals memory pressure, Dalvik would have to discard some
of the anon pages that hold executable bytecode.

This is just my guess about Android. If I have misunderstood Android,
please correct me. :)

>
>> Each time a new process started the
>> kernel would empty the entire page cache to create contiguous free
>> memory.
>
> I ask because I'm surprised the entire page cache got chucked out

Maybe it was because the system has lots of anon pages but no swap.

-- 
Kind regards,
Minchan Kim