[PATCH] percpu: fix chunk range calculation

Dave Young dyoung at redhat.com
Mon Nov 21 22:00:28 EST 2011

On 11/22/2011 01:01 AM, Tejun Heo wrote:

> Hello, Dave.
> On Mon, Nov 21, 2011 at 09:45:51AM +0800, Dave Young wrote:
>> On 11/19/2011 02:55 AM, Tejun Heo wrote:
>>> Percpu allocator recorded the cpus which map to the first and last
>>> units in pcpu_first/last_unit_cpu respectively and used them to
>>> determine the address range of a chunk - e.g. it assumed that the
>>> first unit has the lowest address in a chunk while the last unit has
>>> the highest address.
>>> This simply isn't true.  Groups in a chunk can have arbitrary positive
>>> or negative offsets from the previous one and there is no guarantee
>>> that the first unit occupies the lowest offset while the last one the
>>> highest.
>>> Fix it by actually comparing unit offsets to determine cpus occupying
>>> the lowest and highest offsets.  Also, rename pcu_first/last_unit_cpu
>>> to pcpu_low/high_unit_cpu to avoid confusion.
>>> The chunk address range is used to flush cache on vmalloc area
>>> map/unmap and decide whether a given address is in the first chunk by
>>> per_cpu_ptr_to_phys() and the bug was discovered by invalid
>>> per_cpu_ptr_to_phys() translation for crash_note.
>>> Kudos to Dave Young for tracking down the problem.
>> Tejun, thanks
>> Now that if addr is not in first trunk it must be in vmalloc area, below
>> logic should be right:
>> 	if (in_first_chunk) {
>> 		if (!is_vmalloc_addr(addr))
>> 			return __pa(addr);
>> 		else
>> 			return page_to_phys(vmalloc_to_page(addr));
>> 	} else
>> 		if (!is_vmalloc_addr(addr)) /* not possible */
>> 			return __pa(addr);
>> 		else
>> 			return page_to_phys(pcpu_addr_to_page(addr));
>> So how about just simply remove in first chunk checking to simplify the
>> code as followging:
>> phys_addr_t per_cpu_ptr_to_phys(void *addr)
>> {
>> 	if (!is_vmalloc_addr(addr))
>> 		return __pa(addr);
>> 	else
>> 		return page_to_phys(pcpu_addr_to_page(addr));
>> }
> Yes, that's much simpler.  Hmmm... I'm still slightly reluctant
> because the current code reflects better how percpu allocator actually
> works.  It has special setup for the first chunk, which currently
> supports either embedding in linear address space or vmalloc mapping,
> and, from the second one, the backing allocator (currently either vm
> or km) provides translation.
> per_cpu_ptr_to_phys() has been pretty good at exposing wrong
> assumptions in address translation mostly because this is one of few
> points where the assumptions are visible in a verifiable way (I think
> this is the second bug it has discovered).
> This forces the verification that "if it isn't in one of the explicit
> first chunk areas, it MUST follow backing allocator mapping" which can
> discover both bugs in percpu allocator itself and
> per_cpu_ptr_to_phys() callers.  e.g. with the proposed change, if the
> caller passes in usual kernel address on percpu-vm which is not in the
> first chunk, it will return __pa for the address even though that
> address can't possibly be a percpu address, but the current code will
> trip VIRTUAL_BUG_ON() in vmalloc_to_page().
> Maybe we can add comment there how it can be further simplified and
> why things look more complicated than necessary?

I wonder if it's good to add something like CONFIG_PER_CPU_DEBUG?
Anyway I have no strong opinion with this. I'm fine to either of adding
comment and put it into debug around.

Thank you for your fix.


More information about the kexec mailing list