Kernel related (?) user space crash at ARM11 MPCore

Catalin Marinas catalin.marinas at
Sun Sep 20 18:46:03 EDT 2009

Russell King - ARM Linux wrote:
> On Sun, Sep 20, 2009 at 10:31:39AM +0100, Russell King - ARM Linux wrote:
>> On Sun, Sep 20, 2009 at 09:39:00AM +0100, Catalin Marinas wrote:
>>> I don't think it's recommended to clean the D-cache (and invalidate the
>>> I-cache) every time in copy_user_highpage, therefore cache maintenance
>>> via mprotect -> change_protection -> flush_cache_range may be a better
>>> option.
>> I really don't believe so - try it yourself - run some benchmarks on your
>> ARMv6 or v7 system, comparing the results both with and without the patch.
>> Especially pay attention to the process creation/shell script performance.
>> I think you'll find that with your patch, it'll be worse than ARM systems
>> running at similar clock rates with VIVT caches.
> The figures reveal a 10% reduction in the performance of execve - that's
> quite a nasty hit, basically meaning shell scripts will run about 10%
> slower (shell scripts typically exec lots of programs.)
> Using my proposal measures more favourably - there is no measurable impact
> on execve itself (maybe a 0.5% reduction, which I consider to be in the
> measurement noise), but a 5.5% reduction in the performance of fork()+exit()
> - this is using __cpuc_coherent_kern_range() in
> v6_copy_user_highpage_nonaliasing() to ensure the new page is fully
> coherent.

Thanks for running these benchmarks. The results for both your patch and
mine are affected by invalidating the whole I-cache in
v6_coherent_user_range() rather than doing it by line (that's historical,
because of some erratum on ARM1136 - maybe we should fix this).

Another thing affecting the performance of my patch as it currently
stands (without changing generic code) is that the D-cache flushing
generates a fault in some situations, which takes time to process. I can
fix this by using the VA-to-PA translation registers in the
coherent_user_range function.

Anyway, I think it depends on the type of applications you are running.
I personally don't see shell performance as too important, so we may
disagree on the best fix here.

For a web server (Apache) where you have plenty of forks, your patch 
might affect the performance quite a lot as you get many 
copy_user_highpage() calls for CoW (BTW, unrelated to this issue,, including the Git server, is hosted on a set of 
Marvell MV78100 boards -

While we can choose benchmarks to show that either option is bad, we 
should probably try to get an optimal solution.
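
For anyone wanting to repeat the fork()+exit() comparison, here is a
minimal user-space sketch of that kind of measurement. It is not the
benchmark used for the figures quoted above (I am assuming nothing about
that tool); time_fork_exit() is a hypothetical helper name, and the
number reported includes the parent's waitpid() as well.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from any existing benchmark suite): fork
 * `iterations` children that exit immediately and reap each one.
 * Returns the mean time in seconds per fork()+exit()+waitpid() cycle,
 * or -1.0 on error or if `iterations` is not positive.
 */
double time_fork_exit(int iterations)
{
    struct timespec start, end;
    int i;

    if (iterations <= 0)
        return -1.0;
    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1.0;

    for (i = 0; i < iterations; i++) {
        pid_t pid = fork();

        if (pid < 0)
            return -1.0;        /* fork failed */
        if (pid == 0)
            _exit(0);           /* child: exit straight away */
        if (waitpid(pid, NULL, 0) < 0)
            return -1.0;        /* parent: reap the child */
    }

    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1.0;

    return ((double)(end.tv_sec - start.tv_sec) +
            (double)(end.tv_nsec - start.tv_nsec) / 1e9) / iterations;
}
```

Running it with a few thousand iterations on kernels built with and
without each patch should give a rough comparison; an execve or
Apache-style workload would need a separate test.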

My view is that something similar to flush_dcache_page +
update_mmu_cache would be better (though maybe not these functions
directly; we could try to reuse PG_arch_1).
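
To illustrate the deferred-flush idea, here is a toy user-space model
(not kernel code and not a proposed patch). All model_* names and
MODEL_PG_ARCH_1 are made up for the illustration; the latter only stands
in for the kernel's PG_arch_1 page flag. The assumption is that the
flush can be postponed until the page is actually mapped into user
space, and skipped if it was never marked.

```c
#include <stdbool.h>

/* Stand-in for the kernel's PG_arch_1 page flag (illustrative only). */
#define MODEL_PG_ARCH_1 (1u << 0)

/* Toy page descriptor; the real struct page looks nothing like this. */
struct model_page {
    unsigned int flags;        /* holds MODEL_PG_ARCH_1 when a flush is pending */
    unsigned int flush_count;  /* number of simulated cache flushes */
    bool user_mapped;          /* true once mapped into user space */
};

/* Simulated cache maintenance: just count how often it happens. */
void model_cache_flush(struct model_page *page)
{
    page->flush_count++;
}

/*
 * Modelled on the flush_dcache_page() idea: after the kernel has written
 * to a page, flush at once only if user space can already see it;
 * otherwise just record that a flush is pending.
 */
void model_flush_dcache_page(struct model_page *page)
{
    if (page->user_mapped)
        model_cache_flush(page);
    else
        page->flags |= MODEL_PG_ARCH_1;
}

/*
 * Modelled on the update_mmu_cache() idea: when the page gets mapped
 * into user space, perform the pending flush (if any) exactly once.
 */
void model_update_mmu_cache(struct model_page *page)
{
    if (page->flags & MODEL_PG_ARCH_1) {
        page->flags &= ~MODEL_PG_ARCH_1;
        model_cache_flush(page);
    }
    page->user_mapped = true;
}
```

In this model, a page that is copied but never mapped costs no flush at
all, and a page written several times before being mapped costs only one.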

> One thing I have noticed: it takes the Realview SMP board _two_ attempts
> to boot a kernel.  The first attempt tends to cause a spontaneous reboot
> when the CLCD controller is enabled, or possibly a hang.  The second
> attempt seems to always run fine.

I noticed this as well, but only on RealView EB, and not on all boards.
The other SMP boards I have are fine. It could be a hardware bug; I
don't see anything obvious in Linux.

