Bug in v7_coherent_kern_range()?

Sun Apr 1 04:57:34 EDT 2012

On 01.04.2012 09:09, Huang Shijie wrote:
> Hi Dirk:
>> Hi Huang Shijie,
>>
>> On 01.04.2012 05:21, Huang Shijie wrote:
>>> [1] Platform:
>>> Freescale i.MX6Q (4 cores), ARM Cortex-A9
>>>
>>> [2] kernel:
>>> 3.0.15 (I have cherry-picked many patches, and
>>> arch/arm/mm/cache-v7.S
>>> is the same code as in the latest kernel, v3.4-rc1),
>>> SMP enabled, VIPT cache
>>
>> Could you try an unpatched, clean v3.4-rc1 instead?
> Sorry, I could not try v3.4-rc1. Some of our BSP drivers do not
> support DT.

Have you tried the 3.2 based Linaro kernel? It's DT based.

Best regards

Dirk

>> What about your 2.6.38?
> 2.6.38 is not a good version for running the imx6q. It lacks many of our
> driver patches.
>>
>> What about 3.0.26? 3.0.15 seems to be missing some possibly relevant patches.
>>
> Our BSP release is based on 3.0.15, so we could not test on 3.0.26
> either.
>
>>> [3] application:
>>
>> Could you share a (simple) test case?
> The test case is like this:
> #gplay xx.avi
>
> gplay is our own player, similar to mplayer.
> I just created a script that plays the video files one by one.
>
> BR
> Huang Shijie
>
>>
>> Best regards
>>
>> Dirk
>>
>>> I use our application, which clones many threads.
>>> Two threads (call them A and B) may do the same thing at the same time,
>>> as in the following code:
>>>
>>> Most of the time, it's OK.
>>> But in some unknown situation, cacheflush() fails and one thread
>>> (say A) may hang in the following code:
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>
>>>
>>> open("/usr/lib/lib_mp3_dec_arm12_elinux.so.2.10.0", O_RDONLY) = 8
>>> read(8, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\20\35\0\0004\0\0\0"..., 512) = 512
>>> fstat64(8, {st_mode=S_IFREG|0644, st_size=56232, ...}) = 0
>>> mmap2(NULL, 88032, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 8, 0) = 0x2ff0a000
>>> mprotect(0x2ff18000, 28672, PROT_NONE) = 0
>>> mmap2(0x2ff1f000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 8, 0xd) = 0x2ff1f000
>>> close(8) = 0
>>> mprotect(0x2ff0a000, 57344, PROT_READ|PROT_WRITE) = 0
>>> mprotect(0x2ff0a000, 57344, PROT_READ|PROT_EXEC) = 0
>>> cacheflush(0x2ff0a000, 0x2ff18000, 0, 0x6, 0x2cd03420) = 0 // System hangs here!!!
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>
>>>
>>>
>>> [4] kernel log
>>> I used "echo t > /proc/sysrq-trigger" to dump the task information:
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>
>>>
>>> multiqueue0:src D 804cd678 0 7328 5963 0x00000001
>>> [<804cd678>] (__schedule+0x228/0x760) from [<804d0564>] (__down_read+0xa8/0xe0)
>>> [<804d0564>] (__down_read+0xa8/0xe0) from [<800478c4>] (do_page_fault+0xbc/0x480)
>>> [<800478c4>] (do_page_fault+0xbc/0x480) from [<8003841c>] (do_DataAbort+0x34/0x98)
>>> [<8003841c>] (do_DataAbort+0x34/0x98) from [<8003df10>] (__dabt_svc+0x70/0xa0)
>>> Exception stack(0xbae37ea8 to 0xbae37ef0)
>>> 7ea0:                   31e05000 31e1d000 00000020 0000001f 31e05000 31e1d000
>>> 7ec0: bfac86b8 31e05000 31e1d000 bae36000 08100075 31e056fc 31e08000 bae37ef0
>>> 7ee0: 800424a8 8004a1fc 800f0013 ffffffff
>>> [<8003df10>] (__dabt_svc+0x70/0xa0) from [<8004a1fc>] (v7_coherent_kern_range+0x20/0x80)
>>> [<8004a1fc>] (v7_coherent_kern_range+0x20/0x80) from [<800424a8>] (arm_syscall+0x2a0/0x2c4)
>>> [<800424a8>] (arm_syscall+0x2a0/0x2c4) from [<8003e500>] (ret_fast_syscall+0x0/0x3c)
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
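
The exception stack in the trace can be read as a struct pt_regs. A minimal Python sketch of that decoding follows. It assumes the usual 32-bit ARM layout (r0 to r15, then cpsr and orig_r0) and 4 KiB pages; neither is stated in the report. Under that reading, pc and lr line up with the v7_coherent_kern_range+0x20 and arm_syscall+0x2a0 frames, r0/r1 look like the start and end of the range being flushed, and ip (r12) may be the address being cleaned when the abort was taken, though that last point is only a guess from the values.

```python
# Register order of struct pt_regs on 32-bit ARM (assumed layout):
# r0..r15 (with r11=fp, r12=ip, r13=sp, r14=lr, r15=pc), cpsr, orig_r0.
REG_NAMES = ["r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9",
             "r10", "fp", "ip", "sp", "lr", "pc", "cpsr", "orig_r0"]

# Words copied from the "Exception stack(0xbae37ea8 to 0xbae37ef0)" dump.
DUMP = """
31e05000 31e1d000 00000020 0000001f 31e05000 31e1d000
bfac86b8 31e05000 31e1d000 bae36000 08100075 31e056fc 31e08000 bae37ef0
800424a8 8004a1fc 800f0013 ffffffff
"""

PAGE_SIZE = 0x1000  # assumption: 4 KiB pages


def decode_pt_regs(dump):
    """Map the dumped 32-bit words onto register names, in pt_regs order."""
    words = [int(w, 16) for w in dump.split()]
    if len(words) != len(REG_NAMES):
        raise ValueError("expected %d words, got %d" % (len(REG_NAMES), len(words)))
    return dict(zip(REG_NAMES, words))


regs = decode_pt_regs(DUMP)
for name in REG_NAMES:
    print("%-7s = 0x%08x" % (name, regs[name]))

# If ip really is the cursor of the flush loop, this is how far the
# flush got into the [r0, r1) range before faulting.
print("offset into range: 0x%x" % (regs["ip"] - regs["r0"]))
print("page aligned:", regs["ip"] % PAGE_SIZE == 0)
```

If the guess about ip holds, the fault hit right on a page boundary inside the range, which would fit a page that was not (or no longer) mapped when the flush reached it.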
>>>
>>>
>>>
>>> do_cache_op() already holds mm->mmap_sem, but
>>> v7_coherent_kern_range()
>>> takes a page fault while it is flushing the cache, and do_page_fault()
>>> then tries to take mmap_sem again. Deadlock! So the thread hangs
>>> in do_page_fault().
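
To make the suspected sequence concrete, here is a toy, single-threaded Python model of it. It is only a sketch under assumptions: FairRwSem and replay_hang are made-up names, do_cache_op() is assumed to hold mmap_sem for reading, and a second thread B is assumed to be waiting for it as a writer (for example in mprotect() or mmap2(), as seen in the strace). The one property it models is that a new reader has to queue behind a writer that is already waiting.

```python
from collections import deque


class FairRwSem:
    """Toy model of a fair reader/writer semaphore (no real blocking).

    Assumption: a new reader must queue behind any waiter that is
    already in the queue, including a writer.
    """

    def __init__(self):
        self.readers = 0        # number of active read holders
        self.writer = None      # name of the active writer, if any
        self.waiters = deque()  # FIFO of (name, "read" | "write")

    def down_read(self, name):
        """Return True if granted, False if `name` would have to sleep."""
        if self.writer is None and not self.waiters:
            self.readers += 1
            return True
        self.waiters.append((name, "read"))
        return False

    def down_write(self, name):
        """Return True if granted, False if `name` would have to sleep."""
        if self.writer is None and self.readers == 0 and not self.waiters:
            self.writer = name
            return True
        self.waiters.append((name, "write"))
        return False


def replay_hang():
    """Replay the suspected interleaving of threads A and B."""
    sem = FairRwSem()
    steps = [
        ("A: do_cache_op() down_read", sem.down_read("A")),
        ("B: mprotect() down_write", sem.down_write("B")),
        ("A: do_page_fault() down_read", sem.down_read("A")),
    ]
    return steps, list(sem.waiters)


if __name__ == "__main__":
    steps, waiters = replay_hang()
    for what, granted in steps:
        print("%-30s %s" % (what, "granted" if granted else "sleeps"))
    print("wait queue:", waiters)
```

In this model A ends up sleeping behind B while still holding its first read lock, and B cannot proceed until A releases it, so neither thread makes progress. Without a waiting writer the nested down_read would simply succeed, which would fit the hang showing up only occasionally.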
>>>
>>> [5] questions:
>>> Why can v7_coherent_kern_range() cause a data abort?
>>> Is there something wrong with v7_coherent_kern_range()?
>>>
>>>
>>> thanks
>>> Huang Shijie
>>>
>>> _______________________________________________
>>> linux-arm-kernel mailing list
>>> linux-arm-kernel at lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel