[PATCH] riscv: Only flush the mm icache when setting an exec pte
guibing
guibing at nucleisys.com
Sun Jun 30 18:50:29 PDT 2024
Hi Alex,
Any feedback on this problem is welcome, thanks!
On 2024/6/26 11:58, guibing wrote:
> Hi Alex,
>
> Sorry, yesterday I clicked the mouse by mistake and sent an empty email.
>
>> Is it a multithreaded application? You mean that if the application
>> always runs on core1/2/3, you get an illegal instruction, but that
>> does not happen when run on core0?
> test_printf is not a multithreaded application; it just prints a
> "hello world" string.
>
> #include <stdio.h>
>
> int main()
> {
>     printf("hello world!\n");
>     return 0;
> }
>
> From the test results, the illegal instruction always occurs on
> core1/2/3, never on core0.
>
>> Did you check if the instruction in badaddr is different from the
>> expected instruction? The image you provided is not available here,
>> but it indicated 0xf486 which corresponds to "c.sdsp ra, 104(sp)", is
>> that correct?
> The instruction at this badaddr is the same as the expected
> instruction, but what I hit is different.
>
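The decoding quoted in the question above (0xf486 as "c.sdsp ra, 104(sp)") can be checked by hand. A minimal sketch, assuming the RV64C encoding of C.SDSP (funct3 = 111, op = 10, offset bits 5:3 in instruction bits 12:10, offset bits 8:6 in instruction bits 9:7, rs2 in bits 6:2). The struct and function names are made up for illustration and are not kernel or toolchain APIs:

```c
#include <assert.h>
#include <stdint.h>

/* Decoded fields of an RV64C C.SDSP (store doubleword, sp-relative). */
struct c_sdsp {
	unsigned rs2;    /* source register number, x0..x31 */
	unsigned offset; /* byte offset from sp, a multiple of 8 */
};

/* Nonzero if insn has funct3 = 0b111 and op (quadrant) = 0b10. */
int is_c_sdsp(uint16_t insn)
{
	return (insn >> 13) == 0x7 && (insn & 0x3) == 0x2;
}

/* Pull rs2 and the scaled offset out of a C.SDSP encoding. */
struct c_sdsp decode_c_sdsp(uint16_t insn)
{
	struct c_sdsp d;

	assert(is_c_sdsp(insn));
	d.rs2 = (insn >> 2) & 0x1f;                 /* bits [6:2] */
	d.offset = (((insn >> 10) & 0x7) << 3) |    /* uimm[5:3] */
		   (((insn >> 7) & 0x7) << 6);      /* uimm[8:6] */
	return d;
}
```

Calling decode_c_sdsp(0xf486) gives rs2 = 1 (x1, i.e. ra) and offset = 104, which agrees with the disassembly quoted above. On RV32 the same bit pattern is a different instruction, so this sketch only applies to a 64-bit target.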
> /mnt # ./test_printf
> [ 76.393222] test_printf[130]: unhandled signal 4 code 0x1 at
> 0x0000000000019c82 in test_printf[10000+68000]
> [ 76.400427] CPU: 1 PID: 130 Comm: test_printf Not tainted 6.1.15 #6
> [ 76.406797] Hardware name: asrmicro,xlcpu-evb (DT)
> [ 76.411665] epc : 0000000000019c82 ra : 000000000001ca36 sp :
> 0000003fc5969b00
> [ 76.418941] gp : 000000000007e508 tp : 0000003f8faec780 t0 :
> 000000000000003d
> [ 76.426244] t1 : 0000002abe28cecc t2 : 0000002abe369d63 s0 :
> 0000003fc5969d98
> [ 76.433524] s1 : 0000000000082ab8 a0 : 0000003fc5969b00 a1 :
> 0000000000000000
> [ 76.440835] a2 : 00000000000001a0 a3 : 0000000001010101 a4 :
> 0101010101010101
> [ 76.448108] a5 : 0000003fc5969b00 a6 : 0000000000000040 a7 :
> 00000000000000dd
> [ 76.455432] s2 : 0000000000000001 s3 : 0000003fc5969d38 s4 :
> 0000000000082a70
> [ 76.462695] s5 : 0000000000000000 s6 : 0000000000010758 s7 :
> 0000002abe371648
> [ 76.469995] s8 : 0000000000000000 s9 : 0000000000000000 s10:
> 0000002abe371670
> [ 76.477275] s11: 0000000000000001 t3 : 0000003f8fb954cc t4 :
> 0000000000000000
> [ 76.484576] t5 : 00000000000003ff t6 : 0000000000000040
> [ 76.489948] status: 0000000200004020 badaddr: 00000000ffffffff
> cause: 0000000000000002
> Illegal instruction
>
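The numeric fields in the log above can be cross-checked against their usual meanings. A minimal sketch, assuming that "signal 4 code 0x1" is SIGILL with si_code ILL_ILLOPC, that cause 2 is the RISC-V "illegal instruction" exception code, and that "[10000+68000]" is the base and size of the mapping. The LOG_* macros and helper names are invented here and simply restate values copied from the log:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdint.h>

/* RISC-V exception code for "illegal instruction" (assumed from the privileged spec). */
#define EXC_ILLEGAL_INSN 2

/* Values copied from the log above. */
#define LOG_SIGNAL   4
#define LOG_SI_CODE  0x1
#define LOG_CAUSE    0x2
#define LOG_EPC      UINT64_C(0x19c82)
#define LOG_RA       UINT64_C(0x1ca36)
#define LOG_MAP_BASE UINT64_C(0x10000)
#define LOG_MAP_SIZE UINT64_C(0x68000)

/* Nonzero if addr lies inside [base, base + size). */
int addr_in_mapping(uint64_t addr, uint64_t base, uint64_t size)
{
	return addr >= base && addr - base < size;
}

/* Nonzero if the logged signal, si_code and cause all mean "illegal opcode". */
int log_is_illegal_opcode_fault(void)
{
	return LOG_SIGNAL == SIGILL &&
	       LOG_SI_CODE == ILL_ILLOPC &&
	       LOG_CAUSE == EXC_ILLEGAL_INSN;
}
```

Under those assumptions both epc and ra fall inside the test_printf text mapping, and epc sits at offset 0x9c82 from its base, so the fault is raised on the program's own code rather than on a wild jump.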
>> No no, we try to introduce icache flushes whenever it is needed for
>> such uarch.
>>
> core0 is responsible for reading data from the SD card into its
> dcache and DDR.
>
> Before core1/2/3 start executing the application, they only execute
> a fence.i instruction.
>
> On our RISC-V hardware, fence.i only flushes the dcache and
> invalidates the icache of the local core.
>
> In this case, how can core1/2/3 get the application's instruction
> data from core0's dcache?
>
> When I send a remote fence.i to core0, the illegal instruction can no
> longer be reproduced and the application works well.
>
> @@ -66,8 +66,11 @@ void flush_icache_mm(struct mm_struct *mm, bool local)
>  		 * messages are sent we still need to order this hart's writes
>  		 * with flush_icache_deferred().
>  		 */
> +		sbi_remote_fence_i(cpumask_of(0));
>  		smp_mb();
>  	} else if (IS_ENABLED(CONFIG_RISCV_SBI)) {
>  		sbi_remote_fence_i(&others);
>  	} else {
>
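The behaviour described above can be written down as a toy model. This is only a sketch of the hypothesis in this message, not a description of the real hardware: it assumes one cache line per hart, that an instruction fetch which misses the icache reads DDR without snooping other harts' dcaches, and that fence.i writes back the local dcache and invalidates the local icache only. All names and the OLD_WORD/NEW_WORD values are invented for the example:

```c
#include <assert.h>
#include <stdint.h>

#define NR_HARTS 4
#define OLD_WORD 0x0000u /* hypothetical stale contents of DDR */
#define NEW_WORD 0xf486u /* hypothetical instruction written by the PIO read */

struct hart {
	uint32_t dcache; int d_dirty;
	uint32_t icache; int i_valid;
};

struct machine {
	uint32_t ddr;
	struct hart hart[NR_HARTS];
};

/* A store stays in the storing hart's dcache until it is written back. */
void store(struct machine *m, int h, uint32_t v)
{
	m->hart[h].dcache = v;
	m->hart[h].d_dirty = 1;
}

/* Modelled fence.i: local dcache write-back plus local icache invalidate. */
void fence_i(struct machine *m, int h)
{
	if (m->hart[h].d_dirty) {
		m->ddr = m->hart[h].dcache;
		m->hart[h].d_dirty = 0;
	}
	m->hart[h].i_valid = 0;
}

/* Modelled fetch: an icache miss refills from DDR, never from another dcache. */
uint32_t fetch(struct machine *m, int h)
{
	if (!m->hart[h].i_valid) {
		m->hart[h].icache = m->ddr;
		m->hart[h].i_valid = 1;
	}
	return m->hart[h].icache;
}

/* hart0 loads the program by PIO, then exec_hart runs fence.i and fetches. */
uint32_t run(int exec_hart, int remote_fence_hart0)
{
	struct machine m = { .ddr = OLD_WORD };

	store(&m, 0, NEW_WORD);
	if (remote_fence_hart0)
		fence_i(&m, 0);
	fence_i(&m, exec_hart);
	return fetch(&m, exec_hart);
}
```

In this model run(0, 0) returns NEW_WORD, run(1, 0) returns OLD_WORD, and run(1, 1) returns NEW_WORD, which matches the three observations reported here: success on core0, failure on core1/2/3, and success once core0 also receives a remote fence.i.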
>
> thank you for your reply! :)
>
>
> On 2024/6/25 19:45, Alexandre Ghiti wrote:
>> Hi Guibing,
>>
>> You sent your email in html, so it got rejected by the ML, make sure
>> you reply in plain text mode :)
>>
>> On Tue, Jun 25, 2024 at 10:45 AM 桂兵 <guibing at nucleisys.com> wrote:
>>> Hi Alex,
>>>
>>> We have encountered a problem related to this patch and would like
>>> to ask for your advice, thank you in advance!
>>>
>>> Problem description:
>>> With the v6.9 kernel, executing a statically linked application
>>> from an SD card triggers an illegal instruction; this problem does
>>> not reproduce with the v6.6/v6.1 kernels.
>>> The SD card driver uses PIO mode, and the SD card interrupt is bound
>>> to core0. If the system schedules the application on core1, core2,
>>> or core3, it reports an illegal instruction; if it is scheduled on
>>> core0, it executes successfully.
>> Is it a multithreaded application? You mean that if the application
>> always runs on core1/2/3, you get an illegal instruction, but that
>> does not happen when run on core0?
>>
>>> We traced the source code: the flush_icache_pte change in this
>>> patch causes the issue on our RISC-V hardware.
>>> If this patch is merged into the v6.1 kernel, the same problem can
>>> be reproduced there.
>>> If flush_icache_all() is used instead of flush_icache_mm() in the
>>> v6.9 kernel, the issue cannot be reproduced.
>>>
>>> +void flush_icache_pte(struct mm_struct *mm, pte_t pte)
>>>  {
>>>  	struct folio *folio = page_folio(pte_page(pte));
>>>
>>>  	if (!test_bit(PG_dcache_clean, &folio->flags)) {
>>> -		flush_icache_all();
>>> +		flush_icache_mm(mm, false);
>>>  		set_bit(PG_dcache_clean, &folio->flags);
>>>  	}
>>>  }
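One property of the hunk above is that the flush is gated by a per-folio flag, so it runs at most once per folio. A minimal userspace sketch of that gating, assuming nothing beyond what the hunk shows. The model_* names and the bit index are hypothetical stand-ins, and the flush itself is reduced to a counter:

```c
#include <assert.h>

#define MODEL_PG_DCACHE_CLEAN 0 /* hypothetical bit index, for illustration only */

struct model_folio {
	unsigned long flags;
};

int model_flush_count;

/* Stand-in for the icache flush; it only counts how often it is called. */
void model_flush_icache(void)
{
	model_flush_count++;
}

/* Same shape as the hunk: flush once, then mark the folio clean. */
void model_flush_icache_pte(struct model_folio *folio)
{
	if (!(folio->flags & (1UL << MODEL_PG_DCACHE_CLEAN))) {
		model_flush_icache();
		folio->flags |= 1UL << MODEL_PG_DCACHE_CLEAN;
	}
}

/* Number of flushes issued when one folio is mapped executable n times. */
int model_flushes_after(int n)
{
	struct model_folio folio = { 0 };
	int i;

	model_flush_count = 0;
	for (i = 0; i < n; i++)
		model_flush_icache_pte(&folio);
	return model_flush_count;
}
```

Here model_flushes_after(3) returns 1. If that reading carries over to the kernel, whichever harts the single flush reaches matters a great deal, because later mappings of the same folio do not trigger another one.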
>> Did you check if the instruction in badaddr is different from the
>> expected instruction? The image you provided is not available here,
>> but it indicated 0xf486 which corresponds to "c.sdsp ra, 104(sp)", is
>> that correct?
>>
>>>
>>> Our RISC-V CPU IP supports multi-core L1 dcache synchronization, but
>>> does not support multi-core L1 icache synchronization. Icache
>>> synchronization has to be maintained by software.
>>> Will the RISC-V kernel require hardware icache coherence across
>>> cores in the future?
>> No no, we try to introduce icache flushes whenever it is needed for
>> such uarch.
>>
>>> Thank you for your reply!
>>>
>>>
>>> Link: [PATCH] riscv: Only flush the mm icache when setting an exec
>>> pte - Alexandre Ghiti (kernel.org)
>>>
>>> ________________________________
>>> Sent from my WeCom (Enterprise WeChat)
>>>
>>>
>> Thanks for the report,
>>
>> Alex
>>
>