[PATCH 2/2] riscv: Eagerly flush in flush_icache_deferred()

Andrea Parri parri.andrea at gmail.com
Wed Aug 14 17:34:19 PDT 2024


> <---- Thread 1 starts running here on CPU1
> 
> <---- Thread 2 starts running here with same mm
> 
> T2: prctl(PR_RISCV_SET_ICACHE_FLUSH_CTX, PR_RISCV_CTX_SW_FENCEI_ON, PR_RISCV_SCOPE_PER_PROCESS);
> T2: <-- kernel sets current->mm->context.force_icache_flush to true

Mmh, TBH, I'm not sure how this patch is supposed to fix the race in
question:

For one, AFAIU, the operation

T2: cpumask_setall(&current->mm->context.icache_stale_mask)

(on CPU2?) you've added with this patch...


> T2: <modification of instructions>
> T2: fence.i
> 
> T1: fence.i (to synchronize with other thread, has some logic to
>              determine when to do this)
> T1: <-- thread 1 is preempted
> T1: <-- thread 1 is placed onto CPU3 and starts context switch sequence
> T1 (kernel): flush_icache_deferred() -> skips flush because switch_to_should_flush_icache() returns true

... does _not_ ensure that T1: flush_icache_deferred() on CPU3 will
observe/read from that operation: IOW,

T1: cpumask_test_and_clear_cpu(cpu, &mm->context.icache_stale_mask)

may still evaluate to FALSE, thus preventing the FENCE.I execution.
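
Reduced to the two accesses at stake (field and function names as in
the patch/thread, everything else paraphrased and abbreviated), that
is:

  /* CPU2 (T2), prctl(PR_RISCV_SET_ICACHE_FLUSH_CTX) path, added by this patch */
  cpumask_setall(&current->mm->context.icache_stale_mask);   /* plain bitmap stores */

  /* CPU3 (T1), flush_icache_deferred() during the context switch */
  if (cpumask_test_and_clear_cpu(cpu, &mm->context.icache_stale_mask))
          local_flush_icache_all();                           /* FENCE.I */

with no synchronization linking the plain stores on CPU2 to the RMW on
CPU3.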

Moreover, AFAIU, ...


> 				     -> thread has migrated and task->mm->context.force_icache_flush is true
> 
> T2: prctl(PR_RISCV_SET_ICACHE_FLUSH_CTX, PR_RISCV_CTX_SW_FENCEI_OFF, PR_RISCV_SCOPE_PER_PROCESS);

... moving the operation(s)

T2: set_icache_stale_mask()
	T2: cpumask_setall(&current->mm->context.icache_stale_mask)

before the following operation...  (per patch #1)


> T2 (kernel): kernel sets current->mm->context.force_icache_flush = false
> 
> T1 (kernel): switch_to() calls switch_to_should_flush_icache() which now
> 	     returns false because task->mm->context.force_icache_flush
> 	     is false due to the other thread emitting
> 	     PR_RISCV_CTX_SW_FENCEI_OFF.

... does not ensure that T1: switch_to_should_flush_icache() on CPU3
will observe

T1: cpumask_test_cpu(<cpu3>, &task->mm->context.icache_stale_mask) == true

In fact, T1 may evaluate the latter expression to FALSE while still
observing the "later" operation, i.e.

T1: task->mm->context.force_icache_flush == false
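
For the reasoning in the changelog to hold, the two observations above
would need to be linked by some pairing guaranteeing that an agent
observing force_icache_flush == false also observes the earlier
cpumask_setall().  Purely to illustrate the missing ordering (this is
not a suggested fix, and I'm assuming force_icache_flush is an
ordinary bool), a release/acquire pairing would look like:

  /* T2, PR_RISCV_CTX_SW_FENCEI_OFF path */
  cpumask_setall(&mm->context.icache_stale_mask);
  smp_store_release(&mm->context.force_icache_flush, false);  /* setall ordered before */

  /* T1, switch_to_should_flush_icache(), sketch only (not the actual body) */
  if (smp_load_acquire(&task->mm->context.force_icache_flush))
          return true;
  /* reading "false" above now implies observing the setall: */
  return cpumask_test_cpu(cpu, &task->mm->context.icache_stale_mask);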


Perhaps a simplified but useful way to look at such scenarios is as
follows:

  - CPUs are just like nodes of a distributed system, and

  - stores are like messages to be exchanged (by the memory subsystem)
    between CPUs: without some (explicit) synchronization/constraints,
    messages originating from a given CPU can propagate/become visible
    to other CPUs at any time and in any order (cf. the sketch below).
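
In kernel-memory-model terms, both scenarios above have the classic
"message passing" shape; a minimal sketch (x stands for the "payload"
store, y for the "message"; nothing here is from the patch):

  int x, y;

  void cpu2(void)                 /* sender */
  {
          WRITE_ONCE(x, 1);
          WRITE_ONCE(y, 1);
  }

  void cpu3(void)                 /* receiver */
  {
          int r0 = READ_ONCE(y);
          int r1 = READ_ONCE(x);

          /*
           * r0 == 1 && r1 == 0 is allowed: observing the later "message"
           * does not imply observing the earlier one.  Forbidding that
           * outcome requires pairing, e.g. smp_store_release() with
           * smp_load_acquire(), or smp_wmb() with smp_rmb().
           */
  }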


IAC, can you elaborate on the solution proposed here (maybe by adding
some inline comments), keeping the above considerations in mind? What
am I missing?


> T1 (back in userspace): Instruction cache was never flushed on context
> 			switch to CPU3, and thus may execute incorrect
> 			instructions.

Mmh, flushing the I$ (or, as meant here, executing a FENCE.I) seems
to be only half of the solution: IIUC, we'd like to ensure that the
store operation

T2: <modification of instructions>

originating from CPU2 is _visible_ to CPU3 by the time that FENCE.I
instruction is executed, cf.

[from Zifencei - emphasis mine]

A FENCE.I instruction ensures that a subsequent instruction fetch on
a RISC-V hart will see any previous data stores _already visible_ to
the same RISC-V hart.


IOW (but assuming code is in coherent main memory), imagine that the
(putative) FENCE.I on CPU3 is replaced by some

T1: LOAD reg,0(insts_addr)

The question is: would such a load be guaranteed to observe the store

T2: <modification of instructions>  #  STORE new_insts,0(insts_addr)

originating from CPU2? Can you elaborate?
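
That is, some reads-from/ordering link has to carry the stores from
CPU2 over to CPU3 before the FENCE.I (or the load above) executes.
Purely as an illustration, with hypothetical names (insns_addr,
insns_published and new_insns_store() are not in the patch):

  static unsigned int *insns_addr;      /* location of the code, hypothetical  */
  static int insns_published;           /* the "message" flag, hypothetical    */

  /* CPU2 (T2) */
  new_insns_store(insns_addr);                  /* <modification of instructions>      */
  smp_store_release(&insns_published, 1);       /* insn stores ordered before the flag */

  /* CPU3 (T1) */
  if (smp_load_acquire(&insns_published))       /* reading 1 => the new insns are
                                                   "already visible" to this hart      */
          local_flush_icache_all();             /* so the FENCE.I covers them (Zifencei) */

Absent such a link, neither the FENCE.I nor the plain load is
guaranteed to observe the new instructions.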

  Andrea


