[PATCH] Add a counter and debug printf to change the latency inside tlb_sync

Fri Jan 10 11:12:03 PST 2025

On 10 Jan 2025, at 19:08, Jessica Clarke <jrtc27 at jrtc27.com> wrote:
> 
> On 10 Jan 2025, at 02:16, Chao-ying Fu <icebergfu at gmail.com> wrote:
>> 
>> On Thu, Jan 9, 2025 at 5:18 PM Bo Gan <ganboing at gmail.com> wrote:
>>> 
>>> Hi Chaoying,
>>> 
>>> I don't think this patch does anything at all. Using printf to delay is naive.
>>> Can you give concrete example of how the livelock happened and when it happens
>>> some precise number on the performance impact?
>> 
>> Yes, the patch is naive, and we can treat it as the debug patch.
>> However, this patch does help us to avoid livelock.
>> From our 24-hart configuration, one hart cannot update the variable
>> tlb_sync, because another hart always wins to read the value. Then,
>> the tlb_sync loop never exits.
> 
> That sounds like your core violates the forward progress guarantee
> required by the RISC-V spec.

Except atomic_add_return uses amoadd.d.aqrl, which doesn’t even need
such a guarantee, because it’s a single instruction, so that’s saying
your hart can wedge on AMOs. Unless you’ve also patched OpenSBI to use
LR/SC there, in which case we’re back to the forward progress guarantee
being violated.

I have serious concerns about this implementation.

Jess