[PATCH 1/1] lib: sbi: relax scoldboot_lock spinlock

Jessica Clarke jrtc27 at jrtc27.com
Thu Aug 13 15:16:13 EDT 2020


On 13 Aug 2020, at 19:58, Heinrich Schuchardt <xypron.glpk at gmx.de> wrote:
> On 13.08.20 19:45, Atish Patra wrote:
>> On Thu, Aug 13, 2020 at 10:03 AM Jessica Clarke <jrtc27 at jrtc27.com> wrote:
>>> 
>>> On 13 Aug 2020, at 17:51, Heinrich Schuchardt <xypron.glpk at gmx.de> wrote:
>>>> On 13.08.20 18:36, Jessica Clarke wrote:
>>>>> On 13 Aug 2020, at 17:24, Heinrich Schuchardt <xypron.glpk at gmx.de> wrote:
>>>>>> 
>>>>>> OpenSBI hangs in wake_coldboot_harts() indefinitely on the MaixDuino board
>>>>>> when trying to boot U-Boot as FW_PAYLOAD.
>>>>> 
>>>>> This smells of there being a more fundamental issue that's not being
>>>>> addressed which this manages to work around, either reliably or by
>>>>> chance. What is the actual deadlock condition you observe?
>>>> 
>>>> There are only two function using coldboot_lock spinlock:
>>>> 
>>>> * wake_coldboot_harts()
>>>> * wait_for_coldboot()
>>>> 
>>>> wake_coldboot_harts() was entered but never left without patching one of
>>>> the functions.
>>>> 
>>>> The Kendryte K210 has two harts. One goes the cold boot path while the
>>>> other is sent to the warm boot path.
>>> 
>>> The K210's platform definition uses clint_ipi_send, which as far as I
>>> can tell is entirely non-blocking. The only thing wake_coldboot_harts
>>> does with the lock held is set coldboot_done to 1 and send an IPI to
>>> all the other harts. Thus wake_coldboot_harts should never be able to
>>> block holding the spin lock, meaning wait_for_coldboot should
>>> eventually wake up and be able to acquire the lock (on platforms that
>>> implement WFI fully it shouldn't have to wait long for the lock, though
>>> on platforms that make it a NOP there might be a bit of .
>>> 
>> 
>> Yes. If wake_coldboot_harts is never able to acquire the spinlock,
>> that indicates spinlock
>> is always acquired by wait_for_coldboot. That can happen if
>> 
>> 1. WFI is implemented as NOP as suggested by Jessica and MSIP in MIP
>> is set which
>> shouldn't happen unless there is an IPI sent.
>> 
>> Can you try clearing the MIP just before acquiring the lock ? In qemu
>> & unleashed, modifying the MSIP bit in
>> MIP doesn't actually do anything. So we do sbi_platform_ipi_clear  to reset it.
> 
> The following does *not* solve the problem:

What if you do it before enabling MSIP in MIE? If that doesn't work it
sounds like something is very broken with that hardware.

> diff --git a/lib/sbi/sbi_init.c b/lib/sbi/sbi_init.c
> index a7fb848..a3111d3 100644
> --- a/lib/sbi/sbi_init.c
> +++ b/lib/sbi/sbi_init.c
> @@ -105,6 +105,8 @@ static void wait_for_coldboot(struct sbi_scratch
> *scratch, u32 hartid)
>        /* Mark current HART as waiting */
>        sbi_hartmask_set_hart(hartid, &coldboot_wait_hmask);
> 
> +       sbi_platform_ipi_clear(plat, hartid);
> +
>        /* Wait for coldboot to finish using WFI */
>        while (!coldboot_done) {
>                spin_unlock(&coldboot_lock);
> 
> Anyway according to the RISC-V spec wfi may be implemented as a simple nop:
> 
> "The purpose of the WFI instruction is to provide a hint to the
> implementation, and so a legal implementation is to simply implement WFI
> as a NOP."

Yes, we know, that's not the problem here. The problem is that your
hardware _actually thinks there's a pending software interrupt_, since
the MSIP bit is set in MIP. An implementation that makes WFI a NOP is
perfectly fine because, as I've said at least once already, the _inner_
loop still checks MIP.MSIP before attempting to acquire the lock again.

Jess

>>> Please further investigate (or explain if you already know) exactly how
>>> you end up in a deadlock situation between wake_coldboot_harts and
>>> wait_for_coldboot, as I cannot currently see how it could ever be
>>> possible.
>>> 
>>>>>> Even if reading and writing coldboot_done is not atomic it is safe to check
>>>>>> it without using a spinlock.
>>>>> 
>>>>> Maybe microarchitecturally on the specific hardware in question (though
>>>>> I wouldn't even be so certain about that), but definitely not in C. You
>>>>> now have a data race in reading coldboot_done, which isn't even
>>>>> volatile^, so a sufficiently smart compiler would be completely within
>>>>> its right to do whatever it wants to the code, including things like
>>>>> delete the loop entirely (or, slightly less dramatic, turn it into an
>>>>> if). This patch introduces undefined behaviour.
>>>>> 
>>>>> Jess
>>>>> 
>>>>> ^ Please do not make it volatile as a workaround, that is not the right
>>>>> way to "fix" such issues.
>>>> 
>>>> What do you suggest Jessica?
>>> 
>>> My primary suggestion is above, since this does not look like a
>>> necessary fix.
>>> 
>>> There is the separate question of whether this code should be using
>>> spin locks anyway. One could optimise it with suitable use of proper
>>> atomics to access coldboot_done, which would mask whatever issue you
>>> are seeing, but that should not be necessary, and in early boot code
>>> like this that's not on a hot code path it's better to keep things
>>> simple like it is currently rather than risk writing unsafe code
>>> through incorrect use of atomics. I would feel much more comfortable
>>> leaving the code using spin locks.
>>> 
>>> Jess
>>> 
>>>>>> Signed-off-by: Heinrich Schuchardt <xypron.glpk at gmx.de>
>>>>>> ---
>>>>>> lib/sbi/sbi_init.c | 4 ++--
>>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>> 
>>>>>> diff --git a/lib/sbi/sbi_init.c b/lib/sbi/sbi_init.c
>>>>>> index a7fb848..a0e4f11 100644
>>>>>> --- a/lib/sbi/sbi_init.c
>>>>>> +++ b/lib/sbi/sbi_init.c
>>>>>> @@ -106,14 +106,14 @@ static void wait_for_coldboot(struct sbi_scratch *scratch, u32 hartid)
>>>>>>    sbi_hartmask_set_hart(hartid, &coldboot_wait_hmask);
>>>>>> 
>>>>>>    /* Wait for coldboot to finish using WFI */
>>>>>> +   spin_unlock(&coldboot_lock);
>>>>>>    while (!coldboot_done) {
>>>>>> -           spin_unlock(&coldboot_lock);
>>>>>>            do {
>>>>>>                    wfi();
>>>>>>                    cmip = csr_read(CSR_MIP);
>>>>>>             } while (!(cmip & MIP_MSIP));
>>>>>> -           spin_lock(&coldboot_lock);
>>>>>>    };
>>>>>> +   spin_lock(&coldboot_lock);
>>>>>> 
>>>>>>    /* Unmark current HART as waiting */
>>>>>>    sbi_hartmask_clear_hart(hartid, &coldboot_wait_hmask);
>>>>>> --
>>>>>> 2.28.0
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> opensbi mailing list
>>>>>> opensbi at lists.infradead.org
>>>>>> http://lists.infradead.org/mailman/listinfo/opensbi
>>> 
>>> 
>>> --
>>> opensbi mailing list
>>> opensbi at lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/opensbi
>> 
>> 
>> 
>> --
>> Regards,
>> Atish




More information about the opensbi mailing list