[PATCH 1/1] lib: sbi: relax scoldboot_lock spinlock
Heinrich Schuchardt
xypron.glpk at gmx.de
Thu Aug 13 16:04:34 EDT 2020
On 13.08.20 21:16, Jessica Clarke wrote:
> On 13 Aug 2020, at 19:58, Heinrich Schuchardt <xypron.glpk at gmx.de> wrote:
>> On 13.08.20 19:45, Atish Patra wrote:
>>> On Thu, Aug 13, 2020 at 10:03 AM Jessica Clarke <jrtc27 at jrtc27.com> wrote:
>>>>
>>>> On 13 Aug 2020, at 17:51, Heinrich Schuchardt <xypron.glpk at gmx.de> wrote:
>>>>> On 13.08.20 18:36, Jessica Clarke wrote:
>>>>>> On 13 Aug 2020, at 17:24, Heinrich Schuchardt <xypron.glpk at gmx.de> wrote:
>>>>>>>
>>>>>>> OpenSBI hangs in wake_coldboot_harts() indefinitely on the MaixDuino board
>>>>>>> when trying to boot U-Boot as FW_PAYLOAD.
>>>>>>
>>>>>> This smells of there being a more fundamental issue that's not being
>>>>>> addressed which this manages to work around, either reliably or by
>>>>>> chance. What is the actual deadlock condition you observe?
>>>>>
>>>>> There are only two function using coldboot_lock spinlock:
>>>>>
>>>>> * wake_coldboot_harts()
>>>>> * wait_for_coldboot()
>>>>>
>>>>> wake_coldboot_harts() was entered but never left without patching one of
>>>>> the functions.
>>>>>
>>>>> The Kendryte K210 has two harts. One goes the cold boot path while the
>>>>> other is sent to the warm boot path.
>>>>
>>>> The K210's platform definition uses clint_ipi_send, which as far as I
>>>> can tell is entirely non-blocking. The only thing wake_coldboot_harts
>>>> does with the lock held is set coldboot_done to 1 and send an IPI to
>>>> all the other harts. Thus wake_coldboot_harts should never be able to
>>>> block holding the spin lock, meaning wait_for_coldboot should
>>>> eventually wake up and be able to acquire the lock (on platforms that
>>>> implement WFI fully it shouldn't have to wait long for the lock, though
>>>> on platforms that make it a NOP there might be a bit of .
>>>>
>>>
>>> Yes. If wake_coldboot_harts is never able to acquire the spinlock,
>>> that indicates spinlock
>>> is always acquired by wait_for_coldboot. That can happen if
>>>
>>> 1. WFI is implemented as NOP as suggested by Jessica and MSIP in MIP
>>> is set which
>>> shouldn't happen unless there is an IPI sent.
>>>
>>> Can you try clearing the MIP just before acquiring the lock ? In qemu
>>> & unleashed, modifying the MSIP bit in
>>> MIP doesn't actually do anything. So we do sbi_platform_ipi_clear to reset it.
>>
>> The following does *not* solve the problem:
>
> What if you do it before enabling MSIP in MIE? If that doesn't work it
> sounds like something is very broken with that hardware.
This *fails* too:
diff --git a/lib/sbi/sbi_init.c b/lib/sbi/sbi_init.c
index a7fb848..a29de51 100644
--- a/lib/sbi/sbi_init.c
+++ b/lib/sbi/sbi_init.c
@@ -96,6 +96,8 @@ static void wait_for_coldboot(struct sbi_scratch
*scratch, u32 hartid)
/* Save MIE CSR */
saved_mie = csr_read(CSR_MIE);
+ sbi_platform_ipi_clear(plat, hartid);
+
/* Set MSIE bit to receive IPI */
csr_set(CSR_MIE, MIP_MSIP);
Sean mentioned that the Kendryte K210 implements RISC-V Privileged
Architectures Specification 1.9.1. But I don't see anything in the
change log of the spec that would be relevant here.
To my knowledge the Kendryte K210 systems are the only really affordable
64bit RISC-V boards up to now. So it would be great to get them properly
running in OpenSBI even if they deviate from the most current RISC-V specs.
Best regards
Heinrich
>
>> diff --git a/lib/sbi/sbi_init.c b/lib/sbi/sbi_init.c
>> index a7fb848..a3111d3 100644
>> --- a/lib/sbi/sbi_init.c
>> +++ b/lib/sbi/sbi_init.c
>> @@ -105,6 +105,8 @@ static void wait_for_coldboot(struct sbi_scratch
>> *scratch, u32 hartid)
>> /* Mark current HART as waiting */
>> sbi_hartmask_set_hart(hartid, &coldboot_wait_hmask);
>>
>> + sbi_platform_ipi_clear(plat, hartid);
>> +
>> /* Wait for coldboot to finish using WFI */
>> while (!coldboot_done) {
>> spin_unlock(&coldboot_lock);
>>
>> Anyway according to the RISC-V spec wfi may be implemented as a simple nop:
>>
>> "The purpose of the WFI instruction is to provide a hint to the
>> implementation, and so a legal implementation is to simply implement WFI
>> as a NOP."
>
> Yes, we know, that's not the problem here. The problem is that your
> hardware _actually thinks there's a pending software interrupt_, since
> the MSIP bit is set in MIP. An implementation that makes WFI a NOP is
> perfectly fine because, as I've said at least once already, the _inner_
> loop still checks MIP.MSIP before attempting to acquire the lock again.
>
> Jess
>
>>>> Please further investigate (or explain if you already know) exactly how
>>>> you end up in a deadlock situation between wake_coldboot_harts and
>>>> wait_for_coldboot, as I cannot currently see how it could ever be
>>>> possible.
>>>>
>>>>>>> Even if reading and writing coldboot_done is not atomic it is safe to check
>>>>>>> it without using a spinlock.
>>>>>>
>>>>>> Maybe microarchitecturally on the specific hardware in question (though
>>>>>> I wouldn't even be so certain about that), but definitely not in C. You
>>>>>> now have a data race in reading coldboot_done, which isn't even
>>>>>> volatile^, so a sufficiently smart compiler would be completely within
>>>>>> its right to do whatever it wants to the code, including things like
>>>>>> delete the loop entirely (or, slightly less dramatic, turn it into an
>>>>>> if). This patch introduces undefined behaviour.
>>>>>>
>>>>>> Jess
>>>>>>
>>>>>> ^ Please do not make it volatile as a workaround, that is not the right
>>>>>> way to "fix" such issues.
>>>>>
>>>>> What do you suggest Jessica?
>>>>
>>>> My primary suggestion is above, since this does not look like a
>>>> necessary fix.
>>>>
>>>> There is the separate question of whether this code should be using
>>>> spin locks anyway. One could optimise it with suitable use of proper
>>>> atomics to access coldboot_done, which would mask whatever issue you
>>>> are seeing, but that should not be necessary, and in early boot code
>>>> like this that's not on a hot code path it's better to keep things
>>>> simple like it is currently rather than risk writing unsafe code
>>>> through incorrect use of atomics. I would feel much more comfortable
>>>> leaving the code using spin locks.
>>>>
>>>> Jess
>>>>
>>>>>>> Signed-off-by: Heinrich Schuchardt <xypron.glpk at gmx.de>
>>>>>>> ---
>>>>>>> lib/sbi/sbi_init.c | 4 ++--
>>>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/lib/sbi/sbi_init.c b/lib/sbi/sbi_init.c
>>>>>>> index a7fb848..a0e4f11 100644
>>>>>>> --- a/lib/sbi/sbi_init.c
>>>>>>> +++ b/lib/sbi/sbi_init.c
>>>>>>> @@ -106,14 +106,14 @@ static void wait_for_coldboot(struct sbi_scratch *scratch, u32 hartid)
>>>>>>> sbi_hartmask_set_hart(hartid, &coldboot_wait_hmask);
>>>>>>>
>>>>>>> /* Wait for coldboot to finish using WFI */
>>>>>>> + spin_unlock(&coldboot_lock);
>>>>>>> while (!coldboot_done) {
>>>>>>> - spin_unlock(&coldboot_lock);
>>>>>>> do {
>>>>>>> wfi();
>>>>>>> cmip = csr_read(CSR_MIP);
>>>>>>> } while (!(cmip & MIP_MSIP));
>>>>>>> - spin_lock(&coldboot_lock);
>>>>>>> };
>>>>>>> + spin_lock(&coldboot_lock);
>>>>>>>
>>>>>>> /* Unmark current HART as waiting */
>>>>>>> sbi_hartmask_clear_hart(hartid, &coldboot_wait_hmask);
>>>>>>> --
>>>>>>> 2.28.0
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> opensbi mailing list
>>>>>>> opensbi at lists.infradead.org
>>>>>>> http://lists.infradead.org/mailman/listinfo/opensbi
>>>>
>>>>
>>>> --
>>>> opensbi mailing list
>>>> opensbi at lists.infradead.org
>>>> http://lists.infradead.org/mailman/listinfo/opensbi
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Atish
>
>
More information about the opensbi
mailing list