[PATCH 1/1] lib: sbi: relax scoldboot_lock spinlock

Heinrich Schuchardt xypron.glpk at gmx.de
Thu Aug 13 17:28:46 EDT 2020


On 8/13/20 11:14 PM, Atish Patra wrote:
> On Thu, Aug 13, 2020 at 1:04 PM Heinrich Schuchardt <xypron.glpk at gmx.de> wrote:
>>
>> On 13.08.20 21:16, Jessica Clarke wrote:
>>> On 13 Aug 2020, at 19:58, Heinrich Schuchardt <xypron.glpk at gmx.de> wrote:
>>>> On 13.08.20 19:45, Atish Patra wrote:
>>>>> On Thu, Aug 13, 2020 at 10:03 AM Jessica Clarke <jrtc27 at jrtc27.com> wrote:
>>>>>>
>>>>>> On 13 Aug 2020, at 17:51, Heinrich Schuchardt <xypron.glpk at gmx.de> wrote:
>>>>>>> On 13.08.20 18:36, Jessica Clarke wrote:
>>>>>>>> On 13 Aug 2020, at 17:24, Heinrich Schuchardt <xypron.glpk at gmx.de> wrote:
>>>>>>>>>
>>>>>>>>> OpenSBI hangs in wake_coldboot_harts() indefinitely on the MaixDuino board
>>>>>>>>> when trying to boot U-Boot as FW_PAYLOAD.
>>>>>>>>
>>>>>>>> This smells of there being a more fundamental issue that's not being
>>>>>>>> addressed which this manages to work around, either reliably or by
>>>>>>>> chance. What is the actual deadlock condition you observe?
>>>>>>>
>>>>>>> There are only two function using coldboot_lock spinlock:
>>>>>>>
>>>>>>> * wake_coldboot_harts()
>>>>>>> * wait_for_coldboot()
>>>>>>>
>>>>>>> wake_coldboot_harts() was entered but never left without patching one of
>>>>>>> the functions.
>>>>>>>
>>>>>>> The Kendryte K210 has two harts. One goes the cold boot path while the
>>>>>>> other is sent to the warm boot path.
>>>>>>
>>>>>> The K210's platform definition uses clint_ipi_send, which as far as I
>>>>>> can tell is entirely non-blocking. The only thing wake_coldboot_harts
>>>>>> does with the lock held is set coldboot_done to 1 and send an IPI to
>>>>>> all the other harts. Thus wake_coldboot_harts should never be able to
>>>>>> block holding the spin lock, meaning wait_for_coldboot should
>>>>>> eventually wake up and be able to acquire the lock (on platforms that
>>>>>> implement WFI fully it shouldn't have to wait long for the lock, though
>>>>>> on platforms that make it a NOP there might be a bit of .
>>>>>>
>>>>>
>>>>> Yes. If wake_coldboot_harts is never able to acquire the spinlock,
>>>>> that indicates spinlock
>>>>> is always acquired by wait_for_coldboot. That can happen if
>>>>>
>>>>> 1. WFI is implemented as NOP as suggested by Jessica and MSIP in MIP
>>>>> is set which
>>>>> shouldn't happen unless there is an IPI sent.
>>>>>
>>>>> Can you try clearing the MIP just before acquiring the lock ? In qemu
>>>>> & unleashed, modifying the MSIP bit in
>>>>> MIP doesn't actually do anything. So we do sbi_platform_ipi_clear  to reset it.
>>>>
>>>> The following does *not* solve the problem:
>>>
>>> What if you do it before enabling MSIP in MIE? If that doesn't work it
>>> sounds like something is very broken with that hardware.
>>
>> This *fails* too:
>>
>
> Probably, the hart executing warmboot invokes sbi_platform_ipi_clear
> even before sbi_ipi_init is being called from the hart running coldboot.

There are two calls to sbi_ipi_init(). Is only the one in coldboot relevant?

Do you mean something like:

diff --git a/lib/sbi/sbi_init.c b/lib/sbi/sbi_init.c
index a7fb848..b062ce2 100644
--- a/lib/sbi/sbi_init.c
+++ b/lib/sbi/sbi_init.c
@@ -33,6 +33,8 @@
 	"        | |\n"                                     \
 	"        |_|\n\n"

+static char first = 0x20;
+
 static void sbi_boot_prints(struct sbi_scratch *scratch, u32 hartid)
 {
 	int xlen;
@@ -80,6 +82,8 @@ static void sbi_boot_prints(struct sbi_scratch
*scratch, u32 hartid)
 		   sbi_ecall_version_major(), sbi_ecall_version_minor());
 	sbi_printf("\n");

+	sbi_printf("First %c\n", first);
+
 	sbi_hart_delegation_dump(scratch);
 	sbi_hart_pmp_dump(scratch);
 }
@@ -93,6 +97,9 @@ static void wait_for_coldboot(struct sbi_scratch
*scratch, u32 hartid)
 	unsigned long saved_mie, cmip;
 	const struct sbi_platform *plat = sbi_platform_ptr(scratch);

+	if (!first)
+		first = 'W';
+
 	/* Save MIE CSR */
 	saved_mie = csr_read(CSR_MIE);

@@ -187,6 +194,8 @@ static void __noreturn init_coldboot(struct
sbi_scratch *scratch, u32 hartid)
 	if (rc)
 		sbi_hart_hang();

+	if (!first)
+		first = 'C';
 	rc = sbi_ipi_init(scratch, TRUE);
 	if (rc)
 		sbi_hart_hang();
@@ -247,6 +256,8 @@ static void __noreturn init_warmboot(struct
sbi_scratch *scratch, u32 hartid)
 	if (rc)
 		sbi_hart_hang();

+	if (!first)
+		first = 'w';
 	rc = sbi_ipi_init(scratch, FALSE);
 	if (rc)
 		sbi_hart_hang();

Best regards

Heinrich

>
> Can we confirm the MSIP being set theory first by adding a long
> outside the inner while loop
> before acquiring the lock ? I don't see any other reason why the inner
> loop will keep acquiring the spinlock
> unless we are hitting some nasty spinlock bug with Kendryte.
>
> You may need to directly write MIP value to the uart address
> (0x38000000) if sbi_console_init is not invoked yet.
>
>> diff --git a/lib/sbi/sbi_init.c b/lib/sbi/sbi_init.c
>> index a7fb848..a29de51 100644
>> --- a/lib/sbi/sbi_init.c
>> +++ b/lib/sbi/sbi_init.c
>> @@ -96,6 +96,8 @@ static void wait_for_coldboot(struct sbi_scratch
>> *scratch, u32 hartid)
>>         /* Save MIE CSR */
>>         saved_mie = csr_read(CSR_MIE);
>>
>> +       sbi_platform_ipi_clear(plat, hartid);
>> +
>>         /* Set MSIE bit to receive IPI */
>>         csr_set(CSR_MIE, MIP_MSIP);
>>
>> Sean mentioned that the Kendryte K210 implements RISC-V Privileged
>> Architectures Specification 1.9.1. But I don't see anything in the
>> change log of the spec that would be relevant here.
>>
>
> Yeah. I checked 1.9.1 as well. I couldn't find anything weird that can
> cause this issue.
>
>> To my knowledge the Kendryte K210 systems are the only really affordable
>> 64bit RISC-V boards up to now. So it would be great to get them properly
>> running in OpenSBI even if they deviate from the most current RISC-V specs.
>>
>> Best regards
>>
>> Heinrich
>>
>>>
>>>> diff --git a/lib/sbi/sbi_init.c b/lib/sbi/sbi_init.c
>>>> index a7fb848..a3111d3 100644
>>>> --- a/lib/sbi/sbi_init.c
>>>> +++ b/lib/sbi/sbi_init.c
>>>> @@ -105,6 +105,8 @@ static void wait_for_coldboot(struct sbi_scratch
>>>> *scratch, u32 hartid)
>>>>        /* Mark current HART as waiting */
>>>>        sbi_hartmask_set_hart(hartid, &coldboot_wait_hmask);
>>>>
>>>> +       sbi_platform_ipi_clear(plat, hartid);
>>>> +
>>>>        /* Wait for coldboot to finish using WFI */
>>>>        while (!coldboot_done) {
>>>>                spin_unlock(&coldboot_lock);
>>>>
>>>> Anyway according to the RISC-V spec wfi may be implemented as a simple nop:
>>>>
>>>> "The purpose of the WFI instruction is to provide a hint to the
>>>> implementation, and so a legal implementation is to simply implement WFI
>>>> as a NOP."
>>>
>>> Yes, we know, that's not the problem here. The problem is that your
>>> hardware _actually thinks there's a pending software interrupt_, since
>>> the MSIP bit is set in MIP. An implementation that makes WFI a NOP is
>>> perfectly fine because, as I've said at least once already, the _inner_
>>> loop still checks MIP.MSIP before attempting to acquire the lock again.
>>>
>>> Jess
>>>
>>>>>> Please further investigate (or explain if you already know) exactly how
>>>>>> you end up in a deadlock situation between wake_coldboot_harts and
>>>>>> wait_for_coldboot, as I cannot currently see how it could ever be
>>>>>> possible.
>>>>>>
>>>>>>>>> Even if reading and writing coldboot_done is not atomic it is safe to check
>>>>>>>>> it without using a spinlock.
>>>>>>>>
>>>>>>>> Maybe microarchitecturally on the specific hardware in question (though
>>>>>>>> I wouldn't even be so certain about that), but definitely not in C. You
>>>>>>>> now have a data race in reading coldboot_done, which isn't even
>>>>>>>> volatile^, so a sufficiently smart compiler would be completely within
>>>>>>>> its right to do whatever it wants to the code, including things like
>>>>>>>> delete the loop entirely (or, slightly less dramatic, turn it into an
>>>>>>>> if). This patch introduces undefined behaviour.
>>>>>>>>
>>>>>>>> Jess
>>>>>>>>
>>>>>>>> ^ Please do not make it volatile as a workaround, that is not the right
>>>>>>>> way to "fix" such issues.
>>>>>>>
>>>>>>> What do you suggest Jessica?
>>>>>>
>>>>>> My primary suggestion is above, since this does not look like a
>>>>>> necessary fix.
>>>>>>
>>>>>> There is the separate question of whether this code should be using
>>>>>> spin locks anyway. One could optimise it with suitable use of proper
>>>>>> atomics to access coldboot_done, which would mask whatever issue you
>>>>>> are seeing, but that should not be necessary, and in early boot code
>>>>>> like this that's not on a hot code path it's better to keep things
>>>>>> simple like it is currently rather than risk writing unsafe code
>>>>>> through incorrect use of atomics. I would feel much more comfortable
>>>>>> leaving the code using spin locks.
>>>>>>
>>>>>> Jess
>>>>>>
>>>>>>>>> Signed-off-by: Heinrich Schuchardt <xypron.glpk at gmx.de>
>>>>>>>>> ---
>>>>>>>>> lib/sbi/sbi_init.c | 4 ++--
>>>>>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/lib/sbi/sbi_init.c b/lib/sbi/sbi_init.c
>>>>>>>>> index a7fb848..a0e4f11 100644
>>>>>>>>> --- a/lib/sbi/sbi_init.c
>>>>>>>>> +++ b/lib/sbi/sbi_init.c
>>>>>>>>> @@ -106,14 +106,14 @@ static void wait_for_coldboot(struct sbi_scratch *scratch, u32 hartid)
>>>>>>>>>    sbi_hartmask_set_hart(hartid, &coldboot_wait_hmask);
>>>>>>>>>
>>>>>>>>>    /* Wait for coldboot to finish using WFI */
>>>>>>>>> +   spin_unlock(&coldboot_lock);
>>>>>>>>>    while (!coldboot_done) {
>>>>>>>>> -           spin_unlock(&coldboot_lock);
>>>>>>>>>            do {
>>>>>>>>>                    wfi();
>>>>>>>>>                    cmip = csr_read(CSR_MIP);
>>>>>>>>>             } while (!(cmip & MIP_MSIP));
>>>>>>>>> -           spin_lock(&coldboot_lock);
>>>>>>>>>    };
>>>>>>>>> +   spin_lock(&coldboot_lock);
>>>>>>>>>
>>>>>>>>>    /* Unmark current HART as waiting */
>>>>>>>>>    sbi_hartmask_clear_hart(hartid, &coldboot_wait_hmask);
>>>>>>>>> --
>>>>>>>>> 2.28.0
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> opensbi mailing list
>>>>>>>>> opensbi at lists.infradead.org
>>>>>>>>> http://lists.infradead.org/mailman/listinfo/opensbi
>>>>>>
>>>>>>
>>>>>> --
>>>>>> opensbi mailing list
>>>>>> opensbi at lists.infradead.org
>>>>>> http://lists.infradead.org/mailman/listinfo/opensbi
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Atish
>>>
>>>
>>
>
>




More information about the opensbi mailing list