[PATCH 1/1] lib: sbi: relax scoldboot_lock spinlock

Atish Patra atishp at atishpatra.org
Thu Aug 13 18:25:14 EDT 2020


On Thu, Aug 13, 2020 at 2:34 PM Heinrich Schuchardt <xypron.glpk at gmx.de> wrote:
>
> On 8/13/20 11:14 PM, Atish Patra wrote:
> > On Thu, Aug 13, 2020 at 1:04 PM Heinrich Schuchardt <xypron.glpk at gmx.de> wrote:
> >>
> >> On 13.08.20 21:16, Jessica Clarke wrote:
> >>> On 13 Aug 2020, at 19:58, Heinrich Schuchardt <xypron.glpk at gmx.de> wrote:
> >>>> On 13.08.20 19:45, Atish Patra wrote:
> >>>>> On Thu, Aug 13, 2020 at 10:03 AM Jessica Clarke <jrtc27 at jrtc27.com> wrote:
> >>>>>>
> >>>>>> On 13 Aug 2020, at 17:51, Heinrich Schuchardt <xypron.glpk at gmx.de> wrote:
> >>>>>>> On 13.08.20 18:36, Jessica Clarke wrote:
> >>>>>>>> On 13 Aug 2020, at 17:24, Heinrich Schuchardt <xypron.glpk at gmx.de> wrote:
> >>>>>>>>>
> >>>>>>>>> OpenSBI hangs in wake_coldboot_harts() indefinitely on the MaixDuino board
> >>>>>>>>> when trying to boot U-Boot as FW_PAYLOAD.
> >>>>>>>>
> >>>>>>>> This smells of there being a more fundamental issue that's not being
> >>>>>>>> addressed which this manages to work around, either reliably or by
> >>>>>>>> chance. What is the actual deadlock condition you observe?
> >>>>>>>
> >>>>>>> There are only two function using coldboot_lock spinlock:
> >>>>>>>
> >>>>>>> * wake_coldboot_harts()
> >>>>>>> * wait_for_coldboot()
> >>>>>>>
> >>>>>>> wake_coldboot_harts() was entered but never left without patching one of
> >>>>>>> the functions.
> >>>>>>>
> >>>>>>> The Kendryte K210 has two harts. One goes the cold boot path while the
> >>>>>>> other is sent to the warm boot path.
> >>>>>>
> >>>>>> The K210's platform definition uses clint_ipi_send, which as far as I
> >>>>>> can tell is entirely non-blocking. The only thing wake_coldboot_harts
> >>>>>> does with the lock held is set coldboot_done to 1 and send an IPI to
> >>>>>> all the other harts. Thus wake_coldboot_harts should never be able to
> >>>>>> block holding the spin lock, meaning wait_for_coldboot should
> >>>>>> eventually wake up and be able to acquire the lock (on platforms that
> >>>>>> implement WFI fully it shouldn't have to wait long for the lock, though
> >>>>>> on platforms that make it a NOP there might be a bit of .
> >>>>>>
> >>>>>
> >>>>> Yes. If wake_coldboot_harts is never able to acquire the spinlock,
> >>>>> that indicates spinlock
> >>>>> is always acquired by wait_for_coldboot. That can happen if
> >>>>>
> >>>>> 1. WFI is implemented as NOP as suggested by Jessica and MSIP in MIP
> >>>>> is set which
> >>>>> shouldn't happen unless there is an IPI sent.
> >>>>>
> >>>>> Can you try clearing the MIP just before acquiring the lock ? In qemu
> >>>>> & unleashed, modifying the MSIP bit in
> >>>>> MIP doesn't actually do anything. So we do sbi_platform_ipi_clear  to reset it.
> >>>>
> >>>> The following does *not* solve the problem:
> >>>
> >>> What if you do it before enabling MSIP in MIE? If that doesn't work it
> >>> sounds like something is very broken with that hardware.
> >>
> >> This *fails* too:
> >>
> >
> > Probably, the hart executing warmboot invokes sbi_platform_ipi_clear
> > even before sbi_ipi_init is being called from the hart running coldboot.
>
> There are two calls to sbi_ipi_init(). Is only the one in coldboot relevant?
>

Yes. That actually setups the clint base address.

> Do you mean something like:
>

No. I meant like this.

+#define UART_BASE 0x38000000ULL
+static void sbi_print_early(char ch) {
+    int32_t r;
+    do {
+      __asm__ __volatile__ (
+        "amoor.w %0, %2, %1\n"
+        : "=r" (r), "+A" (*(uint32_t*)(UART_BASE))
+        : "r" (ch));
+    } while (r < 0);
+}
+
 static void sbi_boot_prints(struct sbi_scratch *scratch, u32 hartid)
 {
        int xlen;
@@ -112,6 +123,8 @@ static void wait_for_coldboot(struct sbi_scratch
*scratch, u32 hartid)
                        wfi();
                        cmip = csr_read(CSR_MIP);
                 } while (!(cmip & MIP_MSIP));
+               sbi_print_early('T');
+               sbi_printf("mip value %lx\n", cmip);
                spin_lock(&coldboot_lock);
        };


Note: sbi_printf may not work if sbi_console_init from the cold boot
path is not invoked.
That's why I added a "sbi_print_early".  If sbi_printf doesn't work,
please remove it.

Thanks for your time in figuring out the issue with Kendryte.
Unfortunately, I can't test these changes as I don't have
a kendryte board at home.


> diff --git a/lib/sbi/sbi_init.c b/lib/sbi/sbi_init.c
> index a7fb848..b062ce2 100644
> --- a/lib/sbi/sbi_init.c
> +++ b/lib/sbi/sbi_init.c
> @@ -33,6 +33,8 @@
>         "        | |\n"                                     \
>         "        |_|\n\n"
>
> +static char first = 0x20;
> +
>  static void sbi_boot_prints(struct sbi_scratch *scratch, u32 hartid)
>  {
>         int xlen;
> @@ -80,6 +82,8 @@ static void sbi_boot_prints(struct sbi_scratch
> *scratch, u32 hartid)
>                    sbi_ecall_version_major(), sbi_ecall_version_minor());
>         sbi_printf("\n");
>
> +       sbi_printf("First %c\n", first);
> +
>         sbi_hart_delegation_dump(scratch);
>         sbi_hart_pmp_dump(scratch);
>  }
> @@ -93,6 +97,9 @@ static void wait_for_coldboot(struct sbi_scratch
> *scratch, u32 hartid)
>         unsigned long saved_mie, cmip;
>         const struct sbi_platform *plat = sbi_platform_ptr(scratch);
>
> +       if (!first)
> +               first = 'W';
> +
>         /* Save MIE CSR */
>         saved_mie = csr_read(CSR_MIE);
>
> @@ -187,6 +194,8 @@ static void __noreturn init_coldboot(struct
> sbi_scratch *scratch, u32 hartid)
>         if (rc)
>                 sbi_hart_hang();
>
> +       if (!first)
> +               first = 'C';
>         rc = sbi_ipi_init(scratch, TRUE);
>         if (rc)
>                 sbi_hart_hang();
> @@ -247,6 +256,8 @@ static void __noreturn init_warmboot(struct
> sbi_scratch *scratch, u32 hartid)
>         if (rc)
>                 sbi_hart_hang();
>
> +       if (!first)
> +               first = 'w';
>         rc = sbi_ipi_init(scratch, FALSE);
>         if (rc)
>                 sbi_hart_hang();
>
> Best regards
>
> Heinrich
>
> >
> > Can we confirm the MSIP being set theory first by adding a long
> > outside the inner while loop
> > before acquiring the lock ? I don't see any other reason why the inner
> > loop will keep acquiring the spinlock
> > unless we are hitting some nasty spinlock bug with Kendryte.
> >
> > You may need to directly write MIP value to the uart address
> > (0x38000000) if sbi_console_init is not invoked yet.
> >
> >> diff --git a/lib/sbi/sbi_init.c b/lib/sbi/sbi_init.c
> >> index a7fb848..a29de51 100644
> >> --- a/lib/sbi/sbi_init.c
> >> +++ b/lib/sbi/sbi_init.c
> >> @@ -96,6 +96,8 @@ static void wait_for_coldboot(struct sbi_scratch
> >> *scratch, u32 hartid)
> >>         /* Save MIE CSR */
> >>         saved_mie = csr_read(CSR_MIE);
> >>
> >> +       sbi_platform_ipi_clear(plat, hartid);
> >> +
> >>         /* Set MSIE bit to receive IPI */
> >>         csr_set(CSR_MIE, MIP_MSIP);
> >>
> >> Sean mentioned that the Kendryte K210 implements RISC-V Privileged
> >> Architectures Specification 1.9.1. But I don't see anything in the
> >> change log of the spec that would be relevant here.
> >>
> >
> > Yeah. I checked 1.9.1 as well. I couldn't find anything weird that can
> > cause this issue.
> >
> >> To my knowledge the Kendryte K210 systems are the only really affordable
> >> 64bit RISC-V boards up to now. So it would be great to get them properly
> >> running in OpenSBI even if they deviate from the most current RISC-V specs.
> >>
> >> Best regards
> >>
> >> Heinrich
> >>
> >>>
> >>>> diff --git a/lib/sbi/sbi_init.c b/lib/sbi/sbi_init.c
> >>>> index a7fb848..a3111d3 100644
> >>>> --- a/lib/sbi/sbi_init.c
> >>>> +++ b/lib/sbi/sbi_init.c
> >>>> @@ -105,6 +105,8 @@ static void wait_for_coldboot(struct sbi_scratch
> >>>> *scratch, u32 hartid)
> >>>>        /* Mark current HART as waiting */
> >>>>        sbi_hartmask_set_hart(hartid, &coldboot_wait_hmask);
> >>>>
> >>>> +       sbi_platform_ipi_clear(plat, hartid);
> >>>> +
> >>>>        /* Wait for coldboot to finish using WFI */
> >>>>        while (!coldboot_done) {
> >>>>                spin_unlock(&coldboot_lock);
> >>>>
> >>>> Anyway according to the RISC-V spec wfi may be implemented as a simple nop:
> >>>>
> >>>> "The purpose of the WFI instruction is to provide a hint to the
> >>>> implementation, and so a legal implementation is to simply implement WFI
> >>>> as a NOP."
> >>>
> >>> Yes, we know, that's not the problem here. The problem is that your
> >>> hardware _actually thinks there's a pending software interrupt_, since
> >>> the MSIP bit is set in MIP. An implementation that makes WFI a NOP is
> >>> perfectly fine because, as I've said at least once already, the _inner_
> >>> loop still checks MIP.MSIP before attempting to acquire the lock again.
> >>>
> >>> Jess
> >>>
> >>>>>> Please further investigate (or explain if you already know) exactly how
> >>>>>> you end up in a deadlock situation between wake_coldboot_harts and
> >>>>>> wait_for_coldboot, as I cannot currently see how it could ever be
> >>>>>> possible.
> >>>>>>
> >>>>>>>>> Even if reading and writing coldboot_done is not atomic it is safe to check
> >>>>>>>>> it without using a spinlock.
> >>>>>>>>
> >>>>>>>> Maybe microarchitecturally on the specific hardware in question (though
> >>>>>>>> I wouldn't even be so certain about that), but definitely not in C. You
> >>>>>>>> now have a data race in reading coldboot_done, which isn't even
> >>>>>>>> volatile^, so a sufficiently smart compiler would be completely within
> >>>>>>>> its right to do whatever it wants to the code, including things like
> >>>>>>>> delete the loop entirely (or, slightly less dramatic, turn it into an
> >>>>>>>> if). This patch introduces undefined behaviour.
> >>>>>>>>
> >>>>>>>> Jess
> >>>>>>>>
> >>>>>>>> ^ Please do not make it volatile as a workaround, that is not the right
> >>>>>>>> way to "fix" such issues.
> >>>>>>>
> >>>>>>> What do you suggest Jessica?
> >>>>>>
> >>>>>> My primary suggestion is above, since this does not look like a
> >>>>>> necessary fix.
> >>>>>>
> >>>>>> There is the separate question of whether this code should be using
> >>>>>> spin locks anyway. One could optimise it with suitable use of proper
> >>>>>> atomics to access coldboot_done, which would mask whatever issue you
> >>>>>> are seeing, but that should not be necessary, and in early boot code
> >>>>>> like this that's not on a hot code path it's better to keep things
> >>>>>> simple like it is currently rather than risk writing unsafe code
> >>>>>> through incorrect use of atomics. I would feel much more comfortable
> >>>>>> leaving the code using spin locks.
> >>>>>>
> >>>>>> Jess
> >>>>>>
> >>>>>>>>> Signed-off-by: Heinrich Schuchardt <xypron.glpk at gmx.de>
> >>>>>>>>> ---
> >>>>>>>>> lib/sbi/sbi_init.c | 4 ++--
> >>>>>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
> >>>>>>>>>
> >>>>>>>>> diff --git a/lib/sbi/sbi_init.c b/lib/sbi/sbi_init.c
> >>>>>>>>> index a7fb848..a0e4f11 100644
> >>>>>>>>> --- a/lib/sbi/sbi_init.c
> >>>>>>>>> +++ b/lib/sbi/sbi_init.c
> >>>>>>>>> @@ -106,14 +106,14 @@ static void wait_for_coldboot(struct sbi_scratch *scratch, u32 hartid)
> >>>>>>>>>    sbi_hartmask_set_hart(hartid, &coldboot_wait_hmask);
> >>>>>>>>>
> >>>>>>>>>    /* Wait for coldboot to finish using WFI */
> >>>>>>>>> +   spin_unlock(&coldboot_lock);
> >>>>>>>>>    while (!coldboot_done) {
> >>>>>>>>> -           spin_unlock(&coldboot_lock);
> >>>>>>>>>            do {
> >>>>>>>>>                    wfi();
> >>>>>>>>>                    cmip = csr_read(CSR_MIP);
> >>>>>>>>>             } while (!(cmip & MIP_MSIP));
> >>>>>>>>> -           spin_lock(&coldboot_lock);
> >>>>>>>>>    };
> >>>>>>>>> +   spin_lock(&coldboot_lock);
> >>>>>>>>>
> >>>>>>>>>    /* Unmark current HART as waiting */
> >>>>>>>>>    sbi_hartmask_clear_hart(hartid, &coldboot_wait_hmask);
> >>>>>>>>> --
> >>>>>>>>> 2.28.0
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> opensbi mailing list
> >>>>>>>>> opensbi at lists.infradead.org
> >>>>>>>>> http://lists.infradead.org/mailman/listinfo/opensbi
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> opensbi mailing list
> >>>>>> opensbi at lists.infradead.org
> >>>>>> http://lists.infradead.org/mailman/listinfo/opensbi
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Regards,
> >>>>> Atish
> >>>
> >>>
> >>
> >
> >
>


--
Regards,
Atish



More information about the opensbi mailing list