[PATCH] lib: sbi: Revert entry_count before doing hsm stop in hsm wait loop
Nick Hu
nick.hu at sifive.com
Fri May 30 00:05:18 PDT 2025
On Wed, May 28, 2025 at 10:34 PM Nick Hu <nick.hu at sifive.com> wrote:
>
> On Wed, May 28, 2025 at 8:14 PM Anup Patel <apatel at ventanamicro.com> wrote:
> >
> > On Wed, May 28, 2025 at 3:56 PM Nick Hu <nick.hu at sifive.com> wrote:
> > >
> > > On Tue, May 27, 2025 at 8:48 PM Anup Patel <apatel at ventanamicro.com> wrote:
> > > >
> > > > Using hsm stop in hsm wait loop causes secondary harts to be stuck
> > > > forever in OpenSBI on RISC-V platforms where HSM hart hotplug is
> > > > available and all harts come-up at the same time during system
> > > > power-on.
> > > >
> > > > For example, lets say we have two harts A and B on a RISC-V platform
> > > > with HSM hart hotplug which come-up at the same time during system
> > > > power-on. The hart A enters OpenSBI before hart B hence it becomes
> > > > the primary (or cold-boot) hart whereas hart B becomes the secondary
> > > > (or warm-boot) hart. The hart A follows the OpenSBI cold-boot path
> > > > and registers hsm device before hart B enters OpenSBI. The hart B
> > > > eventually enters OpenSBI and follows the OpenSBI warm-boot path
> > > > so it will increment it's own entry_count before entering hsm wait
> > > > loop where it sees hsm device and stops itself. Later as part of
> > > > the Linux boot-up sequence, hart A issues SBI HSM start call to
> > > > bring-up hart B but OpenSBI sees entry_count != init_count for
> > > > hart B in sbi_hsm_hart_start() hence hsm_device_hart_start() is
> > > > not called for hart B resulting in hart B stuck forever in OpenSBI.
> > > >
> > > > To fix the above issue, revert entry_count before doing hsm stop
> > > > in hsm wait loop.
> > > >
> > > > Fixes: d844deadec94 ("lib: sbi: Use hsm stop for hsm wait")
> > > > Signed-off-by: Anup Patel <apatel at ventanamicro.com>
> > > > ---
> > > > include/sbi/sbi_init.h | 2 ++
> > > > lib/sbi/sbi_hsm.c | 4 +++-
> > > > lib/sbi/sbi_init.c | 13 +++++++++++++
> > > > 3 files changed, 18 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/include/sbi/sbi_init.h b/include/sbi/sbi_init.h
> > > > index c9013ea4..ad068674 100644
> > > > --- a/include/sbi/sbi_init.h
> > > > +++ b/include/sbi/sbi_init.h
> > > > @@ -16,6 +16,8 @@ struct sbi_scratch;
> > > >
> > > > void __noreturn sbi_init(struct sbi_scratch *scratch);
> > > >
> > > > +void sbi_revert_entry_count(struct sbi_scratch *scratch);
> > > > +
> > > > unsigned long sbi_entry_count(u32 hartindex);
> > > >
> > > > unsigned long sbi_init_count(u32 hartindex);
> > > > diff --git a/lib/sbi/sbi_hsm.c b/lib/sbi/sbi_hsm.c
> > > > index e8128a39..557ab131 100644
> > > > --- a/lib/sbi/sbi_hsm.c
> > > > +++ b/lib/sbi/sbi_hsm.c
> > > > @@ -176,8 +176,10 @@ static void sbi_hsm_hart_wait(struct sbi_scratch *scratch)
> > > > * If the hsm_dev is ready and it support the hotplug, we can
> > > > * use the hsm stop for more power saving
> > > > */
> > > > - if (hsm_device_has_hart_hotplug())
> > > > + if (hsm_device_has_hart_hotplug()) {
> > > > + sbi_revert_entry_count(scratch);
> > > > hsm_device_hart_stop();
> > >
> > > Hi Anup,
> > > Thanks for addressing the bug =)
> > >
> > > However, there may still be an issue in the following scenario:
> > > If the HSM state of core B is SBI_HSM_STATE_START_PENDING at this
> > > point, it won't enter the warm boot path. As a result, the
> > > `init_count` will end up being `entry_count` + 1 after the warm init
> > > path.
> > >
> > > Would it make sense to increment `entry_count` after the
> > > `sbi_hsm_init()` call instead?
> >
> > This is not a bug rather expected behaviour.
> >
> > When the hart B is in SBI_HSM_STATE_START_PENDING state
> > then it means some other hart A has issued HSM hart start and
> > has also provided entry details. The hart B increments entry_count
> > upon entry into OpenSBI and increments init_count after initialization
> > is done.
> >
> I'm not sure if I understand it correctly.
> In the above case, the `init_count` != `entry_count` for hart B after
> the initialization is done.
> If the hart B enters the `hsm_dev->hart_stop()` via the
> SBI_EXT_HSM_HART_STOP path, Doesn't that mean the hart A won’t be able
> to call `hsm_dev->hart_start()` for the hart B?
> The `hsm_dev->hart_stop()` function may transition the hart into a
> platform-specific low-power state that requires
> `hsm_dev->hart_start()` to wake it up.
>
Oh, I see. I had mistaken `hsm_device_hart_stop()` for `sbi_hsm_hart_stop()`.
Since hsm_device_hart_stop() doesn't check the HSM state, hart B will
proceed to hsm_dev->hart_stop() and, upon waking up, will enter the
warm boot path.
As a result, `init_count` will match `entry_count` after hart B
completes initialization.
Apologies for the confusion. The fix looks good to me.
Reviewed-by: Nick Hu <nick.hu at sifive.com>
> Best Regards,
> Nick
>
> > Regards,
> > Anup
> >
> > >
> > > > + }
> > > >
> > > > wfi();
> > > > }
> > > > diff --git a/lib/sbi/sbi_init.c b/lib/sbi/sbi_init.c
> > > > index 62c32682..84a63748 100644
> > > > --- a/lib/sbi/sbi_init.c
> > > > +++ b/lib/sbi/sbi_init.c
> > > > @@ -579,6 +579,19 @@ void __noreturn sbi_init(struct sbi_scratch *scratch)
> > > > init_warmboot(scratch, hartid);
> > > > }
> > > >
> > > > +void sbi_revert_entry_count(struct sbi_scratch *scratch)
> > > > +{
> > > > + unsigned long *entry_count, *init_count;
> > > > +
> > > > + if (!entry_count_offset || !init_count_offset)
> > > > + sbi_hart_hang();
> > > > +
> > > > + entry_count = sbi_scratch_offset_ptr(scratch, entry_count_offset);
> > > > + init_count = sbi_scratch_offset_ptr(scratch, init_count_offset);
> > > > +
> > > > + *entry_count = *init_count;
> > > > +}
> > > > +
> > > > unsigned long sbi_entry_count(u32 hartindex)
> > > > {
> > > > struct sbi_scratch *scratch;
> > > > --
> > > > 2.43.0
> > > >
> > >
> > > Best Regards,
> > > Nick
More information about the opensbi
mailing list