[PATCH] lib: sbi: Revert entry_count before doing hsm stop in hsm wait loop
Nick Hu
nick.hu at sifive.com
Wed May 28 07:34:23 PDT 2025
On Wed, May 28, 2025 at 8:14 PM Anup Patel <apatel at ventanamicro.com> wrote:
>
> On Wed, May 28, 2025 at 3:56 PM Nick Hu <nick.hu at sifive.com> wrote:
> >
> > On Tue, May 27, 2025 at 8:48 PM Anup Patel <apatel at ventanamicro.com> wrote:
> > >
> > > Using hsm stop in hsm wait loop causes secondary harts to be stuck
> > > forever in OpenSBI on RISC-V platforms where HSM hart hotplug is
> > > available and all harts come-up at the same time during system
> > > power-on.
> > >
> > > For example, lets say we have two harts A and B on a RISC-V platform
> > > with HSM hart hotplug which come-up at the same time during system
> > > power-on. The hart A enters OpenSBI before hart B hence it becomes
> > > the primary (or cold-boot) hart whereas hart B becomes the secondary
> > > (or warm-boot) hart. The hart A follows the OpenSBI cold-boot path
> > > and registers hsm device before hart B enters OpenSBI. The hart B
> > > eventually enters OpenSBI and follows the OpenSBI warm-boot path
> > > so it will increment it's own entry_count before entering hsm wait
> > > loop where it sees hsm device and stops itself. Later as part of
> > > the Linux boot-up sequence, hart A issues SBI HSM start call to
> > > bring-up hart B but OpenSBI sees entry_count != init_count for
> > > hart B in sbi_hsm_hart_start() hence hsm_device_hart_start() is
> > > not called for hart B resulting in hart B stuck forever in OpenSBI.
> > >
> > > To fix the above issue, revert entry_count before doing hsm stop
> > > in hsm wait loop.
> > >
> > > Fixes: d844deadec94 ("lib: sbi: Use hsm stop for hsm wait")
> > > Signed-off-by: Anup Patel <apatel at ventanamicro.com>
> > > ---
> > > include/sbi/sbi_init.h | 2 ++
> > > lib/sbi/sbi_hsm.c | 4 +++-
> > > lib/sbi/sbi_init.c | 13 +++++++++++++
> > > 3 files changed, 18 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/include/sbi/sbi_init.h b/include/sbi/sbi_init.h
> > > index c9013ea4..ad068674 100644
> > > --- a/include/sbi/sbi_init.h
> > > +++ b/include/sbi/sbi_init.h
> > > @@ -16,6 +16,8 @@ struct sbi_scratch;
> > >
> > > void __noreturn sbi_init(struct sbi_scratch *scratch);
> > >
> > > +void sbi_revert_entry_count(struct sbi_scratch *scratch);
> > > +
> > > unsigned long sbi_entry_count(u32 hartindex);
> > >
> > > unsigned long sbi_init_count(u32 hartindex);
> > > diff --git a/lib/sbi/sbi_hsm.c b/lib/sbi/sbi_hsm.c
> > > index e8128a39..557ab131 100644
> > > --- a/lib/sbi/sbi_hsm.c
> > > +++ b/lib/sbi/sbi_hsm.c
> > > @@ -176,8 +176,10 @@ static void sbi_hsm_hart_wait(struct sbi_scratch *scratch)
> > > * If the hsm_dev is ready and it support the hotplug, we can
> > > * use the hsm stop for more power saving
> > > */
> > > - if (hsm_device_has_hart_hotplug())
> > > + if (hsm_device_has_hart_hotplug()) {
> > > + sbi_revert_entry_count(scratch);
> > > hsm_device_hart_stop();
> >
> > Hi Anup,
> > Thanks for addressing the bug =)
> >
> > However, there may still be an issue in the following scenario:
> > If the HSM state of core B is SBI_HSM_STATE_START_PENDING at this
> > point, it won't enter the warm boot path. As a result, the
> > `init_count` will end up being `entry_count` + 1 after the warm init
> > path.
> >
> > Would it make sense to increment `entry_count` after the
> > `sbi_hsm_init()` call instead?
>
> This is not a bug rather expected behaviour.
>
> When the hart B is in SBI_HSM_STATE_START_PENDING state
> then it means some other hart A has issued HSM hart start and
> has also provided entry details. The hart B increments entry_count
> upon entry into OpenSBI and increments init_count after initialization
> is done.
>
I'm not sure if I understand it correctly.
In the above case, the `init_count` != `entry_count` for hart B after
the initialization is done.
If the hart B enters the `hsm_dev->hart_stop()` via the
SBI_EXT_HSM_HART_STOP path, Doesn't that mean the hart A won’t be able
to call `hsm_dev->hart_start()` for the hart B?
The `hsm_dev->hart_stop()` function may transition the hart into a
platform-specific low-power state that requires
`hsm_dev->hart_start()` to wake it up.
Best Regards,
Nick
> Regards,
> Anup
>
> >
> > > + }
> > >
> > > wfi();
> > > }
> > > diff --git a/lib/sbi/sbi_init.c b/lib/sbi/sbi_init.c
> > > index 62c32682..84a63748 100644
> > > --- a/lib/sbi/sbi_init.c
> > > +++ b/lib/sbi/sbi_init.c
> > > @@ -579,6 +579,19 @@ void __noreturn sbi_init(struct sbi_scratch *scratch)
> > > init_warmboot(scratch, hartid);
> > > }
> > >
> > > +void sbi_revert_entry_count(struct sbi_scratch *scratch)
> > > +{
> > > + unsigned long *entry_count, *init_count;
> > > +
> > > + if (!entry_count_offset || !init_count_offset)
> > > + sbi_hart_hang();
> > > +
> > > + entry_count = sbi_scratch_offset_ptr(scratch, entry_count_offset);
> > > + init_count = sbi_scratch_offset_ptr(scratch, init_count_offset);
> > > +
> > > + *entry_count = *init_count;
> > > +}
> > > +
> > > unsigned long sbi_entry_count(u32 hartindex)
> > > {
> > > struct sbi_scratch *scratch;
> > > --
> > > 2.43.0
> > >
> >
> > Best Regards,
> > Nick
More information about the opensbi
mailing list