[PATCH 1/2] lib: sbi_init: Avoid thundering hurd problem with coldbook_lock

Sun Aug 16 21:41:15 EDT 2020

On 17 Aug 2020, at 02:31, Bin Meng <bmeng.cn at gmail.com> wrote:
> 
> On Sun, Aug 16, 2020 at 10:13 PM Jessica Clarke <jrtc27 at jrtc27.com> wrote:
>> 
>> On 16 Aug 2020, at 15:02, Anup Patel <anup at brainfault.org> wrote:
>>> 
>>> On Sun, Aug 16, 2020 at 12:20 PM Bin Meng <bmeng.cn at gmail.com> wrote:
>>>> 
>>>> On Sun, Aug 16, 2020 at 3:49 AM Jessica Clarke <jrtc27 at jrtc27.com> wrote:
>>>>> 
>>>>> On 15 Aug 2020, at 18:00, Anup Patel <anup.patel at wdc.com> wrote:
>>>>>> 
>>>>>> We can have thundering hurd problem with coldboot_lock where the
>>>>>> boot HART can potentially starve trying to acquire coldboot_lock
>>>>>> because some of the non-boot HARTs are continuously acquiring and
>>>>>> releasing coldboot_lock. This can happen if WFI is a NOP
>>>>> 
>>>>> That is neither necessary nor sufficient, it's solely based on
>>>>> whether the hart believes an M-mode software interrupt is pending?
>>>>> 
>>>>>> OR if
>>>>>> MIP.MSIP bit is already set for some of the non-boot HARTs.
>>>>>> 
>>>>>> To avoid thundering hurd problem for coldboot_lock, we convert
>>>>>> coldboot_done flag into atomic variable and using coldboot_lock
>>>>>> only for coldboot_wait_hmask.
>>>>>> 
>>>>>> Signed-off-by: Anup Patel <anup.patel at wdc.com>
>>>>>> Tested-by: Heinrich Schuchardt <xypron.glpk at gmx.de>
>>>>>> ---
>>>>>> lib/sbi/sbi_init.c | 19 ++++++++++++-------
>>>>>> 1 file changed, 12 insertions(+), 7 deletions(-)
>>>>>> 
>>>>>> diff --git a/lib/sbi/sbi_init.c b/lib/sbi/sbi_init.c
>>>>>> index a7fb848..6b58983 100644
>>>>>> --- a/lib/sbi/sbi_init.c
>>>>>> +++ b/lib/sbi/sbi_init.c
>>>>>> @@ -85,9 +85,10 @@ static void sbi_boot_prints(struct sbi_scratch *scratch, u32 hartid)
>>>>>> }
>>>>>> 
>>>>>> static spinlock_t coldboot_lock = SPIN_LOCK_INITIALIZER;
>>>>>> -static unsigned long coldboot_done = 0;
>>>>>> static struct sbi_hartmask coldboot_wait_hmask = { 0 };
>>>>>> 
>>>>>> +static atomic_t coldboot_done = ATOMIC_INITIALIZER(0);
>>>>>> +
>>>>>> static void wait_for_coldboot(struct sbi_scratch *scratch, u32 hartid)
>>>>>> {
>>>>>>     unsigned long saved_mie, cmip;
>>>>>> @@ -105,16 +106,20 @@ static void wait_for_coldboot(struct sbi_scratch *scratch, u32 hartid)
>>>>>>     /* Mark current HART as waiting */
>>>>>>     sbi_hartmask_set_hart(hartid, &coldboot_wait_hmask);
>>>>>> 
>>>>>> +     /* Release coldboot lock */
>>>>>> +     spin_unlock(&coldboot_lock);
>>>>>> +
>>>>>>     /* Wait for coldboot to finish using WFI */
>>>>>> -     while (!coldboot_done) {
>>>>>> -             spin_unlock(&coldboot_lock);
>>>>>> +     while (!atomic_read(&coldboot_done)) {
>>>>> 
>>>>> Personally I'd make this a relaxed read and then explicitly fence
>>>>> outside the loop, as otherwise if we end up with MSIP erroneously set
>>>>> there may be a lot of cache coherency traffic due to repeated
>>>>> unnecessary fences?
>>>>> 
>>>>>>             do {
>>>>>>                     wfi();
>>>>>>                     cmip = csr_read(CSR_MIP);
>>>>>>              } while (!(cmip & MIP_MSIP));
>>>>>> -             spin_lock(&coldboot_lock);
>>>>>>     };
>>>>>> 
>>>>>> +     /* Acquire coldboot lock */
>>>>>> +     spin_lock(&coldboot_lock);
>>>>>> +
>>>>>>     /* Unmark current HART as waiting */
>>>>>>     sbi_hartmask_clear_hart(hartid, &coldboot_wait_hmask);
>>>>>> 
>>>>>> @@ -132,12 +137,12 @@ static void wake_coldboot_harts(struct sbi_scratch *scratch, u32 hartid)
>>>>>> {
>>>>>>     const struct sbi_platform *plat = sbi_platform_ptr(scratch);
>>>>>> 
>>>>>> +     /* Mark coldboot done */
>>>>>> +     atomic_write(&coldboot_done, 1);
>>>>> 
>>>>> This only needs to be a store release.
>>>> 
>>>> I believe relaxed read / write can work as well.
>>> 
>>> Even if we use relaxed read / write, we will still need explicit Acquire
>>> and Release barriers. Better to use __smp_store_release() and
>>> __smp_load_acquire().
>> 
>> Why not just use the C11 versions? Inline assembly is rarely needed
>> these days for atomics. Especially helpful given the large hammer that
>> is volatile inline assembly with a memory clobber causes highly
>> pessimistic code generation from the compiler. In general it should
>> only be necessary when needing to deal with I/O memory.
> 
> For clarification, are you suggesting use the C11 compiler atomic
> intrinsics, or using C11 atomic library (libatomic)?
> 
> If using the C11 atomic library, my understanding is that both OpenSBI
> and Linux kernel want to be self-contained, and do not want to link
> with external libraries. Like they implement their own sub-set of libc
> functions instead of glibc or something else.

The language merely defines the atomics. Some implementations (read:
GCC if you use byte or half-word atomics on RISC-V, but not LLVM
because it's perfectly capable of inlining the tiny little masked LR/SC
loops) require linking against libatomic under certain circumstances,
much like depending on your arch string multiplication and division can
require libcalls to things like __mulsi3 in libgcc/compiler-rt. I would
not advocate for using atomics that require libcalls, i.e. anything
that would need libatomic when using GCC, since they're rather
inefficient as a result (though I _would_ encourage someone to finally
fix GCC so it's less stupid and follows LLVM's lead of just generating
the short sequences), but if someone wanted those OpenSBI could still
provide its own implementations of the libcalls just like kernels do
with the libgcc/compiler-rt functions.

So in short: the former, though you can achieve the latter whilst still
being self-contained, if desired.

Jess