[PATCH v4 3/4] locking/qspinlock: Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32

Wed Mar 31 05:18:56 BST 2021

On Tue, Mar 30, 2021 at 3:12 PM Arnd Bergmann <arnd at arndb.de> wrote:
>
> On Tue, Mar 30, 2021 at 4:26 AM Guo Ren <guoren at kernel.org> wrote:
> > On Mon, Mar 29, 2021 at 9:56 PM Arnd Bergmann <arnd at arndb.de> wrote:
> > > On Mon, Mar 29, 2021 at 2:52 PM Guo Ren <guoren at kernel.org> wrote:
> > > > On Mon, Mar 29, 2021 at 7:31 PM Peter Zijlstra <peterz at infradead.org> wrote:
> > > > >
> > > > > What's the architectural guarantee on LL/SC progress for RISC-V ?
> > >
> > >    "When LR/SC is used for memory locations marked RsrvNonEventual,
> > >      software should provide alternative fall-back mechanisms used when
> > >      lack of progress is detected."
> > >
> > > My reading of this is that if the example you tried stalls, then either
> > > the PMA is not RsrvEventual, and it is wrong to rely on ll/sc on this,
> > > or that the PMA is marked RsrvEventual but the implementation is
> > > buggy.
> >
> > Yes, PMA just defines physical memory region attributes, But in our
> > processor, when MMU is enabled (satp's value register > 2) in s-mode,
> > it will look at our custom PTE's attributes BIT(63) ref [1]:
> >
> >    PTE format:
> >    | 63 | 62 | 61 | 60 | 59 | 58-8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
> >      SO   C    B    SH   SE    RSW   D   A   G   U   X   W   R   V
> >      ^    ^    ^    ^    ^
> >    BIT(63): SO - Strong Order
> >    BIT(62): C  - Cacheable
> >    BIT(61): B  - Bufferable
> >    BIT(60): SH - Shareable
> >    BIT(59): SE - Security
> >
> > So the memory also could be RsrvNone/RsrvEventual.
>
> I was not talking about RsrvNone, which would clearly mean that
> you cannot use lr/sc at all (trap would trap, right?), but "RsrvNonEventual",
> which would explain the behavior you described in an earlier reply:
>
> | u32 a = 0x55aa66bb;
> | u16 *ptr = &a;
> |
> | CPU0                       CPU1
> | =========             =========
> | xchg16(ptr, new)     while(1)
> |                                     WRITE_ONCE(*(ptr + 1), x);
> |
> | When we use lr.w/sc.w implement xchg16, it'll cause CPU0 deadlock.
>
> As I understand, this example must not cause a deadlock on
> a compliant hardware implementation when the underlying memory
> has RsrvEventual behavior, but could deadlock in case of
> RsrvNonEventual
Thx for the nice explanation:
 - RsrvNonEventual - depends on software fall-back mechanisms, and
just I'm worried about.
 - RsrvEventual - HW would provide the eventual success guarantee.

>
> > [1] https://github.com/c-sky/csky-linux/commit/e837aad23148542771794d8a2fcc52afd0fcbf88
> >
> > >
> > > It also seems that the current "amoswap" based implementation
> > > would be reliable independent of RsrvEventual/RsrvNonEventual.
> >
> > Yes, the hardware implementation of AMO could be different from LR/SC.
> > AMO could use ACE snoop holding to lock the bus in hw coherency
> > design, but LR/SC uses an exclusive monitor without locking the bus.
> >
> > RISC-V hasn't CAS instructions, and it uses LR/SC for cmpxchg. I don't
> > think LR/SC would be slower than CAS, and CAS is just good for code
> > size.
>
> What I meant here is that the current spinlock uses a simple amoswap,
> which presumably does not suffer from the lack of forward process you
> described.
Does that mean we should prevent using LR/SC (if RsrvNonEventual)?

-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/