[PATCH 07/18] arm64: locks: patch in lse instructions when supported by the CPU
Catalin Marinas
catalin.marinas at arm.com
Thu Jul 23 07:14:23 PDT 2015
On Thu, Jul 23, 2015 at 02:39:35PM +0100, Will Deacon wrote:
> On Tue, Jul 21, 2015 at 06:29:18PM +0100, Will Deacon wrote:
> > On Tue, Jul 21, 2015 at 05:53:39PM +0100, Catalin Marinas wrote:
> > > On Mon, Jul 13, 2015 at 10:25:08AM +0100, Will Deacon wrote:
> > > > @@ -125,11 +155,19 @@ static inline void arch_write_lock(arch_rwlock_t *rw)
> > > >
> > > > asm volatile(
> > > > " sevl\n"
> > > > + ARM64_LSE_ATOMIC_INSN(
> > > > + /* LL/SC */
> > > > "1: wfe\n"
> > > > "2: ldaxr %w0, %1\n"
> > > > " cbnz %w0, 1b\n"
> > > > " stxr %w0, %w2, %1\n"
> > > > - " cbnz %w0, 2b\n"
> > > > + " cbnz %w0, 2b",
> > > > + /* LSE atomics */
> > > > + "1: wfe\n"
> > > > + " mov %w0, wzr\n"
> > > > + " casa %w0, %w2, %1\n"
> > > > + " nop\n"
> > > > + " cbnz %w0, 1b")
> > > > : "=&r" (tmp), "+Q" (rw->lock)
> > > > : "r" (0x80000000)
> > > > : "memory");
> > >
> > > With WFE in the LL/SC case, we rely on LDAXR to set the exclusive
> > > monitor and an event would be generated every time it gets cleared. With
> > > CAS, we no longer have this behaviour, so what guarantees a SEV?
> >
> > My understanding was that failed CAS will set the exclusive monitor, but
> > what I have for a spec doesn't actually comment on this behaviour. I'll
> > go digging...
>
> ... and the winner is: not me! We do need an LDXR to set the exclusive
> monitor and doing that without introducing races is slightly confusing.
>
> Here's what I now have for write_lock (read_lock is actually pretty simple):
>
> static inline void arch_write_lock(arch_rwlock_t *rw)
> {
> 	unsigned int tmp;
>
> 	asm volatile(ARM64_LSE_ATOMIC_INSN(
> 	/* LL/SC */
> 	" sevl\n"
> 	"1: wfe\n"
> 	"2: ldaxr %w0, %1\n"
> 	" cbnz %w0, 1b\n"
> 	" stxr %w0, %w2, %1\n"
> 	" cbnz %w0, 2b\n"
> 	" nop",
> 	/* LSE atomics */
> 	"1: mov %w0, wzr\n"
> 	"2: casa %w0, %w2, %1\n"
> 	" cbz %w0, 3f\n"
> 	" ldxr %w0, %1\n"
> 	" cbz %w0, 2b\n"
> 	" wfe\n"
> 	" b 1b\n"
> 	"3:")
> 	: "=&r" (tmp), "+Q" (rw->lock)
> 	: "r" (0x80000000)
> 	: "memory");
> }
>
> What do you reckon?
It looks fine. I thought I could reduce the number of branches, but I
still end up with three. At least the no-contention case should be fast.
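
For anyone following the control flow of the LSE path above, here is a
rough portable C11 model of it. This is only a sketch and not the kernel
code: struct model_rwlock, MODEL_WRITE_LOCKED, model_wait_for_event(),
model_write_trylock() and model_write_unlock() are hypothetical names
made up for illustration. Plain C has no equivalent of WFE or the
exclusive monitor, so the wait is a no-op stub, and the reload only
stands in for the LDXR that arms the monitor in the real sequence.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for arch_rwlock_t; not the kernel type. */
struct model_rwlock {
	_Atomic uint32_t lock;
};

/* Same writer bit as the "r" (0x80000000) operand in the asm. */
#define MODEL_WRITE_LOCKED 0x80000000u

/* Hypothetical placeholder for WFE: portable C cannot wait for an event. */
static inline void model_wait_for_event(void)
{
}

static inline void model_write_lock(struct model_rwlock *rw)
{
	for (;;) {
		uint32_t old = 0;	/* "mov %w0, wzr" */

		/* "casa": install the writer bit only if the lock is free. */
		if (atomic_compare_exchange_strong_explicit(&rw->lock, &old,
				MODEL_WRITE_LOCKED,
				memory_order_acquire, memory_order_acquire))
			return;		/* "cbz %w0, 3f": uncontended fast path */

		/* "ldxr": re-read the lock; the asm also arms the monitor here. */
		if (atomic_load_explicit(&rw->lock, memory_order_relaxed) == 0)
			continue;	/* "cbz %w0, 2b": freed meanwhile, retry */

		model_wait_for_event();	/* "wfe", then "b 1b" */
	}
}

/* Hypothetical helper: one CAS attempt, nonzero on success. */
static inline int model_write_trylock(struct model_rwlock *rw)
{
	uint32_t old = 0;

	return atomic_compare_exchange_strong_explicit(&rw->lock, &old,
			MODEL_WRITE_LOCKED,
			memory_order_acquire, memory_order_acquire);
}

/* Hypothetical helper: drop the writer bit with release ordering. */
static inline void model_write_unlock(struct model_rwlock *rw)
{
	atomic_store_explicit(&rw->lock, 0, memory_order_release);
}
```

In the model, an uncontended caller returns after a single
compare-and-swap, which corresponds to the one taken branch to 3: in the
asm. The other two branches are only reached under contention.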
--
Catalin