[PATCH 2/9] locking/qrwlock: avoid redundant atomic_add_return on read_lock_slowpath
Waiman Long
waiman.long at hp.com
Tue Jul 7 12:28:13 PDT 2015
On 07/07/2015 02:19 PM, Will Deacon wrote:
> On Tue, Jul 07, 2015 at 06:51:54PM +0100, Waiman Long wrote:
>> On 07/07/2015 01:24 PM, Will Deacon wrote:
>>> When a slow-path reader gets to the front of the wait queue outside of
>>> interrupt context, it waits for any writers to drain, increments the
>>> reader count and again waits for any additional writers that may have
>>> snuck in between the initial check and the increment.
>>>
>>> Given that this second check is performed with acquire semantics, there
>>> is no need to perform the increment using atomic_add_return, which acts
>>> as a full barrier.
>>>
>>> This patch changes the slow-path code to use smp_load_acquire and
>>> atomic_add instead of atomic_add_return. Since the check only involves
>>> the writer count, we can perform the acquire after the add.
>>>
>>> Signed-off-by: Will Deacon <will.deacon at arm.com>
>>> ---
>>> kernel/locking/qrwlock.c | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
>>> index 96b77d1e0545..4e29bef688ac 100644
>>> --- a/kernel/locking/qrwlock.c
>>> +++ b/kernel/locking/qrwlock.c
>>> @@ -98,7 +98,8 @@ void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
>>>  	while (atomic_read(&lock->cnts) & _QW_WMASK)
>>>  		cpu_relax_lowlatency();
>>>
>>> -	cnts = atomic_add_return(_QR_BIAS, &lock->cnts) - _QR_BIAS;
>>> +	atomic_add(_QR_BIAS, &lock->cnts);
>>> +	cnts = smp_load_acquire((u32 *)&lock->cnts);
>>>  	rspin_until_writer_unlock(lock, cnts);
>>>
>>> /*
>> Atomic add on x86 is actually a full barrier too. The performance
>> difference between "lock add" and "lock xadd" should be minor. The
>> additional load, however, could potentially cause an extra cacheline
>> load on a contended lock. So do you see an actual performance benefit
>> from this change on ARM?
> I'd need to re-run the numbers, but atomic_add is significantly less
> work on ARM than atomic_add_return, which basically has two full memory
> barriers compared to none for the former.
>
> Will
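Just to spell the trade-off out, here is a rough C11 illustration (not the
kernel primitives, and the _QR_BIAS value below is a placeholder for the
sketch, not necessarily the real one):

#include <stdatomic.h>
#include <stdint.h>

#define _QR_BIAS	0x100u		/* placeholder for the sketch */

static _Atomic uint32_t cnts;

/*
 * Current code: one fully ordered read-modify-write.  On x86 this is a
 * single "lock xadd"; on ARM it carries the two full barriers you refer
 * to.  C11 fetch_add returns the old value, i.e. the same thing that
 * atomic_add_return(_QR_BIAS, ...) - _QR_BIAS computes.
 */
static uint32_t inc_reader_fully_ordered(void)
{
	return atomic_fetch_add_explicit(&cnts, _QR_BIAS,
					 memory_order_seq_cst);
}

/*
 * Proposed code: a relaxed add (still a locked instruction on x86, but
 * barrier-free on ARM) followed by a separate acquire load.  The acquire
 * load orders the subsequent writer-byte check, at the cost of touching
 * the cache line one more time.
 */
static uint32_t inc_reader_relaxed_then_acquire(void)
{
	atomic_fetch_add_explicit(&cnts, _QR_BIAS, memory_order_relaxed);
	return atomic_load_explicit(&cnts, memory_order_acquire);
}

Since the later check only looks at the writer bits, the difference in the
returned value shouldn't matter.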
I think a compromise is to encapsulate that in an inline function that
can be overridden by architecture-specific code. For example (a possible
arm64 override is sketched further below):
#ifndef queued_inc_reader_return
static inline u32 queued_inc_reader_return(struct qrwlock *lock)
{
        return atomic_add_return(_QR_BIAS, &lock->cnts) - _QR_BIAS;
}
#endif
        :
        cnts = queued_inc_reader_return(lock);
This also means that you will need to keep the function prototype for
rspin_until_writer_unlock() in the later patch. Other than that, I don't
see any other issue in the patch series.
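An arm64 override could then be something along these lines (untested
sketch, just reusing the atomic_add + smp_load_acquire idea from your
patch; where it would live, e.g. an arch asm/qrwlock.h, is only a guess):

#define queued_inc_reader_return queued_inc_reader_return
static inline u32 queued_inc_reader_return(struct qrwlock *lock)
{
        /* Barrier-free add, then recover acquire ordering via the load */
        atomic_add(_QR_BIAS, &lock->cnts);
        return smp_load_acquire((u32 *)&lock->cnts);
}

The value returned differs slightly from the generic version, but only the
writer byte matters to rspin_until_writer_unlock(), so that should be fine.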
BTW, are you also planning to use qspinlock in the ARM64 code?
Cheers,
Longman