[PATCH 1/2] qspinlock: Ensure writes are pushed out of core write buffer

Alexander Sverdlin alexander.sverdlin at nokia.com
Thu Jan 28 02:36:24 EST 2021


Hi!

On 27/01/2021 23:21, Will Deacon wrote:
> On Wed, Jan 27, 2021 at 09:01:08PM +0100, Alexander A Sverdlin wrote:
>> From: Alexander Sverdlin <alexander.sverdlin at nokia.com>
>>
>> Ensure writes are pushed out of the core write buffer to prevent code waiting
>> on other cores from spinning longer than necessary.
>>
>> 6 threads running a tight spinlock loop and competing for the same lock
>> on 6 cores on MIPS/Octeon complete 1000000 iterations in:
>>
>> before the patch:	4.3 sec
>> after the patch:	1.2 sec
> If you only have 6 cores, I'm not sure qspinlock makes any sense...

That's my impression as well; I even proposed reverting qspinlocks for MIPS:
https://lore.kernel.org/linux-mips/20170610002644.8434-1-paul.burton@imgtec.com/T/#ma1506c80472129b2ac41554cc2d863c9a24518c0

>> Same 6-core Octeon machine:
>> sysbench --test=mutex --num-threads=64 --memory-scope=local run
>>
>> w/o patch:	1.53s
>> with patch:	1.28s
>>
>> This will also allow removing the smp_wmb() in
>> arch/arm/include/asm/mcs_spinlock.h (was it actually addressing the same
>> issue?).
>>
>> Finally, our quite diverse internal test suite covering different IPC/network
>> aspects didn't detect any regressions on ARM/ARM64/x86_64.
>>
>> Signed-off-by: Alexander Sverdlin <alexander.sverdlin at nokia.com>
>> ---
>>  kernel/locking/mcs_spinlock.h | 5 +++++
>>  kernel/locking/qspinlock.c    | 6 ++++++
>>  2 files changed, 11 insertions(+)
>>
>> diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h
>> index 5e10153..10e497a 100644
>> --- a/kernel/locking/mcs_spinlock.h
>> +++ b/kernel/locking/mcs_spinlock.h
>> @@ -89,6 +89,11 @@ void mcs_spin_lock(struct mcs_spinlock **lock, struct mcs_spinlock *node)
>>  		return;
>>  	}
>>  	WRITE_ONCE(prev->next, node);
>> +	/*
>> +	 * This is necessary to make sure that the corresponding "while" in the
>> +	 * mcs_spin_unlock() doesn't loop forever
>> +	 */
>> +	smp_wmb();
> If it loops forever, that's broken hardware design; store buffers need to
> drain. I don't think we should add unconditional barriers to bodge this.

The comment exaggerates the situation a bit, but the behaviour is nondeterministic, as the
measurements above show. Something is wrong in the qspinlock code; please consider this patch
an RFC, but something has to be done here.
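For reference, below is a minimal user-space sketch of the MCS hand-off in question, written with C11 atomics and POSIX threads. It is not the kernel's kernel/locking/mcs_spinlock.h. The names mcs_node, mcs_lock(), mcs_unlock(), relax() and run_contention_test() are hypothetical, and relax() calls sched_yield() as an assumed stand-in for cpu_relax() so the sketch does not stall when threads outnumber CPUs. It marks the store to prev->next that the patch follows with smp_wmb(), and the loop in the unlock path that keeps spinning until that store becomes visible.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct mcs_spinlock. */
struct mcs_node {
	_Atomic(struct mcs_node *) next;	/* written by our successor */
	atomic_int locked;			/* set to 1 on hand-off */
};

/* The lock is a pointer to the tail of the waiter queue (NULL = free). */
typedef _Atomic(struct mcs_node *) mcs_lock_t;

/* Assumed stand-in for cpu_relax(): yield so oversubscription still progresses. */
static void relax(void) { sched_yield(); }

static void mcs_lock(mcs_lock_t *lock, struct mcs_node *node)
{
	atomic_store_explicit(&node->next, NULL, memory_order_relaxed);
	atomic_store_explicit(&node->locked, 0, memory_order_relaxed);

	/* Append ourselves to the queue; release publishes the init above. */
	struct mcs_node *prev =
		atomic_exchange_explicit(lock, node, memory_order_acq_rel);
	if (prev == NULL)
		return;				/* lock was free: we own it */

	/*
	 * Link into the predecessor's node. This corresponds to
	 * WRITE_ONCE(prev->next, node), the store the patch follows with
	 * smp_wmb(); until it is visible, the owner spins in mcs_unlock().
	 */
	atomic_store_explicit(&prev->next, node, memory_order_release);

	/* Spin on our own node until the predecessor hands the lock over. */
	while (!atomic_load_explicit(&node->locked, memory_order_acquire))
		relax();
}

static void mcs_unlock(mcs_lock_t *lock, struct mcs_node *node)
{
	struct mcs_node *next =
		atomic_load_explicit(&node->next, memory_order_acquire);

	if (next == NULL) {
		struct mcs_node *expected = node;

		/* No successor visible: try to mark the lock free. */
		if (atomic_compare_exchange_strong_explicit(lock, &expected, NULL,
				memory_order_release, memory_order_relaxed))
			return;
		/* A successor has enqueued, but its link is not visible yet. */
		while (!(next = atomic_load_explicit(&node->next,
						     memory_order_acquire)))
			relax();
	}
	atomic_store_explicit(&next->locked, 1, memory_order_release);
}

static mcs_lock_t test_lock;	/* zero-initialised: unlocked */
static long counter;		/* protected by test_lock */

static void *worker(void *arg)
{
	long iters = *(const long *)arg;

	for (long i = 0; i < iters; i++) {
		struct mcs_node node;	/* one queue node per acquisition */

		mcs_lock(&test_lock, &node);
		counter++;		/* critical section */
		mcs_unlock(&test_lock, &node);
	}
	return NULL;
}

/* Run nthreads workers doing iters lock/unlock cycles each; return the count. */
long run_contention_test(int nthreads, long iters)
{
	pthread_t tids[64];

	assert(nthreads > 0 && nthreads <= 64);
	counter = 0;
	for (int i = 0; i < nthreads; i++) {
		int rc = pthread_create(&tids[i], NULL, worker, &iters);
		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);
	return counter;
}
```

Because the lock holder in mcs_unlock() cannot hand over until the successor's store to prev->next is visible, any delay in that store draining lengthens every contended hand-off; that is the window the benchmark numbers above are sensitive to.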

-- 
Best regards,
Alexander Sverdlin.



More information about the linux-arm-kernel mailing list