[PATCH] virtio_ring: Fix the stale index in available ring

Gavin Shan gshan at redhat.com
Fri Mar 15 04:24:36 PDT 2024


On 3/15/24 21:05, Michael S. Tsirkin wrote:
> On Fri, Mar 15, 2024 at 08:45:10PM +1000, Gavin Shan wrote:
>> Yes, I guess smp_wmb() ('dmb') is buggy on NVidia's grace-hopper platform. I tried
>> to reproduce it with my own driver where one thread writes to the shared buffer
>> and another thread reads from the buffer. I don't hit the out-of-order issue so
>> far.
> 
> Make sure the 2 areas you are accessing are in different cache lines.
> 

Yes, I have already placed those 2 areas in separate cache lines.

> 
>> My driver may be incorrect somewhere, and I will post an update if I can
>> reproduce the issue with it in the future.
> 
> Then maybe your change is just making virtio slower and masks the bug
> that is actually elsewhere?
> 
> You don't really need a driver. Here's a simple test: without barriers
> assertion will fail. With barriers it will not.
> (Warning: didn't bother testing too much, could be buggy.)
> 
> ---
> 
> #include <pthread.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <assert.h>
> 
> #define FIRST values[0]
> #define SECOND values[64]
> 
> volatile int values[100] = {};
> 
> void* writer_thread(void* arg) {
> 	while (1) {
> 		FIRST++;
> 		// NEED smp_wmb here
		__asm__ volatile("dmb ishst" : : : "memory");
> 		SECOND++;
> 	}
> }
> 
> void* reader_thread(void* arg) {
> 	while (1) {
> 		int first = FIRST;
> 		// NEED smp_rmb here
		__asm__ volatile("dmb ishld" : : : "memory");
> 		int second = SECOND;
> 		assert(first - second == 1 || first - second == 0);
> 	}
> }
> 
> int main() {
>      pthread_t writer, reader;
> 
>      pthread_create(&writer, NULL, writer_thread, NULL);
>      pthread_create(&reader, NULL, reader_thread, NULL);
> 
>      pthread_join(writer, NULL);
>      pthread_join(reader, NULL);
> 
>      return 0;
> }
> 

I ran a quick test on NVIDIA's Grace Hopper and Ampere's CPUs, with the
'dmb' barriers added as shown above, and hit the assert on both of them.
After replacing 'dmb' with 'dsb', I still hit the assert on both, so I
need to look at the code more closely.

[root@virt-mtcollins-02 test]# ./a
a: a.c:26: reader_thread: Assertion `first - second == 1 || first - second == 0' failed.
Aborted (core dumped)

[root@nvidia-grace-hopper-05 test]# ./a
a: a.c:26: reader_thread: Assertion `first - second == 1 || first - second == 0' failed.
Aborted (core dumped)
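
One thing that might explain the assert firing regardless of the barrier
type: the reader samples FIRST before SECOND, and the writer can complete
any number of iterations between those two loads, so first - second can be
arbitrarily negative even when ordering is fully respected. Below is a rough
sketch of a check that, as far as I can tell, only fails on a genuine
ordering violation: it loads SECOND first, then FIRST, and only requires
first >= second. It is not the test from the quoted message. The names
padded_counter, first_ctr, second_ctr, stop_writer and run_check are made up
for illustration, it uses C11 fences in place of the raw 'dmb' instructions
so that it also builds on other architectures, and the 128-byte alignment is
an assumption about the largest cache line size involved.

```c
#include <assert.h>   /* for callers that assert on run_check()'s result */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper type: the alignment keeps each counter in its own
 * cache line, assuming cache lines are at most 128 bytes. */
struct padded_counter {
	_Alignas(128) atomic_ulong v;
};

static struct padded_counter first_ctr, second_ctr;
static atomic_bool stop_writer;

static void *writer_thread(void *arg)
{
	(void)arg;
	while (!atomic_load_explicit(&stop_writer, memory_order_relaxed)) {
		atomic_fetch_add_explicit(&first_ctr.v, 1, memory_order_relaxed);
		/* Stands in for smp_wmb(): FIRST is published before SECOND. */
		atomic_thread_fence(memory_order_release);
		atomic_fetch_add_explicit(&second_ctr.v, 1, memory_order_relaxed);
	}
	return NULL;
}

/* Runs the reader in the calling thread for 'iterations' samples and
 * returns how many samples saw SECOND ahead of FIRST. Returns
 * (unsigned long)-1 if the writer thread could not be created. */
static unsigned long run_check(unsigned long iterations)
{
	pthread_t writer;
	unsigned long violations = 0;

	atomic_store(&stop_writer, false);
	if (pthread_create(&writer, NULL, writer_thread, NULL) != 0)
		return (unsigned long)-1;

	for (unsigned long i = 0; i < iterations; i++) {
		/* Load in the opposite order to the writer's stores. */
		unsigned long second =
			atomic_load_explicit(&second_ctr.v, memory_order_relaxed);
		/* Stands in for smp_rmb(). */
		atomic_thread_fence(memory_order_acquire);
		unsigned long first =
			atomic_load_explicit(&first_ctr.v, memory_order_relaxed);

		/* The writer bumps FIRST before SECOND, so a reader that saw
		 * SECOND == n must see FIRST >= n if ordering is respected. */
		if (first < second)
			violations++;
	}

	atomic_store(&stop_writer, true);
	pthread_join(writer, NULL);
	return violations;
}
```

Calling run_check(1000000) from main() and treating a non-zero return as a
failure would give a check that should stay quiet with the fences in place.
Removing the two fences is the interesting experiment, although a
weakly-ordered machine is not guaranteed to show a violation in any
particular run.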

Thanks,
Gavin



