[PATCH v2] tests/nvme: Add admin-passthru+reset race test

Jonathan Derrick jonathan.derrick at linux.dev
Mon Nov 21 14:34:44 PST 2022



On 11/21/2022 1:55 PM, Keith Busch wrote:
> On Thu, Nov 17, 2022 at 02:22:10PM -0700, Jonathan Derrick wrote:
>> I seem to have isolated the error mechanism for older kernels, but 6.2.0-rc2
>> reliably segfaults my QEMU instance (something else to look into) and I don't
>> have any 'real' hardware to test this on at the moment. It looks like several
>> passthru commands are able to enqueue prior/during/after resetting/connecting.
> 
> I'm not seeing any problem with the latest nvme-qemu after several dozen
> iterations of this test case. In that environment, the formats and
> resets complete practically synchronously with the call, so everything
> proceeds quickly. Is there anything special I need to change?
>  
I can still repro this with the nvme-fixes tag, so I'll have to dig into it myself.
Does the tighter loop in the test comment header produce results?


>> The issue seems to be very heavily timing related, so the loop in the header is
>> a lot more forceful in this approach.
>>
>> As far as the loop goes, I've noticed it will typically repro immediately or
>> pass the whole test.
> 
> I can only get possible repro in scenarios that have multi-second long,
> serialized format times. Even then, it still appears that everything
> fixes itself after waiting. Are you observing the same, or is it stuck
> forever in your observations?
In 5.19, it gets stuck forever with lots of formats outstanding and the
controller stuck in resetting. I'll keep digging. Thanks, Keith.
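
To tell "recovers after waiting" apart from "stuck forever", a bounded wait
on the controller state is one option. The following is only a minimal
sketch, not part of the patch: it assumes the controller state is readable
from a sysfs attribute such as /sys/class/nvme/nvme0/state, and the function
name, the nvme0 instance, and the 10-second default are illustrative.

```shell
# Hypothetical helper: poll a controller "state" file until it reads "live".
# $1: path to the state attribute (assumed: /sys/class/nvme/nvmeX/state)
# $2: timeout in seconds (default 10, chosen arbitrarily)
# Returns 0 once the state is "live", 1 if the timeout expires first.
wait_for_ctrl_live() {
	local state_file=$1
	local timeout=${2:-10}
	local state=""
	local i=0

	while [ "$i" -le "$timeout" ]; do
		state=$(cat "$state_file" 2>/dev/null)
		if [ "$state" = "live" ]; then
			return 0
		fi
		# Sleep only if another poll is still to come.
		if [ "$i" -lt "$timeout" ]; then
			sleep 1
		fi
		i=$((i + 1))
	done

	echo "controller still '${state:-unknown}' after ${timeout}s" >&2
	return 1
}
```

Something like `wait_for_ctrl_live /sys/class/nvme/nvme0/state 60` after the
format/reset loop would then fail only when the controller never leaves
resetting within the chosen window.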

> 
>> +remove_and_rescan() {
>> +	local pdev=$1
>> +	echo 1 > /sys/bus/pci/devices/"$pdev"/remove
>> +	echo 1 > /sys/bus/pci/rescan
>> +}
> 
> This function isn't called anywhere.
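
If the helper is kept rather than dropped, a slightly more defensive variant
could look like the sketch below. The existence check and the optional second
argument (an alternate PCI sysfs root, handy for trying the function on a
scratch directory) are assumptions added here and are not in the patch.

```shell
# Same two steps as the quoted helper: remove the PCI device, then rescan.
# $1: PCI device address, e.g. 0000:00:04.0 (example value)
# $2: PCI sysfs root (hypothetical extra argument, default /sys/bus/pci)
remove_and_rescan() {
	local pdev=$1
	local pci_root=${2:-/sys/bus/pci}

	# Bail out early instead of writing to a path that does not exist.
	if [ ! -e "$pci_root/devices/$pdev/remove" ]; then
		echo "no such PCI device: $pdev" >&2
		return 1
	fi

	echo 1 > "$pci_root/devices/$pdev/remove"
	echo 1 > "$pci_root/rescan"
}
```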


