[PATCHv2 1/3] block: introduce rq_list_for_each_safe macro

Max Gurtovoy mgurtovoy at nvidia.com
Thu Jan 6 03:54:28 PST 2022


On 1/5/2022 7:26 PM, Keith Busch wrote:
> On Tue, Jan 04, 2022 at 02:15:58PM +0200, Max Gurtovoy wrote:
>> This patch worked for me with 2 namespaces for NVMe PCI.
>>
>> I'll check it later on with my RDMA queue_rqs patches as well. There we
>> also have tagset sharing with the connect_q (and not only with multiple
>> namespaces).
>>
>> But the connect_q is using reserved tags only (for the connect commands).
>>
>> I saw some strange things that I couldn't understand:
>>
>> 1. running randread fio with the libaio ioengine didn't call
>> nvme_queue_rqs - expected
>>
>> 2. running randwrite fio with the libaio ioengine did call
>> nvme_queue_rqs - Not expected !!
>>
>> 3. running randread fio with the io_uring ioengine (and
>> --iodepth_batch=32) didn't call nvme_queue_rqs - Not expected !!
>>
>> 4. running randwrite fio with the io_uring ioengine (and
>> --iodepth_batch=32) did call nvme_queue_rqs - expected
>>
>> 5. running randread fio with the io_uring ioengine (and
>> --iodepth_batch=32 --runtime=30) didn't finish after 30 seconds and
>> was stuck for 300 seconds (the fio jobs required "kill -9 fio" to
>> remove refcounts from nvme_core) - Not expected !!
>>
>> debug print: fio: job 'task_nvme0n1' (state=5) hasn't exited in 300
>> seconds, it appears to be stuck. Doing forceful exit of this job.
>>
>> 6. running randwrite fio with the io_uring ioengine (and
>> --iodepth_batch=32 --runtime=30) didn't finish after 30 seconds and
>> was stuck for 300 seconds (the fio jobs required "kill -9 fio" to
>> remove refcounts from nvme_core) - Not expected !!
>>
>> debug print: fio: job 'task_nvme0n1' (state=5) hasn't exited in 300
>> seconds, it appears to be stuck. Doing forceful exit of this job.
>>
>>
>> Any idea what could cause these unexpected scenarios? At least
>> unexpected for me :)
> Not sure about all the scenarios. I believe it should call queue_rqs
> anytime we finish a plugged list of requests as long as the requests
> come from the same request_queue, and it's not being flushed from
> io_schedule().
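
So, just to make sure I read the flush path correctly: the gate is roughly
the following (my sketch approximating the v5.16-era blk_mq_flush_plug_list();
not a quote of the actual code, and locking/dispatch details are omitted, so
names may differ):

#include <linux/blkdev.h>
#include <linux/blk-mq.h>

/* Sketch only: when is ->queue_rqs() attempted on plug flush? */
static void sketch_flush_plug_list(struct blk_plug *plug, bool from_schedule)
{
	struct request *rq;

	if (rq_list_empty(plug->mq_list))
		return;

	/*
	 * ->queue_rqs() is only tried when every request in the plug came
	 * from the same request_queue, no I/O scheduler is attached, and
	 * the flush is not coming from io_schedule().
	 */
	if (!plug->multiple_queues && !plug->has_elevator && !from_schedule) {
		rq = rq_list_peek(&plug->mq_list);
		if (rq->q->mq_ops->queue_rqs) {
			rq->q->mq_ops->queue_rqs(&plug->mq_list);
			if (rq_list_empty(plug->mq_list))
				return;
		}
	}

	/* whatever is left (or everything, otherwise) is issued one by one */
}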

I also see that we get a batch size > 1 only at the start of the fio run.
After some number of I/O operations, the batch size stays at 1 until the
end of the run.

>
> The stuck fio job might be a lost request, which is what this series
> should address. It would be unusual to see such an error happen in
> normal operation, though. I had to synthesize errors to verify the bug
> and fix.

But there are no timeout errors or prints in dmesg.

If there were timeout prints I would suspect that the issue is in the
local NVMe device, but there aren't any.

Also, this phenomenon doesn't happen with the NVMf/RDMA code I developed locally.

>
> In any case, I'll run more multi-namespace tests to see if I can find
> any other issues with shared tags.

I believe that the above concerns are not related to shared tags but to 
the entire queue_rqs mechanism.
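
To spell out what I mean by the entire mechanism: my understanding of the
lost-request pattern that the new rq_list_for_each_safe macro from patch 1/3
guards against is roughly the sketch below. This is a sketch only, not the
actual patch, and example_submit_one() is a hypothetical stand-in for the
driver's prep/submit path:

#include <linux/blk-mq.h>

/* Hypothetical stand-in for the driver's per-request prep/submit path. */
static bool example_submit_one(struct request *req)
{
	return true;
}

/* Sketch of a driver ->queue_rqs() that splits the plug list while walking it. */
static void sketch_queue_rqs(struct request **rqlist)
{
	struct request *req, *next;
	struct request *requeue_list = NULL;

	/*
	 * The loop body may re-link 'req' onto requeue_list, which overwrites
	 * req->rq_next. A plain rq_list_for_each() would then follow the new
	 * list and silently drop every request that came after 'req' on the
	 * original plug list. The _safe variant loads 'next' before the body
	 * runs, so the walk stays on the original list.
	 */
	rq_list_for_each_safe(rqlist, req, next) {
		if (!example_submit_one(req))
			rq_list_add(&requeue_list, req);
	}

	/* hand anything we could not submit back to the block layer */
	*rqlist = requeue_list;
}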



