[PATCH] nvme: do not ignore nvme status in nvme_set_queue_count()

Fri Jan 22 11:35:35 EST 2021

On 1/21/21 9:14 PM, Keith Busch wrote:
> On Thu, Jan 21, 2021 at 10:50:21AM +0100, Hannes Reinecke wrote:
>> If the call to nvme_set_queue_count() fails with a status we should
>> not ignore it but rather pass it on to the caller.
>> It's then up to the transport to decide whether to ignore it
>> (like PCI does) or to reset the connection (as would be appropriate
>> for fabrics).
> 
> Instead of checking the error, wouldn't checking the number of created
> queues be sufficient? What handling difference do you expect to occur
> between getting a success with 0 queues, vs getting an error?
> 
The difference is that an error will (re-)start recovery, 0 queues won't.
But the problem here is that nvme_set_queue_count() is being called 
during reconnection, ie during the recovery process itself.
And this command is returned with a timeout, which in any other case is 
being treated as a fatal error. Plus we have been sending this command 
on the admin queue, so a timeout on the admin queue pretty much _is_  a 
fatal error. So we should be terminating the current recovery and 
reconnect. None of that will happen if we return '0' queues.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare at suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer