[PATCH] nvme-mpath: fix I/O failure with EAGAIN when failing over I/O

Sagi Grimberg sagi at grimberg.me
Tue Jun 20 02:59:21 PDT 2023


> Hello Sagi,
> 
> On Mon, 2023-06-19 at 17:10 +0300, Sagi Grimberg wrote:
>> It is possible that the next available path we fail over to happens
>> to be frozen (for example, during connection establishment). If the
>> original I/O was set with NOWAIT, this causes the I/O to fail
>> unnecessarily because the request queue cannot be entered, hence the
>> I/O fails with EAGAIN.
>>
>> The NOWAIT restriction that was originally set for the I/O is no
>> longer relevant or needed, because this is the nvme requeue context.
>> Hence we clear the REQ_NOWAIT flag when failing over I/O.
> 
> Could you please explain this in more detail? We are on the bio level,
> thus IIUC a new request will need to be allocated when the bio is
> requeued.

The issue is not the tag allocation; it's entering the request queue,
which fails immediately when the bio has NOWAIT set on it and the
queue is frozen.

> This means that if the fail-over queue is frozen e.g. during
> a NVMe controller reset, IO may be blocked for a possibly very long
> time,

That should not be the case, especially with Ming's patch that moves
the freeze/unfreeze to after we successfully connect. This should
prevent any I/O from being held hostage for a long period by a frozen
queue.

> which is what the NOWAIT flag was initially supposed to avoid.

NOWAIT was set by the issuer specifically because its context must not
block on I/O. The failover is a different context, and there is no need
to keep this restriction; it is no longer the issuer's context.
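The fix described in the patch subject amounts to dropping the flag at requeue time. A minimal sketch of that idea, with a simplified bio struct and a hypothetical flag value standing in for the kernel's (the real change lives in nvme's failover path, where the requeue runs from nvme's own work context rather than the original issuer's):

```c
/* Hypothetical flag value for illustration only */
#define REQ_NOWAIT (1u << 0)

struct bio {
	unsigned int bi_opf;	/* operation and flags, as in the kernel */
};

/*
 * Sketch of the failover-path fix: the requeue happens from nvme's
 * requeue context, not the original issuer, so the NOWAIT constraint
 * no longer applies and can safely be cleared before the bio is
 * resubmitted on the next path.
 */
static void failover_clear_nowait(struct bio *bio)
{
	bio->bi_opf &= ~REQ_NOWAIT;
}
```

Clearing only REQ_NOWAIT leaves the rest of bi_opf (operation type and other flags) intact, so the resubmitted I/O is otherwise unchanged.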

> I am asking because we've seen a similar phenomenon with a 3rd party
> multipath implementation recently.

I have no idea what this 3rd-party multipath implementation is, nor how
it interacts with nvme multipathing.


