[PATCH v3 0/3] Handle number of queue changes
Sagi Grimberg
sagi at grimberg.me
Tue Aug 30 00:57:46 PDT 2022
>>> Updated this series to a proper patch series with Hannes and Sagi's
>>> feedback addressed.
>>>
>>> I tested this with nvme-tcp, but due to lack of hardware the
>>> nvme-rdma is only compile-tested.
>>>
>>
>> One does wonder: what about FC?
>> Does it suffer from the same problems?
>>
>> Cheers,
>>
>> Hannes
>
>
> Yep, wondering too. I don't think so... FC does do this differently.
>
> We don't realloc io queues nor tag_sets. We reuse the allocations and
> tag_set originally created in the 1st successful association for the
> controller.
>
> On reconnect, we set the queue count, then call
> blk_mq_update_nr_hw_queues() if it changed before we get into the loops
> to change queue states.
That is the same as tcp/rdma: we don't realloc tagsets or queues, we
just reinitialize the queues based on the queue_count.
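For reference, the reconnect path in tcp looks roughly like this
(paraphrased sketch, not the exact upstream code; the freeze handling
is elided here, more on that below):

	/* reconnect: reuse the tag_set created at the 1st association,
	 * only resize it to the (possibly changed) queue count */
	ret = nvme_tcp_alloc_io_queues(ctrl);	/* re-reads ctrl->queue_count */
	if (ret)
		return ret;
	ret = nvme_tcp_start_io_queues(ctrl);
	if (ret)
		return ret;
	if (!new) {
		nvme_start_queues(ctrl);
		blk_mq_update_nr_hw_queues(ctrl->tagset,
				ctrl->queue_count - 1);
	}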
> So FC's call to update nr_hw_queues() is much earlier than rdma/tcp today.
The only difference is that you don't freeze the request queues when
tearing down the controller, so you can allocate/start all the queues in
one go.
In pci/rdma/tcp, we start by freezing the request queues to address a
hang that happens with multiple queue-maps (default/read/poll).
See 2875b0aecabe ("nvme-tcp: fix controller reset hang during traffic").
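The ordering that commit introduced is roughly the following
(again paraphrased; the real code uses a timeout variant of the
freeze wait):

	/* reset entry (teardown): */
	nvme_start_freeze(ctrl);	/* block new submissions */
	nvme_stop_queues(ctrl);		/* quiesce dispatch */

	/* after the queues are re-established: */
	nvme_start_queues(ctrl);
	nvme_wait_freeze(ctrl);		/* all entered requests completed */
	blk_mq_update_nr_hw_queues(ctrl->tagset, ctrl->queue_count - 1);
	nvme_unfreeze(ctrl);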
I don't think that fc supports multiple queue maps, but in case the
number of queues changes, blk_mq_update_nr_hw_queues() will still
attempt to freeze the request queues, which may lead to a hang if some
requests cannot complete (because the queues are quiesced at this
time). However, I see that fc starts the queues at the end of
nvme_fc_delete_association (which is a bit strange, because the same
can be achieved by passing start_queues=true to
__nvme_fc_abort_outstanding_ios).
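The hang potential is because blk_mq_update_nr_hw_queues() freezes
every request queue sharing the tag_set before reshuffling the hctxs,
roughly (paraphrased from block/blk-mq.c):

	void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
			int nr_hw_queues)
	{
		struct request_queue *q;

		/* blocks until all entered requests complete; if a
		 * queue is quiesced and never restarted, this waits
		 * forever */
		list_for_each_entry(q, &set->tag_list, tag_set_list)
			blk_mq_freeze_queue(q);

		/* ... reallocate and remap the hw contexts ... */

		list_for_each_entry(q, &set->tag_list, tag_set_list)
			blk_mq_unfreeze_queue(q);
	}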
But that is the main difference: tcp/rdma does not start the queues
when tearing down a controller in a reset, only after we re-establish
the queues. I think this was needed to support non-mpath
configurations, where I/Os do not fail over. Maybe that is a legacy
thing now for fabrics though...
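For comparison, the tcp io queue teardown is roughly the following
(paraphrased; rdma is equivalent):

	static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl,
			bool remove)
	{
		if (ctrl->queue_count <= 1)
			return;
		nvme_start_freeze(ctrl);
		nvme_stop_queues(ctrl);
		nvme_sync_io_queues(ctrl);
		nvme_tcp_stop_io_queues(ctrl);
		nvme_cancel_tagset(ctrl);
		if (remove)
			nvme_start_queues(ctrl);	/* delete only, not reset */
		nvme_tcp_destroy_io_queues(ctrl, remove);
	}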