[PATCH v3 0/3] Handle number of queue changes

Sagi Grimberg sagi at grimberg.me
Tue Aug 30 00:57:46 PDT 2022


>>> Updated this series to a proper patch series with Hannes and Sagi's
>>> feedback addressed.
>>>
>>> I tested this with nvme-tcp but due to lack of hardware the nvme-rdma
>>> is only compile tested.
>>>
>>
>> One does wonder: what about FC?
>> Does it suffer from the same problems?
>>
>> Cheers,
>>
>> Hannes
> 
> 
> Yep, wondering too. I don't think so... FC does do this differently.
> 
> We don't realloc io queues nor tag_sets.  We reuse the allocations and 
> tag_set originally created in the 1st successful association for the 
> controller.
> 
> On reconnect, we set the queue count, then call 
> blk_mq_update_nr_hw_queues() if it changed before we get into the loops 
> to change queue states.

That is the same as tcp/rdma. We don't realloc tagsets or queues; we
just reinitialize the queues based on the queue_count.
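
Roughly, the shared pattern looks like this (a condensed sketch, not the
literal driver code; the helper names here are placeholders for the
per-transport routines):

static int configure_io_queues(struct nvme_ctrl *ctrl, bool new)
{
	int ret;

	/* (re)initialize transport queues for the current queue_count */
	ret = alloc_transport_io_queues(ctrl);
	if (ret)
		return ret;

	if (new) {
		/* first association only: allocate the tagset once */
		ret = alloc_io_tag_set(ctrl);
		if (ret)
			return ret;
	}

	/* connect io queues 1..queue_count-1 */
	return start_io_queues(ctrl);
}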

> So FC's call to update nr_hw_queues() is much earlier than rdma/tcp today.

The only difference is that you don't freeze the request queues when
tearing down the controller, so you can allocate/start all the queues in
one go.

In pci/rdma/tcp, we start by freezing the request queues to address a 
hang that happens with multiple queue-maps (default/read/poll).
See 2875b0aecabe ("nvme-tcp: fix controller reset hang during traffic").
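
The sequence after that commit looks roughly like this (condensed from
the nvme-tcp flow, error unwinding trimmed):

	/* teardown: freeze before quiescing so new submissions block */
	nvme_start_freeze(ctrl);
	nvme_stop_queues(ctrl);
	/* ... abort and cancel all outstanding requests ... */

	/* reconnect: unquiesce so inflights can complete, wait for the
	 * freeze to settle, then it is safe to change nr_hw_queues */
	nvme_start_queues(ctrl);
	if (!nvme_wait_freeze_timeout(ctrl, NVME_IO_TIMEOUT))
		return -ENODEV;
	blk_mq_update_nr_hw_queues(ctrl->tagset, ctrl->queue_count - 1);
	nvme_unfreeze(ctrl);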

I don't think that fc supports multiple queue maps, but in case the
number of queues changes, blk_mq_update_nr_hw_queues() will still
attempt to freeze the request queues, which may lead to a hang if some
requests cannot complete (because the queues are quiesced at this
time).
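
To spell the hang out (illustration only, not driver code):
blk_mq_update_nr_hw_queues() starts by freezing every request queue on
the tagset, and a freeze only completes once all inflight requests have
finished:

	blk_mq_quiesce_queue(q);	/* dispatch stopped */

	/*
	 * Freezes q and waits for its usage counter to drain; requests
	 * parked on the quiesced queue never complete, so this can
	 * block forever until the queue is started again.
	 */
	blk_mq_update_nr_hw_queues(set, nr_hw_queues);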
However, I see that fc starts the queues at the end of
nvme_fc_delete_association (which is a bit strange, because the same
can be achieved by passing start_queues=true to
__nvme_fc_abort_outstanding_ios).

But that is the main difference: tcp/rdma does not start the queues when
tearing down a controller in a reset, only after we re-establish the
queues. I think this was needed to support non-mpath configurations,
where I/Os do not fail over. Maybe that is a legacy thing now for
fabrics though...


