[PATCH 3/4] nvmet: prevent max_qid changes for discovered subsystems

Max Gurtovoy mgurtovoy at nvidia.com
Thu Sep 25 01:28:04 PDT 2025


On 25/09/2025 10:36, Daniel Wagner wrote:
> On Wed, Sep 24, 2025 at 11:26:03PM +0300, Max Gurtovoy wrote:
>> Disallow updates to the max_qid attribute via configfs after a
>> subsystem has been discovered. This prevents invalid configuration
>> and avoids races during controller setup. The maximal queue
>> identifier can now only be set on non-discovered subsystems,
>> ensuring consistent configuration state.
> IIUC, this change will break an existing blktests test (nvme/048) which
> checks whether the max queue count changes on reconnect. This took a
> while to fix on the host side, and it is the very reason the disconnect
> call is buried in the max queue update.
>
> The scenario where this happens is a multi-node target where one node
> after the other gets new firmware. For some reason the maximum number
> of queues changes in the new firmware. When the host fails over to an
> updated node, it can't blindly reuse the old queue maximum:
>
> 555f66d0f8a3 ("nvme-fc: update hardware queues before using them")
>
> So if we find a way to keep this test scenario alive I don't mind. But
> as I said, I think this will break it.

The above-mentioned commit 555f66d0f8a3 ("nvme-fc: update hardware
queues before using them") was added in Linux v5.15.

qid_max support was added in Linux v6.1 by commit 3e980f5995e0 ("nvmet:
expose max queues to configfs").

IMO the real objective of adding qid_max support was to make the
subsystem more configurable, not to test host reconnect attempts, as
mentioned in that commit message.

This reconnect scenario can easily be tested in a different manner, and
the target driver code is not there to exercise host-driver-specific
scenarios. At least that is not its main goal.

If we allow changing the qid_max attribute at any time during the
lifecycle of the subsystem, it may lead to a bad configuration and
unexpected behavior.

For example, consider the following race:

1. The user sets the max_qid attribute to 10.

2. nvmet_alloc_ctrl() runs:
   ctrl->sqs = kcalloc(subsys->max_qid + 1,
                       sizeof(struct nvmet_sq *), GFP_KERNEL); // max_qid == 10

3. The user sets the max_qid attribute to 20.

4. nvmet_alloc_ctrl() continues:
   ctrl->cqs = kcalloc(subsys->max_qid + 1,
                       sizeof(struct nvmet_cq *), GFP_KERNEL); // max_qid == 20

All of the above happens before we call list_add_tail(&ctrl->subsys_entry,
&subsys->ctrls), so we end up with an sqs array of 11 entries, a cqs array
of 21 entries, and max_qid == 20, without any reconnect attempt.
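
Put differently, the window looks roughly like this (a simplified sketch
of the nvmet_alloc_ctrl() path, error handling and unrelated code
omitted):

	/*
	 * The two allocations each load subsys->max_qid on their own,
	 * with no lock held across both loads, so a concurrent configfs
	 * write to the qid_max attribute can land in between and make
	 * sqs and cqs disagree about the number of queues.
	 */
	ctrl->sqs = kcalloc(subsys->max_qid + 1,	/* loads max_qid == 10 */
			    sizeof(struct nvmet_sq *), GFP_KERNEL);

	/*
	 * A configfs write updates subsys->max_qid to 20 here.  The ctrl
	 * is not yet on subsys->ctrls (list_add_tail() happens later), so
	 * the disconnect/force-reconnect in the store handler that Daniel
	 * mentioned never sees this controller either.
	 */

	ctrl->cqs = kcalloc(subsys->max_qid + 1,	/* loads max_qid == 20 */
			    sizeof(struct nvmet_cq *), GFP_KERNEL);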

I don't know how the driver will handle it, but this is certainly a
situation we would like to avoid.

I guess we can try fixing this problem in other ways, but why should we
complicate the code so much? Let's prevent this scenario from happening
in the first place and simplify the code where we can...
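
For reference, a minimal sketch of the kind of check I'm proposing in
the qid_max store handler. This is only a sketch: "subsys->discovered"
is a placeholder for however the actual patch tracks the discovered
state, and the locking follows the existing attribute store handlers:

static ssize_t nvmet_subsys_attr_qid_max_store(struct config_item *item,
					       const char *page, size_t cnt)
{
	struct nvmet_subsys *subsys = to_subsys(item);
	u16 qid_max;

	if (sscanf(page, "%hu\n", &qid_max) != 1)
		return -EINVAL;

	if (qid_max < 1 || qid_max > NVMET_NR_QUEUES)
		return -EINVAL;

	down_write(&nvmet_config_sem);
	if (subsys->discovered) {	/* placeholder for the real check */
		/* max_qid is fixed once the subsystem has been discovered */
		up_write(&nvmet_config_sem);
		return -EACCES;
	}
	subsys->max_qid = qid_max;
	up_write(&nvmet_config_sem);

	return cnt;
}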
