[PATCH 3/4] nvmet: prevent max_qid changes for discovered subsystems
Max Gurtovoy
mgurtovoy at nvidia.com
Thu Sep 25 01:28:04 PDT 2025
On 25/09/2025 10:36, Daniel Wagner wrote:
> On Wed, Sep 24, 2025 at 11:26:03PM +0300, Max Gurtovoy wrote:
>> Disallow updates to the max_qid attribute via configfs after a
>> subsystem has been discovered. This prevents invalid configuration
>> and avoids races during controller setup. The maximal queue
>> identifier can now only be set on non-discovered subsystems,
>> ensuring consistent configuration state.
> IIUC, this change will break an existing blktests case (nvme/048),
> which verifies that the host handles a change in the maximum number of
> queues on reconnect. This took a while to fix on the host side, and it
> is the very reason the disconnect call is buried in the max queue
> update.
>
> The scenario where this happens is a multi-node target where one node
> after the other gets a new firmware. For some reason the maximum
> number of queues changes with the new firmware. When the host fails
> over to an updated node, it can't blindly reuse the old queue maximum:
>
> 555f66d0f8a3 ("nvme-fc: update hardware queues before using them")
>
> So if we find a way to keep this test scenario alive I don't mind. But
> as I said, I think this change will break it.
The above-mentioned commit 555f66d0f8a3 ("nvme-fc: update hardware
queues before using them") was added in Linux-v5.15.
The qid_max support was added in Linux-v6.1 by commit 3e980f5995e0
("nvmet: expose max queues to configfs").
IMO the real objective of adding qid_max support is to make the
subsystem more configurable, not to test the host's reconnect attempts
as was mentioned in the commit message.
This reconnect scenario can easily be tested in a different manner, and
the target driver code is not there to test host-driver-specific
scenarios. At least that is not its main goal.
If we allow the qid_max attribute to be changed at any time during the
lifecycle of the subsystem, it may lead to a bad configuration and
unexpected behavior.
For example, consider the following race:
1. The user sets the max_qid attribute to 10.
2. During nvmet_alloc_ctrl(): ctrl->sqs = kcalloc(subsys->max_qid + 1,
   sizeof(struct nvmet_sq *), GFP_KERNEL); // max_qid = 10
3. The user sets the max_qid attribute to 20.
4. During nvmet_alloc_ctrl(): ctrl->cqs = kcalloc(subsys->max_qid + 1,
   sizeof(struct nvmet_cq *), GFP_KERNEL); // max_qid = 20
All of the above happens before we call "list_add_tail(&ctrl->subsys_entry,
&subsys->ctrls);", and it leaves us with an sqs array of 11 entries, a
cqs array of 21 entries, and max_qid == 20, without any reconnect
attempts.
I don't know how the driver would handle it, but this is certainly a
situation we would like to avoid.
I guess we could try fixing this problem in other ways, but why
complicate the code so much? Let's prevent this scenario from happening
in the first place and simplify the code where we can...
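For reference, here is a minimal sketch of the kind of guard I have in
mind, modelled on the existing nvmet_subsys_attr_qid_max_store() in
drivers/nvme/target/configfs.c. The locking is simplified, and the
"already discovered" test shown here (a plain list_empty() check on
subsys->ctrls) is only a placeholder for whatever condition the patch
actually uses to mark a subsystem as discovered:

static ssize_t nvmet_subsys_attr_qid_max_store(struct config_item *item,
                                               const char *page, size_t cnt)
{
        struct nvmet_subsys *subsys = to_subsys(item);
        ssize_t ret = cnt;
        u16 qid_max;

        if (sscanf(page, "%hu\n", &qid_max) != 1)
                return -EINVAL;

        if (qid_max < 1 || qid_max > NVMET_NR_QUEUES)
                return -EINVAL;

        down_write(&nvmet_config_sem);
        if (!list_empty(&subsys->ctrls)) {
                /* subsystem already discovered/connected: refuse the update */
                ret = -EBUSY;
        } else {
                subsys->max_qid = qid_max;
        }
        up_write(&nvmet_config_sem);

        return ret;
}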