[PATCH blktests v3] nvme/046: test queue count changes on reconnect
Sagi Grimberg
sagi at grimberg.me
Wed Sep 14 03:37:38 PDT 2022
>>>> FYI, each blktests test case can define DMESG_FILTER not to fail with specific
>>>> keywords in dmesg. Test cases meta/011 and block/028 are reference use
>>>> cases.
>>>
>>> Ah okay, let me look into it.
>>
>> So I made the state read function a bit more robust (test if state file
>> exists) and it turns out this made rdma happy(??) but tcp is still
>> breaking.
>
> s/tcp/fc/
>
> On closer inspection I see following sequence for fc:
>
> [399664.863585] nvmet: connect request for invalid subsystem blktests-subsystem-1!
> [399664.863704] nvme nvme0: Connect Invalid Data Parameter, subsysnqn "blktests-subsystem-1"
> [399664.863758] nvme nvme0: NVME-FC{0}: reset: Reconnect attempt failed (16770)
> [399664.863784] nvme nvme0: NVME-FC{0}: reconnect failure
> [399664.863837] nvme nvme0: Removing ctrl: NQN "blktests-subsystem-1"
>
> When the host tries to reconnect to a non existing controller (the test
> called _remove_nvmet_subsystem_from_port()) the target returns 0x4182
> (NVME_SC_DNR|NVME_SC_READ_ONLY(?)).

That is not something that the target is supposed to be doing, I have no
idea why this is sent. Perhaps this is something specific to the fc
implementation?

> So arguably fc behaves correct by
> stopping the reconnects. tcp and rdma just ignore the DNR.

DNR means do not retry the command, it says nothing about do not attempt
a future reconnect...

> If we agree that the fc behavior is the right one, then the nvmet code
> needs to be changed so that when the qid_max attribute changes it forces
> a reconnect. The trick with calling _remove_nvmet_subsystem_from_port()
> to force a reconnect is not working. And tcp/rdma needs to honor the
> DNR.

tcp/rdma honor DNR afaik.
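
For reference, a minimal sketch of how the status value from the fc log
above (16770, i.e. 0x4182) splits into its fields. It assumes the layout
the Linux host driver uses once the phase bit has been shifted out:
SC in bits 0-7, SCT in bits 8-10, CRD in bits 11-12, More in bit 13 and
DNR in bit 14. The function and key names are made up for illustration.
Note that 0x182 is a command-specific status, so its meaning depends on
the command; for a Fabrics Connect it should correspond to the "Connect
Invalid Parameters" status that the log line reports, rather than to
NVME_SC_READ_ONLY.

```python
# Illustrative decoder for an NVMe status value as seen by the Linux host
# (phase bit already removed). Names here are hypothetical helpers.

NVME_SC_DNR = 0x4000   # Do Not Retry bit
NVME_SC_MORE = 0x2000  # More information available bit


def decode_nvme_status(status):
    """Split a 15-bit NVMe status value into its fields."""
    return {
        "dnr": bool(status & NVME_SC_DNR),
        "more": bool(status & NVME_SC_MORE),
        "crd": (status >> 11) & 0x3,   # Command Retry Delay index
        "sct": (status >> 8) & 0x7,    # Status Code Type
        "sc": status & 0xFF,           # Status Code
    }


if __name__ == "__main__":
    fields = decode_nvme_status(16770)  # value from "Reconnect attempt failed (16770)"
    print(hex(16770), fields)
```

With SCT = 1 (command specific) and SC = 0x82, the only generic piece of
information in the value is the DNR bit, which applies to the failed
Connect command itself.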