[PATCH 4/4] nvme: set 'failfast_expired' in nvme_remove_namespaces()

Hannes Reinecke hare at suse.de
Fri Sep 6 01:32:42 PDT 2024


On 9/6/24 09:38, Sagi Grimberg wrote:
> Please describe what the patch fixes in the title, not what it does.
> 
> 
Yeah, will do.

> On 06/09/2024 10:18, Hannes Reinecke wrote:
>> nvme_remove_namespaces() is only called when the controller is
>> being removed. If there is a scan process still pending and
>> the I/O from that process cannot make progress (eg if all paths
>> are in ANA state 'inaccessible') we cannot disconnect the
>> controller as the 'nvme disconnect' process will hang in
>> flush_work(&ctrl->scan_work).
> 
> Please include the hang process stack.
> 
Ok.

>>
>> This patch sets the 'failfast_expired' bit for the controller
>> to cause all pending I/O to be failed, and the disconnect process
>> to complete.
> 
> How did you reproduce it? trigger namespace scanning and disconnect-all
> in a loop? Can we get a blktest for it?
> 
The script is quite easy:

for i in $(seq 0 8); do
         for j in $(seq 1 6 | shuf); do
                 ns="${CONFIGFS}/subsystems/${NQN}/namespaces/${j}"
                 grpid=$(( ((($j / 3) + $i) % 3) + 1 ))
                 echo "$i: enable $j (grpid $grpid)"
                 echo $grpid > $ns/ana_grpid
                 uuidgen > $ns/device_uuid
                 echo 1 > $ns/enable
         done
         ls /sys/block | grep -v ram
         for j in $(seq 1 6 | shuf); do
                 echo "$i: disable $j"
                 ns="${CONFIGFS}/subsystems/${NQN}/namespaces/${j}"
                 echo 0 > $ns/enable
         done
         ls /sys/block | grep -v ram
done

but in order to recreate that you'd need several paths, each with its
own ANA group id, and at least one in 'inaccessible'.
On my testing I've been using 6 paths, four 'inaccessible', one 
'optimized', and one 'non-optimized'.
And having one namespace for each ANA group ID.

I'm planning to do a blktest for it, but that requires to first 
implement ANA support there.

>>
>> Signed-off-by: Hannes Reinecke <hare at kernel.org>
>> ---
>>   drivers/nvme/host/core.c | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index 651073280f6f..b968b672dcf8 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -4222,6 +4222,13 @@ void nvme_remove_namespaces(struct nvme_ctrl 
>> *ctrl)
>>        */
>>       nvme_mpath_clear_ctrl_paths(ctrl);
>> +    /*
>> +     * Mark the controller as 'failfast' to ensure all pending I/O
>> +     * is killed.
>> +     */
>> +    set_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags);
> 
> What about nvme_stop_failfast_work() ?
> 
That is already done; nvme_remove_namespaces() is called after 
nvme_stop_ctrl() which already called nvme_stop_failfast_work().

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare at suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich




More information about the Linux-nvme mailing list