[PATCH v2 7/7] nvme: add reserved ioq tags for cancel

John Meneghini jmeneghi at redhat.com
Wed Jun 26 12:10:21 PDT 2024


This is the patch I wrote to solve this problem while at LSF/MM. Is this what you were thinking about, Sagi?

Note: more changes are needed to the error handlers to account for this.  The idea is that the eh will need to be modified to
keep track of outstanding nvme-cancel commands for each IO queue.  Following the first command timeout, the eh will send a
single-command Cancel to abort the slow command in the controller.  If a second command timeout occurs before the controller
returns the CQE for the first Cancel, the error handler sends a Multiple Command Cancel to the IO queue with NSID set to
FFFFFFFFh.  This form of the Cancel command will cancel/abort all outstanding commands on the IO queue.
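
Something like the following is what I have in mind for the per-queue tracking.  This is only a sketch: nvme_send_cancel()
and the cancel_state member are made-up names for illustration, not code that exists today.

struct nvme_cancel_state {
	bool single_cancel_pending;	/* set until the CQE for the first cancel arrives */
};

static void nvme_eh_command_timeout(struct nvme_queue *nvmeq, u16 timed_out_cid)
{
	struct nvme_cancel_state *cs = &nvmeq->cancel_state;	/* hypothetical member */

	if (!cs->single_cancel_pending) {
		/* First timeout on this queue: cancel only the slow command. */
		cs->single_cancel_pending = true;
		nvme_send_cancel(nvmeq, timed_out_cid, 0);	/* hypothetical helper */
	} else {
		/*
		 * A second timeout before the first cancel completed: escalate
		 * to Multiple Command Cancel with NSID set to FFFFFFFFh
		 * (NVME_NSID_ALL) to abort everything outstanding on the queue.
		 */
		nvme_send_cancel(nvmeq, 0, NVME_NSID_ALL);
	}
}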

The problem is, in most cases when a command times out because of a problem in the controller, it isn't just one command that
times out - all outstanding commands time out in a thundering herd. There are cases where a single IO will hang, but those
usually aren't reads or writes - think of a reservation command that gets stuck, or a dsm command that is going slow. Usually
when reads and writes start timing out it's because the storage is simply swamped and all IOs start to slow down.

Therefore, with only 2 reserved tags on each IO queue, the host should be able to use the cancel command to abort any and all 
outstanding IOs that time out.
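
For illustration, a cancel could be submitted on one of those reserved tags roughly like this.  Again, just a sketch: the
cdw10 placement of the target CID is only a guess, nvme_cmd_cancel is assumed to come from the earlier patches in this series,
and nvme_cancel_end_io is a made-up completion handler.

static int nvme_submit_cancel(struct request_queue *q, u16 target_cid, u32 nsid)
{
	struct nvme_command c = { };
	struct request *req;

	c.common.opcode = nvme_cmd_cancel;		/* assumed from earlier patches in this series */
	c.common.nsid = cpu_to_le32(nsid);		/* FFFFFFFFh for Multiple Command Cancel */
	c.common.cdw10 = cpu_to_le32(target_cid);	/* assumed field placement for the target CID */

	/* BLK_MQ_REQ_RESERVED consumes one of the tags reserved by the patch below. */
	req = blk_mq_alloc_request(q, nvme_req_op(&c),
				   BLK_MQ_REQ_RESERVED | BLK_MQ_REQ_NOWAIT);
	if (IS_ERR(req))
		return PTR_ERR(req);

	nvme_init_request(req, &c);
	req->end_io = nvme_cancel_end_io;	/* hypothetical completion handler in the eh */
	blk_execute_rq_nowait(req, false);
	return 0;
}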

/John

On 6/26/24 14:38, John Meneghini wrote:
> If the nvme Cancel command is supported, we need to reserve 2 tags for
> each IO queue. Note that one additional tag is reserved to account for
> the case where this is a fabrics controller.
> 
> Signed-off-by: John Meneghini <jmeneghi at redhat.com>
> ---
>   drivers/nvme/host/core.c | 5 +++++
>   1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 691dd6ee6dc3..76554fb373a3 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4570,6 +4570,7 @@ int nvme_alloc_io_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
>   		unsigned int cmd_size)
>   {
>   	int ret;
> +	u32 effects = le32_to_cpu(ctrl->effects->iocs[nvme_cmd_cancel]);
>   
>   	memset(set, 0, sizeof(*set));
>   	set->ops = ops;
> @@ -4580,9 +4581,13 @@ int nvme_alloc_io_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
>   	 */
>   	if (ctrl->quirks & NVME_QUIRK_SHARED_TAGS)
>   		set->reserved_tags = NVME_AQ_DEPTH;
> +	else if (effects & NVME_CMD_EFFECTS_CSUPP)
> +		/* Reserve 2 X io_queue count for NVMe Cancel */
> +		set->reserved_tags = (2 * ctrl->queue_count);
>   	else if (ctrl->ops->flags & NVME_F_FABRICS)
>   		/* Reserved for fabric connect */
>   		set->reserved_tags = 1;
> +
>   	set->numa_node = ctrl->numa_node;
>   	set->flags = BLK_MQ_F_SHOULD_MERGE;
>   	if (ctrl->ops->flags & NVME_F_BLOCKING)
