[PATCH] nvme-multipath: don't inherit LBA-related fields for the multipath node

Nilay Shroff nilay at linux.ibm.com
Thu Mar 21 22:15:04 PDT 2024



On 3/22/24 02:38, Christoph Hellwig wrote:
> Linux 6.9 made the nvme multipath nodes not properly pick up changes when
> the LBA size goes smaller after an nvme format.  This is because we now
> try to inherit the queue settings for the multipath node entirely from
> the individual paths.  That is the right thing to do for I/O size
> limitations, which make up most of the queue limits, but it is wrong for
> changes to the namespace configuration, where we do want to pick up the
> new format, which will eventually show up on all paths once they are
> re-queried.
> 
> Fix this by not inheriting the block size and related fields and always
> updating them.
> 
> Fixes: 8f03cfa117e0 ("nvme: don't use nvme_update_disk_info for the multipath disk")
> Reported-by: Nilay Shroff <nilay at linux.ibm.com>
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>  drivers/nvme/host/core.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 00864a63447099..4bac54d4e0015b 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2204,6 +2204,7 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
>  	}
>  
>  	if (!ret && nvme_ns_head_multipath(ns->head)) {
> +		struct queue_limits *ns_lim = &ns->disk->queue->limits;
>  		struct queue_limits lim;
>  
>  		blk_mq_freeze_queue(ns->head->disk->queue);
> @@ -2215,7 +2216,26 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
>  		set_disk_ro(ns->head->disk, nvme_ns_is_readonly(ns, info));
>  		nvme_mpath_revalidate_paths(ns);
>  
> +		/*
> +		 * queue_limits mixes values that are the hardware limitations
> +		 * for bio splitting with what is the device configuration.
> +		 *
> +		 * For NVMe the device configuration can change after e.g. a
> +		 * Format command, and we really want to pick up the new format
> +		 * value here.  But we must still stack the queue limits to the
> +		 * least common denominator for multipathing to split the bios
> +		 * properly.
> +		 *
> +		 * To work around this, we explicitly set the device
> +		 * configuration to those that we just queried, but only stack
> +		 * the splitting limits in to make sure we still obey possibly
> +		 * lower limitations of other controllers.
> +		 */
>  		lim = queue_limits_start_update(ns->head->disk->queue);
> +		lim.logical_block_size = ns_lim->logical_block_size;
> +		lim.physical_block_size = ns_lim->physical_block_size;
> +		lim.io_min = ns_lim->io_min;
> +		lim.io_opt = ns_lim->io_opt;
>  		queue_limits_stack_bdev(&lim, ns->disk->part0, 0,
>  					ns->head->disk->disk_name);
>  		ret = queue_limits_commit_update(ns->head->disk->queue, &lim);
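
For anyone following along, the effect of the four added assignments can be
sketched in a small userspace model. This is not kernel code: struct
toy_limits, toy_stack_limits() and toy_update_head() are hypothetical names
made up for illustration, and the model assumes only that stacking takes the
larger block size and the smaller non-zero splitting limit.

```c
#include <assert.h>

/* Toy stand-in for struct queue_limits; hypothetical, only three fields. */
struct toy_limits {
	unsigned int logical_block_size;  /* device configuration */
	unsigned int physical_block_size; /* device configuration */
	unsigned int max_sectors;         /* I/O splitting limit */
};

static unsigned int max_u(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/* Smaller of the two values, treating 0 as "not set". */
static unsigned int min_not_zero_u(unsigned int a, unsigned int b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/*
 * Simplified stacking (an assumption of this sketch): block sizes can
 * only grow, splitting limits can only shrink.
 */
static void toy_stack_limits(struct toy_limits *top,
			     const struct toy_limits *bottom)
{
	assert(top && bottom);
	top->logical_block_size = max_u(top->logical_block_size,
					bottom->logical_block_size);
	top->physical_block_size = max_u(top->physical_block_size,
					 bottom->physical_block_size);
	top->max_sectors = min_not_zero_u(top->max_sectors,
					  bottom->max_sectors);
}

/*
 * Update the multipath head from one path. With set_format_first != 0,
 * the format-related fields are copied from the path before stacking,
 * which models what the patch does.
 */
static struct toy_limits toy_update_head(struct toy_limits head,
					 const struct toy_limits *path,
					 int set_format_first)
{
	if (set_format_first) {
		head.logical_block_size = path->logical_block_size;
		head.physical_block_size = path->physical_block_size;
	}
	toy_stack_limits(&head, path);
	return head;
}
```

In this model, a head at { 4096, 4096, 256 } updated from a path reformatted
to { 512, 512, 1024 } stays at a 4096-byte block size without the copy, and
drops to 512 with it, while max_sectors remains 256 in both cases.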

I have tested the above patch from Christoph and it looks good.

Test results can be found here:
https://lore.kernel.org/all/239228ec-6c8d-432c-905d-b477014deee3@linux.ibm.com/

Thanks,
--Nilay



