[PATCH 07/24] scsi-multipath: clone each bio
John Garry
john.g.garry at oracle.com
Mon Mar 2 04:12:54 PST 2026
On 02/03/2026 03:21, Benjamin Marzinski wrote:
> On Wed, Feb 25, 2026 at 03:36:10PM +0000, John Garry wrote:
>> For failover handling, we must resubmit each bio.
>>
>> However, unlike NVMe, for SCSI there is no guarantee that any submitted
>> bio is completed either in full or not at all.
>>
>> As such, for SCSI failover handling we will take the approach of just
>> re-submitting the original bio. To enable this, clone and submit each bio.
>>
>> Signed-off-by: John Garry <john.g.garry at oracle.com>
>> ---
>> drivers/scsi/scsi_multipath.c | 51 ++++++++++++++++++++++++++++++++++-
>> include/scsi/scsi_multipath.h | 1 +
>> 2 files changed, 51 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/scsi_multipath.c b/drivers/scsi/scsi_multipath.c
>> index 4b7984e7e74ba..d79a92ec0cf6c 100644
>> --- a/drivers/scsi/scsi_multipath.c
>> +++ b/drivers/scsi/scsi_multipath.c
>> @@ -89,6 +89,14 @@ module_param_call(iopolicy, scsi_set_iopolicy, scsi_get_iopolicy,
>> MODULE_PARM_DESC(iopolicy,
>> "Default multipath I/O policy; 'numa' (default), 'round-robin' or 'queue-depth'");
>>
>> +struct scsi_mpath_clone_bio {
>> + struct bio *master_bio;
>> + struct bio clone;
>> +};
>
> If the only extra information you need for your clone bios is a pointer
> to the original bio, I think you can just store that in bi_private. So
> you shouldn't actually need to allocate any front pad for your bioset.
Yes, seems a decent idea
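A rough sketch of that suggestion (untested, and assuming bi_private is
otherwise unused for these clones) - the container struct and front pad
go away entirely:

```c
/* Sketch: stash the master bio in the clone's bi_private instead of
 * wrapping the clone in struct scsi_mpath_clone_bio.
 */
static void scsi_mpath_clone_end_io(struct bio *clone)
{
	struct bio *master_bio = clone->bi_private;

	master_bio->bi_status = clone->bi_status;
	bio_put(clone);
	bio_endio(master_bio);
}

static struct bio *scsi_mpath_clone_bio(struct bio *bio)
{
	struct mpath_disk *mpath_disk = bio->bi_bdev->bd_disk->private_data;
	struct scsi_mpath_head *scsi_mpath_head =
		mpath_disk->mpath_head->drvdata;
	struct bio *clone;

	clone = bio_alloc_clone(bio->bi_bdev, bio, GFP_NOWAIT,
				&scsi_mpath_head->bio_pool);
	if (!clone)
		return NULL;

	clone->bi_end_io = scsi_mpath_clone_end_io;
	clone->bi_private = bio;	/* recovered in end_io above */

	return clone;
}
```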
>
>> +
>> +#define scsi_mpath_to_master_bio(clone) \
>> + container_of(clone, struct scsi_mpath_clone_bio, clone)
>> +
>> static int scsi_mpath_unique_lun_id(struct scsi_device *sdev)
>> {
>> struct scsi_mpath_device *scsi_mpath_dev = sdev->scsi_mpath_dev;
>
>> @@ -260,6 +269,39 @@ static int scsi_multipath_sdev_init(struct scsi_device *sdev)
>> return 0;
>> }
>>
>> +static void scsi_mpath_clone_end_io(struct bio *clone)
>> +{
>> + struct scsi_mpath_clone_bio *scsi_mpath_clone_bio =
>> + scsi_mpath_to_master_bio(clone);
>> + struct bio *master_bio = scsi_mpath_clone_bio->master_bio;
>> +
>> + master_bio->bi_status = clone->bi_status;
>> + bio_put(clone);
>> + bio_endio(master_bio);
>> +}
>> +
>> +static struct bio *scsi_mpath_clone_bio(struct bio *bio)
>> +{
>> + struct mpath_disk *mpath_disk = bio->bi_bdev->bd_disk->private_data;
>> + struct mpath_head *mpath_head = mpath_disk->mpath_head;
>> + struct scsi_mpath_clone_bio *scsi_mpath_clone_bio;
>> + struct scsi_mpath_head *scsi_mpath_head = mpath_head->drvdata;
>> + struct bio *clone;
>> +
>> + clone = bio_alloc_clone(bio->bi_bdev, bio, GFP_NOWAIT,
>> + &scsi_mpath_head->bio_pool);
>
> Why use GFP_NOWAIT? It's more likely to fail than GFP_NOIO. If the bio
> has REQ_NOWAIT set, I can see where you would need this, but otherwise,
> I don't see why GFP_NOIO wouldn't be better here.
Seems reasonable to try GFP_NOIO. Furthermore, we really can't tolerate
the clone failing. So, if it does, we should return an error pointer
here and have mpath_bdev_submit_bio() error the original bio.
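i.e. something like this (untested sketch) - falling back to GFP_NOWAIT
only when the caller asked for non-blocking submission, and propagating
the failure as an error pointer:

```c
	/* Sketch: only use GFP_NOWAIT when the bio demands it */
	gfp_t gfp = (bio->bi_opf & REQ_NOWAIT) ? GFP_NOWAIT : GFP_NOIO;

	clone = bio_alloc_clone(bio->bi_bdev, bio, gfp,
				&scsi_mpath_head->bio_pool);
	if (!clone)
		return ERR_PTR(-ENOMEM);
```

mpath_bdev_submit_bio() would then check IS_ERR() on the returned clone
and end the original bio with BLK_STS_RESOURCE (or similar) on failure.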
>
>> + if (!clone)
>> + return NULL;
>> +
>> + clone->bi_end_io = scsi_mpath_clone_end_io;
>> +
>> + scsi_mpath_clone_bio = container_of(clone,
>> + struct scsi_mpath_clone_bio, clone);
>> + scsi_mpath_clone_bio->master_bio = bio;
>> +
>> + return clone;
>> +}
>> +
>> static enum mpath_iopolicy_e scsi_mpath_get_iopolicy(struct mpath_head *mpath_head)
>> {
>> struct scsi_mpath_head *scsi_mpath_head = mpath_head->drvdata;
>> @@ -269,6 +311,7 @@ static enum mpath_iopolicy_e scsi_mpath_get_iopolicy(struct mpath_head *mpath_he
>>
>> struct mpath_head_template smpdt_pr = {
>> .get_iopolicy = scsi_mpath_get_iopolicy,
>> + .clone_bio = scsi_mpath_clone_bio,
>> };
>>
>> static struct scsi_mpath_head *scsi_mpath_alloc_head(void)
>> @@ -283,9 +326,13 @@ static struct scsi_mpath_head *scsi_mpath_alloc_head(void)
>> ida_init(&scsi_mpath_head->ida);
>> mutex_init(&scsi_mpath_head->lock);
>>
>> + if (bioset_init(&scsi_mpath_head->bio_pool, SCSI_MAX_QUEUE_DEPTH,
>> + offsetof(struct scsi_mpath_clone_bio, clone),
>> + BIOSET_NEED_BVECS|BIOSET_PERCPU_CACHE))
>
> You don't need 4096 cached bios to guarantee forward progress. I don't
> see why BIO_POOL_SIZE won't work fine here.
Every bio which we are sent is cloned, and SCSI_MAX_QUEUE_DEPTH is used
as the cached bio pool size - wouldn't it make sense to cache more than
2 bios?
> Also, since you are cloning
> bios, they are sharing the original bio's iovecs, so you don't need
> BIOSET_NEED_BVECS.
>
ok
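So, taking both points together (and assuming the front pad is dropped
per the bi_private suggestion earlier), the bioset_init() call would
reduce to something like:

```c
	/* Sketch: BIO_POOL_SIZE mempool for forward progress, no front
	 * pad, no BIOSET_NEED_BVECS since clones share the parent's
	 * bvecs.
	 */
	if (bioset_init(&scsi_mpath_head->bio_pool, BIO_POOL_SIZE, 0,
			BIOSET_PERCPU_CACHE))
		goto err;
```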
thanks!