[PATCH 2/6] block: add support for selecting all zones

Damien Le Moal Damien.LeMoal at wdc.com
Fri Jun 26 03:06:34 EDT 2020


On 2020/06/26 15:52, Javier González wrote:
> On 26.06.2020 06:35, Damien Le Moal wrote:
>> On 2020/06/26 14:58, Javier González wrote:
>>> On 26.06.2020 01:27, Damien Le Moal wrote:
>>>> On 2020/06/25 21:22, Javier González wrote:
>>>>> From: Javier González <javier.gonz at samsung.com>
>>>>>
>>>>> Add flag to allow selecting all zones on a single zone management
>>>>> operation
>>>>>
>>>>> Signed-off-by: Javier González <javier.gonz at samsung.com>
>>>>> Signed-off-by: SelvaKumar S <selvakuma.s1 at samsung.com>
>>>>> Signed-off-by: Kanchan Joshi <joshi.k at samsung.com>
>>>>> Signed-off-by: Nitesh Shetty <nj.shetty at samsung.com>
>>>>> ---
>>>>>  block/blk-zoned.c             | 3 +++
>>>>>  include/linux/blk_types.h     | 3 ++-
>>>>>  include/uapi/linux/blkzoned.h | 9 +++++++++
>>>>>  3 files changed, 14 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
>>>>> index e87c60004dc5..29194388a1bb 100644
>>>>> --- a/block/blk-zoned.c
>>>>> +++ b/block/blk-zoned.c
>>>>> @@ -420,6 +420,9 @@ int blkdev_zone_mgmt_ioctl(struct block_device *bdev, fmode_t mode,
>>>>>  		return -ENOTTY;
>>>>>  	}
>>>>>
>>>>> +	if (zmgmt.flags & BLK_ZONE_SELECT_ALL)
>>>>> +		op |= REQ_ZONE_ALL;
>>>>> +
>>>>>  	return blkdev_zone_mgmt(bdev, op, zmgmt.sector, zmgmt.nr_sectors,
>>>>>  				GFP_KERNEL);
>>>>>  }
>>>>> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
>>>>> index ccb895f911b1..16b57fb2b99c 100644
>>>>> --- a/include/linux/blk_types.h
>>>>> +++ b/include/linux/blk_types.h
>>>>> @@ -351,6 +351,7 @@ enum req_flag_bits {
>>>>>  	 * work item to avoid such priority inversions.
>>>>>  	 */
>>>>>  	__REQ_CGROUP_PUNT,
>>>>> +	__REQ_ZONE_ALL,		/* apply zone operation to all zones */
>>>>>
>>>>>  	/* command specific flags for REQ_OP_WRITE_ZEROES: */
>>>>>  	__REQ_NOUNMAP,		/* do not free blocks when zeroing */
>>>>> @@ -378,7 +379,7 @@ enum req_flag_bits {
>>>>>  #define REQ_BACKGROUND		(1ULL << __REQ_BACKGROUND)
>>>>>  #define REQ_NOWAIT		(1ULL << __REQ_NOWAIT)
>>>>>  #define REQ_CGROUP_PUNT		(1ULL << __REQ_CGROUP_PUNT)
>>>>> -
>>>>> +#define REQ_ZONE_ALL		(1ULL << __REQ_ZONE_ALL)
>>>>>  #define REQ_NOUNMAP		(1ULL << __REQ_NOUNMAP)
>>>>>  #define REQ_HIPRI		(1ULL << __REQ_HIPRI)
>>>>>
>>>>> diff --git a/include/uapi/linux/blkzoned.h b/include/uapi/linux/blkzoned.h
>>>>> index 07b5fde21d9f..a8c89fe58f97 100644
>>>>> --- a/include/uapi/linux/blkzoned.h
>>>>> +++ b/include/uapi/linux/blkzoned.h
>>>>> @@ -157,6 +157,15 @@ enum blk_zone_action {
>>>>>  	BLK_ZONE_MGMT_RESET	= 0x4,
>>>>>  };
>>>>>
>>>>> +/**
>>>>> + * enum blk_zone_mgmt_flags - Flags for blk_zone_mgmt
>>>>> + *
>>>>> + * BLK_ZONE_SELECT_ALL: Select all zones for current zone action
>>>>> + */
>>>>> +enum blk_zone_mgmt_flags {
>>>>> +	BLK_ZONE_SELECT_ALL	= 1 << 0,
>>>>> +};
>>>>> +
>>>>>  /**
>>>>>   * struct blk_zone_mgmt - Extended zoned management
>>>>>   *
>>>>>
>>>>
>>>> NACK.
>>>>
>>>> Details:
>>>> 1) REQ_OP_ZONE_RESET together with REQ_ZONE_ALL is the same as
>>>> REQ_OP_ZONE_RESET_ALL, isn't it ? You are duplicating a functionality that
>>>> already exists.
>>>> 2) The patch introduces REQ_ZONE_ALL at the block layer only without defining
>>>> how it ties into SCSI and NVMe driver use of it. Is REQ_ZONE_ALL indicating that
>>>> the zone management commands are to be executed with the ALL bit set ? If yes,
>>>> that will break device-mapper. See the special code for handling
>>>> REQ_OP_ZONE_RESET_ALL. That code is in place for a reason: the target block
>>>> device may not be an entire physical device. In that case, applying a zone
>>>> management command to all zones of the physical drive is wrong.
>>>> 3) REQ_ZONE_ALL seems completely equivalent to specifying a sector range of [0
>>>> .. drive capacity]. So what is the point ? The current interface handles that.
>>>> That is how we chose between REQ_OP_ZONE_RESET and REQ_OP_ZONE_RESET_ALL right now.
>>>> 4) Without any in-kernel user, I do not see the point. And for applications, I
>>>> do not see any good use case for doing open all, close all, offline all or
>>>> finish all. If you have any such good use case, please elaborate.
>>>>
>>>
>>> The main use if reset all, but without having to look through all zones,
>>> as it imposes an overhead when we have a large number of zones. Having
>>> the possibility to offload it to HW is more efficient.
>>>
>>> I had not thought about the device mapper use case. Would it be an
>>> option to translate this into REQ_OP_ZONE_RESET_ALL when we have a
>>> device mapper (or any other case where this might break) and then leave
>>> the bit go to the driver if it applies to the whole device?
>>
>> But REQ_OP_ZONE_RESET_ALL is already implemented and supported and will reset
>> all zones of a drive using a single command if the ioctl is called for the
>> entire sector range of a physical drive. For device mapper with a partial
>> mapping, the per zone reset loop will be used. If you have no other use case for
>> the REQ_ZONE_ALL flag, what is the point here ? Reset is already optimized for
>> the all zones case
> 
> OK. I might have missed this. I thought we were sending several commands
> instead of a single reset with the bit. I will check again. Thanks for
> pointing at this.

In block/blk-zoned.c, there is:

static inline bool blkdev_allow_reset_all_zones(struct block_device *bdev,
                                                sector_t sector,
                                                sector_t nr_sectors)
{
        if (!blk_queue_zone_resetall(bdev_get_queue(bdev)))
                return false;

        /*
         * REQ_OP_ZONE_RESET_ALL can be executed only if the number of sectors
         * of the applicable zone range is the entire disk.
         */
        return !sector && nr_sectors == get_capacity(bdev->bd_disk);
}

And:

int blkdev_zone_mgmt(struct block_device *bdev, enum req_opf op,
                     sector_t sector, sector_t nr_sectors,
                     gfp_t gfp_mask)
{
	...

	while (sector < end_sector) {
                bio = blk_next_bio(bio, 0, gfp_mask);
                bio_set_dev(bio, bdev);

                /*
                 * Special case for the zone reset operation that reset all
                 * zones, this is useful for applications like mkfs.
                 */
                if (op == REQ_OP_ZONE_RESET &&
                    blkdev_allow_reset_all_zones(bdev, sector, nr_sectors)) {
                        bio->bi_opf = REQ_OP_ZONE_RESET_ALL;
                        break;
			^^^^^ This means one command only...

                }

                bio->bi_opf = op | REQ_SYNC;
                bio->bi_iter.bi_sector = sector;
                sector += zone_sectors;

                /* This may take a while, so be nice to others */
                cond_resched();
        }

        ret = submit_bio_wait(bio);
        bio_put(bio);

        return ret;
}

And see scsi/sd_zbc.c and zns.c. REQ_OP_ZONE_RESET_ALL end up setting the ALL
bit for reset command.


> 
> Javier
> 


-- 
Damien Le Moal
Western Digital Research



More information about the Linux-nvme mailing list