[PATCH 6/6] nvme: Add consistency check for zone count

Javier González javier at javigon.com
Fri Jun 26 02:55:46 EDT 2020


On 26.06.2020 06:49, Damien Le Moal wrote:
>On 2020/06/26 15:13, Javier González wrote:
>> On 26.06.2020 00:04, Damien Le Moal wrote:
>>> On 2020/06/26 6:49, Keith Busch wrote:
>>>> On Thu, Jun 25, 2020 at 02:21:52PM +0200, Javier González wrote:
>>>>>  drivers/nvme/host/zns.c | 7 +++++++
>>>>>  1 file changed, 7 insertions(+)
>>>>>
>>>>> diff --git a/drivers/nvme/host/zns.c b/drivers/nvme/host/zns.c
>>>>> index 7d8381fe7665..de806788a184 100644
>>>>> --- a/drivers/nvme/host/zns.c
>>>>> +++ b/drivers/nvme/host/zns.c
>>>>> @@ -234,6 +234,13 @@ static int nvme_ns_report_zones(struct nvme_ns *ns, sector_t sector,
>>>>>  		sector += ns->zsze * nz;
>>>>>  	}
>>>>>
>>>>> +	if (nr_zones < 0 && zone_idx != ns->nr_zones) {
>>>>> +		dev_err(ns->ctrl->device, "inconsistent zone count %u/%u\n",
>>>>> +				zone_idx, ns->nr_zones);
>>>>> +		ret = -EINVAL;
>>>>> +		goto out_free;
>>>>> +	}
>>>>> +
>>>>>  	ret = zone_idx;
>>>>
>>>> nr_zones is unsigned, so it's never < 0.
>>>>
>>>> The API we're providing doesn't require zone_idx equal the namespace's
>>>> nr_zones at the end, though. A subset of the total number of zones can
>>>> be requested here.
>>>>
>>
>> I did see nr_zones coming with -1; guess it is my compiler.
>
>See include/linux/blkdev.h. -1 is:
>
>#define BLK_ALL_ZONES  ((unsigned int)-1)
>
>Which is documented in block/blk-zoned.c:
>
>/**
> * blkdev_report_zones - Get zones information
> * @bdev:       Target block device
> * @sector:     Sector from which to report zones
> * @nr_zones:   Maximum number of zones to report
> * @cb:         Callback function called for each reported zone
> * @data:       Private data for the callback
> *
> * Description:
> *    Get zone information starting from the zone containing @sector for at most
> *    @nr_zones, and call @cb for each zone reported by the device.
> *    To report all zones in a device starting from @sector, the BLK_ALL_ZONES
> *    constant can be passed to @nr_zones.
> *    Returns the number of zones reported by the device, or a negative errno
> *    value in case of failure.
> *
> *    Note: The caller must use memalloc_noXX_save/restore() calls to control
> *    memory allocations done within this function.
> */
>int blkdev_report_zones(struct block_device *bdev, sector_t sector,
>                        unsigned int nr_zones, report_zones_cb cb, void *data)
>
>>
>>>
>>> Yes, absolutely. zone_idx is not an absolute zone number. It is the index of the
>>> reported zone descriptor in the current report range requested by the user,
>>> which is not necessarily for the entire drive (i.e., provided nr zones is less
>>> than the total number of zones of the disk and/or start sector is > 0). So
>>> zone_idx indicates the actual number of zones reported, it is not the total
>>
>> I see. As I can see, when nr_zones comes undefined I believed we could
>> assume that zone_idx is absolute, but I can be wrong.
>
>No. zone_idx is *always* the index of the zone in the current report. Whatever
>that report is, regardless of the report starting point and number of zones
>requested. E.g. For a single zone report (nr_zones = 1), you will always see
>zone_idx = 0. For a full report, zone_idx will correspond to the zone number.
>This is used for example in blk_revalidate_disk_zones() to initialize the zone
>bitmaps.
>
>> Does it make sense to support this check with an additional counter and
>> a explicit nr_zones initialization when undefined or you
>> prefer to just remove it as Matias suggested?
>
>The check is not needed at all.
>
>If the device is buggy and reports more zones than the device capacity or any
>other bugs, the driver can catch that when it processes the report.
>blk_revalidate_disk_zones() also has many checks.

I have managed to create a QEMU ZNS device that gave me a headache with
a little bit of extra capacity that triggered an additional zone report.
This was the motivation for the patch.

I will look at the checks to cover this case too then.

Thanks,
Javier



More information about the Linux-nvme mailing list