[PATCH 6/6] nvme: Add consistency check for zone count
Javier González
javier at javigon.com
Fri Jun 26 03:29:00 EDT 2020
On 26.06.2020 07:09, Damien Le Moal wrote:
>On 2020/06/26 15:55, Javier González wrote:
>> On 26.06.2020 06:49, Damien Le Moal wrote:
>>> On 2020/06/26 15:13, Javier González wrote:
>>>> On 26.06.2020 00:04, Damien Le Moal wrote:
>>>>> On 2020/06/26 6:49, Keith Busch wrote:
>>>>>> On Thu, Jun 25, 2020 at 02:21:52PM +0200, Javier González wrote:
>>>>>>> drivers/nvme/host/zns.c | 7 +++++++
>>>>>>> 1 file changed, 7 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/nvme/host/zns.c b/drivers/nvme/host/zns.c
>>>>>>> index 7d8381fe7665..de806788a184 100644
>>>>>>> --- a/drivers/nvme/host/zns.c
>>>>>>> +++ b/drivers/nvme/host/zns.c
>>>>>>> @@ -234,6 +234,13 @@ static int nvme_ns_report_zones(struct nvme_ns *ns, sector_t sector,
>>>>>>> sector += ns->zsze * nz;
>>>>>>> }
>>>>>>>
>>>>>>> + if (nr_zones < 0 && zone_idx != ns->nr_zones) {
>>>>>>> + dev_err(ns->ctrl->device, "inconsistent zone count %u/%u\n",
>>>>>>> + zone_idx, ns->nr_zones);
>>>>>>> + ret = -EINVAL;
>>>>>>> + goto out_free;
>>>>>>> + }
>>>>>>> +
>>>>>>> ret = zone_idx;
>>>>>>
>>>>>> nr_zones is unsigned, so it's never < 0.
>>>>>>
>>>>>> The API we're providing doesn't require zone_idx equal the namespace's
>>>>>> nr_zones at the end, though. A subset of the total number of zones can
>>>>>> be requested here.
>>>>>>
>>>>
>>>> I did see nr_zones coming with -1; guess it is my compiler.
>>>
>>> See include/linux/blkdev.h. -1 is:
>>>
>>> #define BLK_ALL_ZONES ((unsigned int)-1)
>>>
>>> Which is documented in block/blk-zoned.c:
>>>
>>> /**
>>> * blkdev_report_zones - Get zones information
>>> * @bdev: Target block device
>>> * @sector: Sector from which to report zones
>>> * @nr_zones: Maximum number of zones to report
>>> * @cb: Callback function called for each reported zone
>>> * @data: Private data for the callback
>>> *
>>> * Description:
>>> * Get zone information starting from the zone containing @sector for at most
>>> * @nr_zones, and call @cb for each zone reported by the device.
>>> * To report all zones in a device starting from @sector, the BLK_ALL_ZONES
>>> * constant can be passed to @nr_zones.
>>> * Returns the number of zones reported by the device, or a negative errno
>>> * value in case of failure.
>>> *
>>> * Note: The caller must use memalloc_noXX_save/restore() calls to control
>>> * memory allocations done within this function.
>>> */
>>> int blkdev_report_zones(struct block_device *bdev, sector_t sector,
>>> unsigned int nr_zones, report_zones_cb cb, void *data)
>>>
>>>>
>>>>>
>>>>> Yes, absolutely. zone_idx is not an absolute zone number. It is the index of the
>>>>> reported zone descriptor in the current report range requested by the user,
>>>>> which is not necessarily for the entire drive (i.e., provided nr zones is less
>>>>> than the total number of zones of the disk and/or start sector is > 0). So
>>>>> zone_idx indicates the actual number of zones reported, it is not the total
>>>>
>>>> I see. As I can see, when nr_zones comes undefined I believed we could
>>>> assume that zone_idx is absolute, but I can be wrong.
>>>
>>> No. zone_idx is *always* the index of the zone in the current report. Whatever
>>> that report is, regardless of the report starting point and number of zones
>>> requested. E.g. For a single zone report (nr_zones = 1), you will always see
>>> zone_idx = 0. For a full report, zone_idx will correspond to the zone number.
>>> This is used for example in blk_revalidate_disk_zones() to initialize the zone
>>> bitmaps.
>>>
>>>> Does it make sense to support this check with an additional counter and
>>>> a explicit nr_zones initialization when undefined or you
>>>> prefer to just remove it as Matias suggested?
>>>
>>> The check is not needed at all.
>>>
>>> If the device is buggy and reports more zones than the device capacity or any
>>> other bugs, the driver can catch that when it processes the report.
>>> blk_revalidate_disk_zones() also has many checks.
>>
>> I have managed to create a QEMU ZNS device that gave me a headache with
>> a little bit of extra capacity that triggered an additional zone report.
>> This was the motivation for the patch.
>
>The device emulation sound buggy... If the capacity is wrong, then the report
>will be too since zones are all supposed to be sequential (no holes between
>zones) and up to the disk capacity only (last zone start + len = capacity + 1)
>
>If one or the other is wrong, this should be easy to detect. Normally,
>blk_revalidate_disk_zones() should be able to catch that.
We have the capability to select the reported device capacity manually
for a number of reasons. One of the different test configurations in our
CI did go through.
But it is OK, I will remove the check on V2.
Javier
More information about the Linux-nvme
mailing list