[PATCH 4/6] nvme: zns: Add support for power_of_2 emulation to NVMe ZNS devices

Damien Le Moal damien.lemoal at opensource.wdc.com
Wed Mar 9 13:43:47 PST 2022


On 3/9/22 23:33, Pankaj Raghav wrote:
> 
> 
> On 2022-03-09 05:04, Damien Le Moal wrote:
>> On 3/9/22 01:53, Pankaj Raghav wrote:
>  
>> Contradiction: reducing the impact of performance regression is not the
>> same as "does not create a performance regression". So which one is it ?
>> Please add performance numbers to this commit message.
> 
>> And also please actually explain what the patch is changing. This commit
>> message is about the why, but nothing on the how.
>>
> I will reword and add a bit more context to the commit log with perf numbers
> in the next revision
>>> +EXPORT_SYMBOL_GPL(nvme_fail_po2_zone_emu_violation);
>>> +
>>
>> This should go in zns.c, not in the core.
>>
> Ok.
> 
>>> +
>>> +static void nvme_npo2_zone_setup(struct gendisk *disk)
>>> +{
>>> +	nvme_ns_po2_zone_emu_setup(disk->private_data);
>>> +}
>>
>> This helper seems useless.
>>
> I tried to retain the pattern with report_zones which is currently like this:
> static int nvme_report_zones(struct gendisk *disk, sector_t sector,
> 		unsigned int nr_zones, report_zones_cb cb, void *data)
> {
> 	return nvme_ns_report_zones(disk->private_data, sector, nr_zones, cb,
> 			data);
> }
> 
> But I can just remove this helper and use nvme_ns_po2_zone_emu_setup cb directly
> in nvme_bdev_fops.
> 
>>> +
>>>  /*
>>>   * Convert a 512B sector number to a device logical block number.
>>>   */
>>>  static inline u64 nvme_sect_to_lba(struct nvme_ns *ns, sector_t sector)
>>>  {
>>> -	return sector >> (ns->lba_shift - SECTOR_SHIFT);
>>> +#ifdef CONFIG_BLK_DEV_ZONED
>>> +	return ns->sect_to_lba(ns, sector);
>>
>> So for a power of 2 zone sized device, you are forcing an indirect call,
>> always. Not acceptable. What is the point of that po2_zone_emu boolean
>> you added to the queue ?
> This is a good point and we had a discussion about this internally.
> Initially I had something like this:
> if (!blk_queue_is_po2_zone_emu(disk))
> 	return sector >> (ns->lba_shift - SECTOR_SHIFT);
> else
> 	return __nvme_sect_to_lba_po2(ns, sec);

No need for the else.

> 
> But @Luis indicated that it was better to set an op which comes at a cost of indirection
> instead of having a runtime check with a if/else in the **hot path**. The code also looks
> more clear with having an op.

The indirect call using a function pointer makes the code obscure. And
the cost of that call is far greater and always present compared to the
CPU branch prediction which will luckily avoid most of the time taking
the wrong branch of an if.

> 
> The performance analysis that we performed did not show any regression while using the indirection
> for po2 zone sizes, at least on the x86_64 architecture.
> So maybe we could use this opportunity to discuss which approach could be used here.

The easiest one that makes the code clear and easy to understand.



-- 
Damien Le Moal
Western Digital Research



More information about the Linux-nvme mailing list