[PATCH V4 2/9] nvmet: add ZNS support for bdev-ns
Christoph Hellwig
hch at lst.de
Wed Dec 2 04:07:15 EST 2020
On Tue, Dec 01, 2020 at 10:22:20PM -0800, Chaitanya Kulkarni wrote:
> Add zns-bdev-config, id-ctrl, id-ns, zns-cmd-effects, zone-mgmt-send,
> zone-mgmt-recv and zone-append handlers
If you think you need to speel out command please do so as in the spec
instead of uisng strange abbreviations.
That being said the commit log should document the why and the overall
architecture considerations, not low-level details like this.
> +#ifdef CONFIG_BLK_DEV_ZONED
> + struct nvme_id_ns_zns id_zns;
> + unsigned int zasl;
> +#endif
This wastes a lot of space for all normal non-zns uses. Please have some
sort of private data field that zns can use. Why do we even need to store
the whole data structure instead of the few native format fields we need?
> static inline struct nvmet_ns *to_nvmet_ns(struct config_item *item)
> @@ -251,6 +255,10 @@ struct nvmet_subsys {
> unsigned int admin_timeout;
> unsigned int io_timeout;
> #endif /* CONFIG_NVME_TARGET_PASSTHRU */
> +
> +#ifdef CONFIG_BLK_DEV_ZONED
> + struct nvme_id_ctrl_zns id_ctrl_zns;
> +#endif
Same here.
> +#define NVMET_MPSMIN_SHIFT 12
Needs some documentation on the why.
> +static inline struct block_device *nvmet_bdev(struct nvmet_req *req)
> +{
> + return req->ns->bdev;
> +}
I don't really see the point of this helper.
> +/*
> + * ZNS related command implementation and helpers.
> + */
> +
> +u16 nvmet_process_zns_cis(struct nvmet_req *req, off_t *off)
> +{
> + u16 nvme_cis_zns = NVME_CSI_ZNS;
> +
> + if (!bdev_is_zoned(nvmet_bdev(req)))
> + return NVME_SC_SUCCESS;
> +
> + return nvmet_copy_ns_identifier(req, NVME_NIDT_CSI, NVME_NIDT_CSI_LEN,
> + &nvme_cis_zns, off);
> +}
This looks weird. We can want to support the command set identifier in
general, so this should go into common code, and just look up the command
set identifier in the nvmet_ns structure.
> +void nvmet_zns_add_cmd_effects(struct nvme_effects_log *log)
> +{
> + log->iocs[nvme_cmd_zone_append] = cpu_to_le32(1 << 0);
> + log->iocs[nvme_cmd_zone_mgmt_send] = cpu_to_le32(1 << 0);
> + log->iocs[nvme_cmd_zone_mgmt_recv] = cpu_to_le32(1 << 0);
> +}
Just add this to the caller under an if. For the if a helper that checks
IS_ENABLED() and the csi would be useful.
> +
> +static inline bool nvmet_bdev_validate_zns_zones(struct nvmet_ns *ns)
> +{
> + if (ns->bdev->bd_disk->queue->conv_zones_bitmap) {
> + pr_err("block devices with conventional zones are not supported.");
> + return false;
> + }
> +
> + return !(get_capacity(ns->bdev->bd_disk) &
> + (bdev_zone_sectors(ns->bdev) - 1));
> +}
I think this should be open coded in the caller.
> +static inline u8 nvmet_zasl(unsigned int zone_append_sects)
> +{
> + /*
> + * Zone Append Size Limit is the value experessed in the units
> + * of minimum memory page size (i.e. 12) and is reported power of 2.
> + */
> + return ilog2((zone_append_sects << 9) >> NVMET_MPSMIN_SHIFT);
> +}
> +
> +static inline void nvmet_zns_update_zasl(struct nvmet_ns *ns)
> +{
> + struct request_queue *q = ns->bdev->bd_disk->queue;
> + struct nvmet_ns *ins;
> + unsigned long idx;
> + u8 min_zasl;
> +
> + /*
> + * Calculate new ctrl->zasl value when enabling the new ns. This value
> + * has to be the minimum of the max_zone_append values from available
> + * namespaces.
> + */
> + min_zasl = ns->zasl = nvmet_zasl(queue_max_zone_append_sectors(q));
> +
> + xa_for_each(&(ns->subsys->namespaces), idx, ins) {
> + struct request_queue *iq = ins->bdev->bd_disk->queue;
> + unsigned int imax_za_sects = queue_max_zone_append_sectors(iq);
> + u8 izasl = nvmet_zasl(imax_za_sects);
> +
> + if (!bdev_is_zoned(ins->bdev))
> + continue;
> +
> + min_zasl = min_zasl > izasl ? izasl : min_zasl;
> + }
> +
> + ns->subsys->id_ctrl_zns.zasl = min_zasl;
> +}
This will change the limit when a new namespaces is added. I think we need
to just pick the value of the first namespaces and refuse adding a new
one if the limit is lower to not completely break hosts.
> +void nvmet_bdev_execute_zone_append(struct nvmet_req *req)
This whole function looks weird. I'd expect that we mostly (if not
entirely) reuse nvmet_bdev_execute_rw, just using bio_add_hw_page
instead of bio_add_page and setting up the proper op field.
> +#else /* CONFIG_BLK_DEV_ZONED */
We really do try to avoid these kinds of ifdefs in .c files. Either
put the stubs inline into the header, of use IS_ENABLED() magic in
the callers. In this case I think the new helper I mentioned above
which checks IS_ENABLED + the csi seems like the right way.
More information about the Linux-nvme
mailing list