[PATCH v10 1/1] nvmet: support reservation feature
Dmitry Bogdanov
d.bogdanov at yadro.com
Wed Sep 25 07:21:57 PDT 2024
On Wed, Sep 25, 2024 at 10:11:10AM +0800, Guixin Liu wrote:
>
> This patch implements the reservation feature, includes:
> 1. reservation register(register, unregister and replace).
> 2. reservation acquire(acquire, preempt, preempt and abort).
> 3. reservation release(release and clear).
> 4. reservation report.
> 5. set feature and get feature of reservation notify mask.
> 6. get log page of reservation event.
>
> And also make reservation configurable, one can set ns to support
> reservation before enable ns. The default of resv_enable is false.
>
> Signed-off-by: Guixin Liu <kanie at linux.alibaba.com>
> ---
> drivers/nvme/target/Makefile | 2 +-
> drivers/nvme/target/admin-cmd.c | 24 +-
> drivers/nvme/target/configfs.c | 27 +
> drivers/nvme/target/core.c | 58 +-
> drivers/nvme/target/nvmet.h | 48 ++
> drivers/nvme/target/pr.c | 1197 +++++++++++++++++++++++++++++++
> include/linux/nvme.h | 54 ++
> 7 files changed, 1401 insertions(+), 9 deletions(-)
> create mode 100644 drivers/nvme/target/pr.c
>
> +static void nvmet_execute_pr_report(struct nvmet_req *req)
> +{
> + u32 cdw11 = le32_to_cpu(req->cmd->common.cdw11);
> + u32 cdw10 = le32_to_cpu(req->cmd->common.cdw10);
> + u32 num_bytes = 4 * (cdw10 + 1); /* cdw10 is number of dwords */
> + u8 eds = cdw11 & 1; /* Extended data structure, bit 00 */
> + struct nvmet_subsys *subsys = req->sq->ctrl->subsys;
> + struct nvme_registered_ctrl_ext *ctrl_eds;
> + struct nvme_reservation_status_ext *data;
> + struct nvmet_pr *pr = &req->ns->pr;
> + struct nvmet_pr_registrant *holder;
> + struct nvmet_pr_registrant *reg;
> + struct nvmet_ctrl *ctrl;
> + u16 num_ctrls = 0;
> + u16 status;
> + u8 rtype;
> +
> + /* nvmet hostid(uuid_t) is 128 bit. */
> + if (!eds) {
> + req->error_loc = offsetof(struct nvme_common_command, cdw11);
> + status = NVME_SC_HOST_ID_INCONSIST | NVME_SC_DNR;
> + goto out;
> + }
> +
> + if (num_bytes < sizeof(struct nvme_reservation_status_ext)) {
> + req->error_loc = offsetof(struct nvme_common_command, cdw10);
> + status = NVME_SC_INVALID_FIELD | NVME_SC_DNR;
> + goto out;
> + }
> +
> + mutex_lock(&subsys->lock);
> + num_ctrls = list_count_nodes(&subsys->ctrls);
> + mutex_unlock(&subsys->lock);
> +
> + num_bytes = min(num_bytes,
> + sizeof(struct nvme_reservation_status_ext) +
> + sizeof(struct nvme_registered_ctrl_ext) * num_ctrls);
> +
> + data = kmalloc(num_bytes, GFP_KERNEL);
> + if (!data) {
> + status = NVME_SC_INTERNAL;
> + goto out;
> + }
> + memset(data, 0, num_bytes);
> + data->gen = cpu_to_le32(atomic_read(&pr->generation));
> + data->ptpls = 0;
> + ctrl_eds = data->regctl_eds;
> +
> + mutex_lock(&subsys->lock);
> + rcu_read_lock();
> + holder = rcu_dereference(pr->holder);
> + rtype = holder ? holder->rtype : 0;
> + data->rtype = rtype;
> + num_ctrls = 0;
> + list_for_each_entry(ctrl, &subsys->ctrls, subsys_entry) {
> + if ((void *)ctrl_eds >= (void *)(data + num_bytes))
> + break;
> + reg = nvmet_pr_find_registrant(pr, &ctrl->hostid);
> + if (!reg)
> + continue;
> + ctrl_eds->cntlid = cpu_to_le16(ctrl->cntlid);
> + if (rtype == NVME_PR_WRITE_EXCLUSIVE_ALL_REGS ||
> + rtype == NVME_PR_EXCLUSIVE_ACCESS_ALL_REGS)
> + ctrl_eds->rcsts = 1;
> + if (holder && uuid_equal(&ctrl->hostid, &holder->hostid))
> + ctrl_eds->rcsts = 1;
> + uuid_copy((uuid_t *)&ctrl_eds->hostid, &ctrl->hostid);
> + ctrl_eds->rkey = cpu_to_le64(reg->rkey);
> + ctrl_eds++;
> + num_ctrls++;
> + }
> + rcu_read_unlock();
> + mutex_unlock(&subsys->lock);
That code fully confroms NVMe Base Spec 2.0c but it is completely wrong
from the logic point of view.
ControllerId is a dynamic thing - it's allocated for the each
fabric session(Connect command). It has no meaning for an initiator.
The association between Registrant (uniq key HostId) and Controller
(uniq key CtrlId) is not 1..1, it is 1..N.
NVMe Base Spec 2.1 addressed this issue and it states, that here target
shall report just list if the registrants:
Number of Registrants (REGSTRNT): This field indicates the number of registrants of the
namespace. This indicates the number of Registrant data structures or Registrant Extended data
structures contained in this data structure.
Note: This field was formerly named Number of Registered Controllers (REGCTL).
Though, v2.1 still does not clearly states what exact CNTLID is to be
reported from 1..N association.
I suggest you to update your patch to conform the Spec v2.1 - remove cntlid
from nvmet_pr_registrant and report here just ns->pr.registrant_list.
And not to track the connectivity of registrants - report always CNTLID=0xFFFF
as a some compromise value that is allowed by the Spec.
> +
> + put_unaligned_le16(num_ctrls, data->regctl);
> + num_bytes = sizeof(struct nvme_reservation_status_ext) +
> + sizeof(struct nvme_registered_ctrl_ext) * num_ctrls;
num_bytes was a size of requested buffer (cdw10), you even allocate that
size. You shall not extend it here. Instead you should allocate maximum
possible size and copy here only cdw10 dwords from your buffer to sgl.
> + status = nvmet_copy_to_sgl(req, 0, data, num_bytes);
> + kfree(data);
> +out:
> + nvmet_req_complete(req, status);
> +}
> +
BR,
Dmitry
More information about the Linux-nvme
mailing list