[PATCH] nvmet: allow associating port to a cgroup via configfs
Chaitanya Kulkarni
chaitanyak at nvidia.com
Tue Jun 27 19:33:48 PDT 2023
On 6/27/23 14:13, Sagi Grimberg wrote:
> Hey Ofir,
>
>> Currently there is no way to throttle nvme targets with cgroup v2.
>
> How do you do it with v1?
>
>> The IOs that the nvme target submits lack associating to a cgroup,
>> which makes them act as root cgroup. The root cgroup can't be throttled
>> with the cgroup v2 mechanism.
>
> What happens to file or passthru backends? You paid attention just to
> bdev. I don't see how this is sanely supported with files. It's possible
> if you convert nvmet to use its own dedicated kthreads and infer the
> cg from the kthread. That presents a whole other set of issues.
>
if we are doing it for one back-end we cannot leave other back-ends out ...
> Maybe the cleanest way to implement something like this is to implement
> a full blown nvmet cgroup controller that you can apply a whole set of
> resources to, in addition to I/O.
>
>> Signed-off-by: Ofir Gal <ofir.gal at volumez.com>
>> ---
>> drivers/nvme/target/configfs.c | 77 +++++++++++++++++++++++++++++++
>> drivers/nvme/target/core.c | 3 ++
>> drivers/nvme/target/io-cmd-bdev.c | 13 ++++++
>> drivers/nvme/target/nvmet.h | 3 ++
>> include/linux/cgroup.h | 5 ++
>> kernel/cgroup/cgroup-internal.h | 5 --
>
> Don't mix cgroup and nvmet changes in the same patch.
>
>> 6 files changed, 101 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/nvme/target/configfs.c
>> b/drivers/nvme/target/configfs.c
>> index 907143870da5..2e8f93a07498 100644
>> --- a/drivers/nvme/target/configfs.c
>> +++ b/drivers/nvme/target/configfs.c
>> @@ -12,6 +12,7 @@
>> #include <linux/ctype.h>
>> #include <linux/pci.h>
>> #include <linux/pci-p2pdma.h>
>> +#include <linux/cgroup.h>
>> #ifdef CONFIG_NVME_TARGET_AUTH
>> #include <linux/nvme-auth.h>
>> #endif
>> @@ -281,6 +282,81 @@ static ssize_t
>> nvmet_param_pi_enable_store(struct config_item *item,
>> CONFIGFS_ATTR(nvmet_, param_pi_enable);
>> #endif
>> +static ssize_t nvmet_param_associated_cgroup_show(struct
>> config_item *item,
>> + char *page)
>> +{
>> + struct nvmet_port *port = to_nvmet_port(item);
>> + ssize_t len = 0;
>> + ssize_t retval;
>> + char *suffix;
>> +
>> + /* No cgroup has been set means the IOs are assoicated to the
>> root cgroup */
>> + if (!port->cgrp)
>> + goto root_cgroup;
>> +
>> + retval = cgroup_path_ns(port->cgrp, page, PAGE_SIZE,
>> + current->nsproxy->cgroup_ns);
>> + if (retval >= PATH_MAX || retval >= PAGE_SIZE)
>> + return -ENAMETOOLONG;
>> +
>> + /* No cgroup found means the IOs are assoicated to the root
>> cgroup */
>> + if (retval < 0)
>> + goto root_cgroup;
>> +
>> + len += retval;
>> +
>> + suffix = cgroup_is_dead(port->cgrp) ? " (deleted)\n" : "\n";
>> + len += snprintf(page + len, PAGE_SIZE - len, suffix);
>> +
>> + return len;
>> +
>> +root_cgroup:
>> + return snprintf(page, PAGE_SIZE, "/\n");
>> +}
>> +
>> +static ssize_t nvmet_param_associated_cgroup_store(struct
>> config_item *item,
>> + const char *page, size_t count)
>> +{
>> + struct nvmet_port *port = to_nvmet_port(item);
>> + struct cgroup_subsys_state *blkcg;
>> + ssize_t retval = -EINVAL;
>> + struct cgroup *cgrp;
>> + char *path;
>> + int len;
>> +
>> + len = strcspn(page, "\n");
>> + if (!len)
>> + return -EINVAL;
>> +
>> + path = kmemdup_nul(page, len, GFP_KERNEL);
>> + if (!path)
>> + return -ENOMEM;
>> +
>> + cgrp = cgroup_get_from_path(path);
>> + kfree(path);
>> + if (IS_ERR(cgrp))
>> + return -ENOENT;
>> +
>> + blkcg = cgroup_get_e_css(cgrp, &io_cgrp_subsys);
>> + if (!blkcg)
>> + goto out_put_cgroup;
>> +
>> + /* Put old cgroup */
>> + if (port->cgrp)
>> + cgroup_put(port->cgrp);
>> +
>> + port->cgrp = cgrp;
>> + port->blkcg = blkcg;
>> +
>> + return count;
>> +
>> +out_put_cgroup:
>> + cgroup_put(cgrp);
>> + return retval;
>> +}
>
> I'm not at all convinced that nvmet ratelimiting does not
> require a dedicated cgroup controller... Rgardles, this doesn't
> look like a port attribute, its a subsystem attribute.
+1 here, can you please explain the choice of port ?
also for cgroup related changes I think you need to CC Tejun who's
the author of the cgroup_is_dead() and whatever the right mailing list ..
-ck
More information about the Linux-nvme
mailing list