[PATCH] nvmet: allow associating port to a cgroup via configfs

Chaitanya Kulkarni chaitanyak at nvidia.com
Tue Jun 27 19:33:48 PDT 2023


On 6/27/23 14:13, Sagi Grimberg wrote:
> Hey Ofir,
>
>> Currently there is no way to throttle nvme targets with cgroup v2.
>
> How do you do it with v1?
>
>> The IOs that the nvme target submits lack association with a cgroup,
>> which makes them act as part of the root cgroup. The root cgroup can't
>> be throttled with the cgroup v2 mechanism.
>
> What happens to file or passthru backends? You paid attention just to
> bdev. I don't see how this is sanely supported with files. It's possible
> if you convert nvmet to use its own dedicated kthreads and infer the
> cgroup from the kthread. That presents a whole other set of issues.
>

If we are doing it for one back-end, we cannot leave the other back-ends out ...
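
For the record, the bdev side is presumably a one-liner per bio, while
the file backend has no bio to tag, so it would need the per-thread
association Sagi describes. A rough sketch (bio_associate_blkg_from_css()
and kthread_associate_blkcg() are existing kernel APIs; the nvmet glue
around them is hypothetical):

/* io-cmd-bdev.c: charge each bio to the port's blkcg before submission */
#ifdef CONFIG_BLK_CGROUP
	if (req->port->blkcg)
		bio_associate_blkg_from_css(bio, req->port->blkcg);
#endif
	submit_bio(bio);

/*
 * io-cmd-file.c has no bio to tag; a dedicated kthread could instead
 * associate itself the way the loop driver does:
 */
	kthread_associate_blkcg(port->blkcg);	/* charge subsequent I/O */
	/* ... issue the file-backed I/O ... */
	kthread_associate_blkcg(NULL);		/* drop the association */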

> Maybe the cleanest way to do something like this is to implement
> a full-blown nvmet cgroup controller that you can apply a whole set of
> resources to, in addition to I/O.
>
>> Signed-off-by: Ofir Gal <ofir.gal at volumez.com>
>> ---
>>   drivers/nvme/target/configfs.c    | 77 +++++++++++++++++++++++++++++++
>>   drivers/nvme/target/core.c        |  3 ++
>>   drivers/nvme/target/io-cmd-bdev.c | 13 ++++++
>>   drivers/nvme/target/nvmet.h       |  3 ++
>>   include/linux/cgroup.h            |  5 ++
>>   kernel/cgroup/cgroup-internal.h   |  5 --
>
> Don't mix cgroup and nvmet changes in the same patch.
>
>>   6 files changed, 101 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
>> index 907143870da5..2e8f93a07498 100644
>> --- a/drivers/nvme/target/configfs.c
>> +++ b/drivers/nvme/target/configfs.c
>> @@ -12,6 +12,7 @@
>>   #include <linux/ctype.h>
>>   #include <linux/pci.h>
>>   #include <linux/pci-p2pdma.h>
>> +#include <linux/cgroup.h>
>>   #ifdef CONFIG_NVME_TARGET_AUTH
>>   #include <linux/nvme-auth.h>
>>   #endif
>> @@ -281,6 +282,81 @@ static ssize_t nvmet_param_pi_enable_store(struct config_item *item,
>>   CONFIGFS_ATTR(nvmet_, param_pi_enable);
>>   #endif
>> 
>> +static ssize_t nvmet_param_associated_cgroup_show(struct config_item *item,
>> +        char *page)
>> +{
>> +    struct nvmet_port *port = to_nvmet_port(item);
>> +    ssize_t len = 0;
>> +    ssize_t retval;
>> +    char *suffix;
>> +
>> +    /* No cgroup has been set: the IOs are associated with the root cgroup */
>> +    if (!port->cgrp)
>> +        goto root_cgroup;
>> +
>> +    retval = cgroup_path_ns(port->cgrp, page, PAGE_SIZE,
>> +                         current->nsproxy->cgroup_ns);
>> +    if (retval >= PATH_MAX || retval >= PAGE_SIZE)
>> +        return -ENAMETOOLONG;
>> +
>> +    /* No cgroup found: the IOs are associated with the root cgroup */
>> +    if (retval < 0)
>> +        goto root_cgroup;
>> +
>> +    len += retval;
>> +
>> +    suffix = cgroup_is_dead(port->cgrp) ? " (deleted)\n" : "\n";
>> +    len += snprintf(page + len, PAGE_SIZE - len, "%s", suffix);
>> +
>> +    return len;
>> +
>> +root_cgroup:
>> +    return snprintf(page, PAGE_SIZE, "/\n");
>> +}
>> +
>> +static ssize_t nvmet_param_associated_cgroup_store(struct config_item *item,
>> +        const char *page, size_t count)
>> +{
>> +    struct nvmet_port *port = to_nvmet_port(item);
>> +    struct cgroup_subsys_state *blkcg;
>> +    ssize_t retval = -EINVAL;
>> +    struct cgroup *cgrp;
>> +    char *path;
>> +    int len;
>> +
>> +    len = strcspn(page, "\n");
>> +    if (!len)
>> +        return -EINVAL;
>> +
>> +    path = kmemdup_nul(page, len, GFP_KERNEL);
>> +    if (!path)
>> +        return -ENOMEM;
>> +
>> +    cgrp = cgroup_get_from_path(path);
>> +    kfree(path);
>> +    if (IS_ERR(cgrp))
>> +        return -ENOENT;
>> +
>> +    blkcg = cgroup_get_e_css(cgrp, &io_cgrp_subsys);
>> +    if (!blkcg)
>> +        goto out_put_cgroup;
>> +
>> +    /* Put old cgroup and blkcg css references */
>> +    if (port->cgrp) {
>> +        css_put(port->blkcg);
>> +        cgroup_put(port->cgrp);
>> +    }
>> +
>> +    port->cgrp = cgrp;
>> +    port->blkcg = blkcg;
>> +
>> +    return count;
>> +
>> +out_put_cgroup:
>> +    cgroup_put(cgrp);
>> +    return retval;
>> +}
>
> I'm not at all convinced that nvmet rate limiting does not
> require a dedicated cgroup controller... Regardless, this doesn't
> look like a port attribute, it's a subsystem attribute.

+1 here, can you please explain the choice of port?
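
If it does move to the subsystem, the store could hang off the existing
subsystem attributes the same way attr_allow_any_host does; a minimal
sketch, assuming a (hypothetical) cgrp field in struct nvmet_subsys:

static ssize_t nvmet_subsys_attr_associated_cgroup_store(
		struct config_item *item, const char *page, size_t count)
{
	struct nvmet_subsys *subsys = to_subsys(item);
	struct cgroup *cgrp;
	char *path;
	int len;

	len = strcspn(page, "\n");
	if (!len)
		return -EINVAL;

	path = kmemdup_nul(page, len, GFP_KERNEL);
	if (!path)
		return -ENOMEM;

	cgrp = cgroup_get_from_path(path);
	kfree(path);
	if (IS_ERR(cgrp))
		return -ENOENT;

	/* hypothetical per-subsystem field, mirroring the port variant */
	if (subsys->cgrp)
		cgroup_put(subsys->cgrp);
	subsys->cgrp = cgrp;

	return count;
}
CONFIGFS_ATTR(nvmet_subsys_, attr_associated_cgroup);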

Also, for cgroup-related changes I think you need to CC Tejun, who's
the author of cgroup_is_dead(), and whatever the right mailing list is ...
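
(For context, cgroup_is_dead() is currently a static inline private to
kernel/cgroup/cgroup-internal.h, which is why the diffstat above has to
move it into include/linux/cgroup.h. If I remember right it is just:

static inline bool cgroup_is_dead(const struct cgroup *cgrp)
{
	return !(cgrp->self.flags & CSS_ONLINE);
}

so whether to export it at all is really the cgroup maintainers' call.)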

-ck
