[PATCH] psi: reduce min window size to 50ms

Josh Hunt johunt at akamai.com
Mon Feb 27 11:19:00 PST 2023



On 2/27/23 9:49 AM, Suren Baghdasaryan wrote:
> On Mon, Feb 27, 2023 at 5:34 AM Michal Hocko <mhocko at suse.com> wrote:
>>
>> On Fri 24-02-23 13:07:57, Suren Baghdasaryan wrote:
>>> On Fri, Feb 24, 2023 at 4:47 AM Michal Hocko <mhocko at suse.com> wrote:
>>>>
>>>> On Tue 14-02-23 11:34:30, Suren Baghdasaryan wrote:
>>>> [...]
>>>>> Your suggestion to have this limit configurable sounds like obvious
>>>>> solution. I would like to get some opinions from other maintainers.
>>>>> Johannes, WDYT? CC'ing Michal to chime in as well since this is mostly
>>>>> related to memory stalls.
>>>>
>>>> I do not think that making this configurable helps much. Many users will
>>>> be bound to distribution config and also it would be hard to experiment
>>>> with a recompile cycle every time. This seems just too impractical.
>>>>
>>>> Is there any reason why we shouldn't allow any timeout? Shorter
>>>> timeouts could be restricted to a priviledged context to avoid an easy
>>>> way to swamp system by too frequent polling.
>>>
>>> Hmm, ok. Maybe then we just ensure that only privileged users can set
>>> triggers and remove the min limit (use a >0 check)?
>>
>> This could break existing userspace which is not privileged. I would
>> just go with CAP_SYS_NICE or similar with small (sub min) timeouts.
> 
> Yeah, that's what I meant. /proc/pressure/* files already check for
> CAP_SYS_RESOURCE
> (https://urldefense.com/v3/__https://elixir.bootlin.com/linux/latest/source/kernel/sched/psi.c*L1440__;Iw!!GjvTz_vk!WtI61poYlZk9kg5P1sX19RdYnUNGvBJRjnOpu8hL6IOZ_NKhuw2qZ_tAdNRwzZoQVlO4jEObYN6x$ )
> but per-cgroup pressure files do not have this check. I think the
> original patch which added this check
> (https://urldefense.com/v3/__https://lore.kernel.org/all/20210402025833.27599-1-johunt@akamai.com/__;!!GjvTz_vk!WtI61poYlZk9kg5P1sX19RdYnUNGvBJRjnOpu8hL6IOZ_NKhuw2qZ_tAdNRwzZoQVlO4jAVqIVDv$ )
> missed the cgroup ones. This should be easy to add but I wonder if
> that was left that way intentionally.
> 
> CC'ing the author. Josh, Johannes is that inconsistency between system
> pressure files and cgroup-specific ones intentional? Can we change
> them all to check for CAP_SYS_RESOURCE?

No, this was just an oversight in the original patch at least from my
end, and did not come up during code review. Fine with me to change them
all to use CAP_SYS_RESOURCE.

Josh

> 
>>
>>>> Btw. it seems that there is is only a limit on a single trigger per fd
>>>> but no limits per user so it doesn't sound too hard to end up with too
>>>> much polling even with a larger timeouts. To me it seems like we need to
>>>> contain the polling thread to be bound by the cpu controller.
>>>
>>> Hmm. We have one "psimon" thread per cgroup (+1 system-level one) and
>>> poll_min_period for each thread is chosen as the min() of polling
>>> periods between triggers created in that group. So, a bad trigger that
>>> causes overly aggressive polling and polling thread being throttled,
>>> might affect other triggers in that cgroup.
>>
>> Yes, and why that would be a problem?
> 
> If unprivileged processes are allowed to add new triggers then a
> malicious process can add a bad trigger and affect other legit
> processes. That sounds like a problem to me.
> Thanks,
> Suren.
> 
>> --
>> Michal Hocko
>> SUSE Labs



More information about the linux-arm-kernel mailing list