[PATCH RT v2 0/3] riscv: add PREEMPT_RT support

Fri Nov 3 11:22:40 PDT 2023

On Fri, Nov 3, 2023 at 10:39 AM Sebastian Andrzej Siewior
<bigeasy at linutronix.de> wrote:
>
> On 2023-11-03 10:19:49 [-0700], Evan Green wrote:
> > Hi Sebastian,
>
> Hi Evan,
>
> > Could you elaborate a little on the rule violation here? The
> > documentation for GFP_NOWAIT says:
> >
> > > If the allocation is performed from an atomic context, e.g. interrupt
> > > handler, use ``GFP_NOWAIT``.
> >
> > which seems like it basically fits my situation. If I do a search for
> > GFP_NOWAIT to hunt for instances of allocating with interrupts
> > disabled, I see at least the following examples (more available upon
> > request):
>
> We are talking about PREEMPT_RT here. Early CPU bring-up and SMP function
> calls happen in hardirq context with interrupts disabled. Always.
> The sequence
>         spin_lock_irq(&lock);
>         ptr = kmalloc(size, GFP_ATOMIC);
>
> is fine because on PREEMPT_RT spin_lock_irq() does not actually disable
> interrupts and the spinlock_t becomes a sleeping lock. See
>         https://docs.kernel.org/locking/locktypes.html
>

Thanks, that page helps. PREEMPT_RT is wild :)

> The problem is that sleeping locks must not be acquired in a context
> where sleeping is not possible, and kmalloc() may need to acquire
> sleeping locks. Therefore: no memory allocations in IRQ-off sections.
>
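A minimal userspace model of that rule, assuming a simplified view of PREEMPT_RT in which the only question is whether the current context may block on a sleeping lock. Every name here (struct ctx, model_spin_lock_irq_rt(), model_kmalloc_rt()) is hypothetical and is not kernel API; the model simply returns NULL or false where a real RT kernel would hit a bug.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of the current execution context; not kernel API. */
struct ctx {
	bool hardirq_off;  /* hardirq or early CPU bring-up: may not block */
	int rt_locks_held; /* on PREEMPT_RT, spinlock_t is a sleeping lock */
};

/* Blocking on a sleeping lock is only legal where sleeping is possible. */
static bool may_block(const struct ctx *c)
{
	return !c->hardirq_off;
}

/* PREEMPT_RT spin_lock_irq(): takes a sleeping lock and leaves hard
 * interrupts enabled, so it must not be used where blocking is illegal. */
static bool model_spin_lock_irq_rt(struct ctx *c)
{
	if (!may_block(c))
		return false; /* would be a bug on a real RT kernel */
	c->rt_locks_held++;
	return true;
}

static void model_spin_unlock_irq_rt(struct ctx *c)
{
	assert(c->rt_locks_held > 0);
	c->rt_locks_held--;
}

/* kmalloc() may take sleeping locks internally on PREEMPT_RT, even with
 * GFP_ATOMIC, so the model refuses any allocation where blocking is illegal. */
static void *model_kmalloc_rt(const struct ctx *c, size_t size)
{
	if (!may_block(c))
		return NULL; /* the real kernel would hit a bug here instead */
	return malloc(size);
}
```

In this model the spin_lock_irq() + kmalloc(GFP_ATOMIC) sequence succeeds in task context, while the same allocation attempted with hard interrupts off is refused, which is the distinction drawn in the paragraph above.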
> > Finding the documentation that states this is illegal might help me
> > understand what I should be doing instead. For example, I'm fuzzy on
> > what would be disallowed with interrupts disabled but OK in
> > smp_callin().
>
> smp_callin() falls into the same category (early CPU bring-up), so it is
> not okay either. Using a kworker would be okay, for instance. That,
> however, requires a fully set-up scheduler. Allocating the memory upfront
> and passing it to every CPU would work fine, too.
>
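The "allocate upfront and pass it to every CPU" option can be sketched as a userspace model. The kernel's on_each_cpu(func, info, wait) hands the same info pointer to the callback on every CPU; here model_on_each_cpu() imitates that by running the callback once per simulated CPU with a model_irqs_disabled flag set. All model_* names and struct runway_set are hypothetical. The point is only where the allocation happens: before the fan-out, never inside the callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define NR_MODEL_CPUS 4
#define RUNWAY_SIZE 4096

/* Hypothetical model state; none of this is kernel API. */
static bool model_irqs_disabled;
static int model_current_cpu;

/* Shared "info" handed to every CPU, like on_each_cpu()'s void *info. */
struct runway_set {
	unsigned char *buf[NR_MODEL_CPUS];
	bool done[NR_MODEL_CPUS];
};

/* Mimics an SMP function call: the callback runs with IRQs off, once per CPU. */
static void model_on_each_cpu(void (*fn)(void *), void *info)
{
	for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++) {
		model_current_cpu = cpu;
		model_irqs_disabled = true;
		fn(info);
		model_irqs_disabled = false;
	}
}

/* Per-CPU callback: no allocation, only the buffer preallocated for this CPU. */
static void measure_cb(void *info)
{
	struct runway_set *rs = info;
	int cpu = model_current_cpu;

	assert(model_irqs_disabled);             /* "hardirq" context */
	assert(rs->buf[cpu] != NULL);            /* memory must already exist */
	memset(rs->buf[cpu], 0xa5, RUNWAY_SIZE); /* placeholder for the dummy copies */
	rs->done[cpu] = true;
}

/* "Initcall": allocate everything while sleeping is allowed, then fan out. */
static int model_check_all_cpus(struct runway_set *rs)
{
	int ret = 0;

	memset(rs, 0, sizeof(*rs));
	for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++) {
		rs->buf[cpu] = malloc(RUNWAY_SIZE);
		if (!rs->buf[cpu]) {
			ret = -1;
			goto out;
		}
	}
	model_on_each_cpu(measure_cb, rs);
out:
	/* Free in the same sleepable context that allocated. */
	for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++) {
		free(rs->buf[cpu]);
		rs->buf[cpu] = NULL;
	}
	return ret;
}
```

A kernel version would index the shared array by smp_processor_id() inside the callback; the model uses model_current_cpu for the same purpose.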
> > One option is to do all the allocations in
> > check_unaligned_access_all_cpus(), and pass them in, but until I can
> > find the rules it's hard to verify that's a correct fix. It's also a
> > little clunky, and wouldn't apply to the hotplug path, so I want to
> > make sure it's necessary and sufficient before embarking on that
> > journey.
>
> I don't know what it does or how often it needs to be done. The question
> is: would it be okay to do it once at system boot, or does it need to be
> done each time a CPU comes up? Does it have to be the first thing the CPU
> does, or can it be delayed to a slightly later point in time?

It's essentially a characterization we do on each CPU to figure out if
misaligned word accesses are fast or slow. This info is reported to
usermode, and potentially has kernel consumers as well (Charlie's IP
checksumming series wants to flip a static branch based on it).

In a sense I barely need the buffer: it is just a runway across which I
do dummy memcpys. I need to do it once on boot for each CPU,
and for any additional CPUs that show up via hotplug. It does not need
to be done across suspend/resume.
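A rough userspace analogue of that characterization, assuming the comparison is between copying a misaligned runway with word-sized accesses and copying it one byte at a time (the real kernel routine may differ in detail). classify_misaligned(), copy_words(), copy_bytes() and alloc_runway() are hypothetical names; timings from a sketch like this are noisy and compiler-dependent, so the result is illustrative only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define RUNWAY_BYTES 16384 /* bytes copied per timed pass (multiple of 8) */
#define ITERATIONS 32      /* keep the fastest pass of each kind */
#define SRC_OFF 2          /* offsets chosen so every pointer is misaligned */
#define DSTW_OFF (RUNWAY_BYTES + 16 + 1)
#define DSTB_OFF (2 * RUNWAY_BYTES + 32 + 1)
#define RUNWAY_ALLOC (3 * RUNWAY_BYTES + 64)

enum misaligned_speed { MISALIGNED_SLOW, MISALIGNED_FAST };

static volatile unsigned char sink; /* read back so copies are not optimized out */

/* 8-byte copies at whatever (mis)alignment dst/src have. memcpy into a
 * uint64_t avoids undefined behavior; where misaligned access is cheap the
 * compiler typically emits one load/store, otherwise smaller accesses. */
static void copy_words(unsigned char *dst, const unsigned char *src, size_t n)
{
	for (size_t i = 0; i + sizeof(uint64_t) <= n; i += sizeof(uint64_t)) {
		uint64_t w;
		memcpy(&w, src + i, sizeof(w));
		memcpy(dst + i, &w, sizeof(w));
	}
}

/* Byte-at-a-time baseline; volatile keeps the compiler from widening it. */
static void copy_bytes(volatile unsigned char *dst,
		       const volatile unsigned char *src, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i] = src[i];
}

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Allocate and fill the runway; do this where sleeping is allowed. */
static unsigned char *alloc_runway(void)
{
	unsigned char *r = malloc(RUNWAY_ALLOC);

	if (r)
		for (size_t i = 0; i < RUNWAY_ALLOC; i++)
			r[i] = (unsigned char)(i * 31u); /* arbitrary pattern */
	return r;
}

/* Time misaligned word copies against byte copies over the runway. */
static enum misaligned_speed classify_misaligned(unsigned char *runway)
{
	assert(runway != NULL);
	const unsigned char *src = runway + SRC_OFF;
	unsigned char *dst_w = runway + DSTW_OFF;
	unsigned char *dst_b = runway + DSTB_OFF;
	uint64_t best_words = UINT64_MAX, best_bytes = UINT64_MAX;

	for (int it = 0; it < ITERATIONS; it++) {
		uint64_t t0 = now_ns();
		copy_words(dst_w, src, RUNWAY_BYTES);
		uint64_t t1 = now_ns();
		copy_bytes(dst_b, src, RUNWAY_BYTES);
		uint64_t t2 = now_ns();

		if (t1 - t0 < best_words)
			best_words = t1 - t0;
		if (t2 - t1 < best_bytes)
			best_bytes = t2 - t1;
	}
	sink = dst_w[RUNWAY_BYTES - 1] ^ dst_b[RUNWAY_BYTES - 1];

	/* Misaligned word copies beating byte copies suggests the hardware
	 * handles misaligned accesses natively; otherwise call them slow. */
	return best_words < best_bytes ? MISALIGNED_FAST : MISALIGNED_SLOW;
}
```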

The patch in question was an attempt to move this work to be done in
parallel, since on systems with lots of CPUs, doing it serially in
smp_callin() was causing boot time regressions.

For boot, I think the plan I outlined should work: allocate all pages
in the initcall before calling on_each_cpu(). For hotplug, I'll have
to find a spot to allocate a buffer before the CPU comes up, so it can
use it in smp_callin().
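One way to picture the hotplug half of that plan, as a model rather than kernel code: the kernel's CPU hotplug state machine (see cpuhp_setup_state()) runs PREPARE-section callbacks on a control CPU before the new CPU boots, and STARTING-section callbacks on the incoming CPU with interrupts disabled. Allocating in a PREPARE-like step and only consuming the buffer in the STARTING-like step (roughly where smp_callin() runs) is an assumption about how the fix could look, not what the thread settled on. All model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define NR_MODEL_CPUS 8
#define RUNWAY_SIZE 4096

/* Hypothetical model state, not kernel API. */
static unsigned char *runway[NR_MODEL_CPUS]; /* per-CPU preallocated buffers */
static bool measured[NR_MODEL_CPUS];
static bool irqs_disabled; /* true while the new CPU is "starting" */

/* PREPARE-like step: runs on a control CPU before the target boots; may sleep. */
static int model_cpu_prepare(int cpu)
{
	assert(!irqs_disabled);
	if (runway[cpu]) /* already prepared */
		return 0;
	runway[cpu] = malloc(RUNWAY_SIZE);
	return runway[cpu] ? 0 : -1;
}

/* STARTING-like step on the incoming CPU: IRQs off, no allocation allowed. */
static void model_cpu_starting(int cpu)
{
	irqs_disabled = true;
	assert(runway[cpu] != NULL);           /* the buffer must already exist */
	memset(runway[cpu], cpu, RUNWAY_SIZE); /* placeholder for the measurement */
	measured[cpu] = true;
	irqs_disabled = false;
}

/* ONLINE-like step: back in a context that may sleep, so the runway can go. */
static void model_cpu_online(int cpu)
{
	assert(!irqs_disabled);
	free(runway[cpu]);
	runway[cpu] = NULL;
}

/* Bring up one CPU: allocate first, measure with IRQs off, then clean up. */
static int model_cpu_up(int cpu)
{
	if (model_cpu_prepare(cpu))
		return -1; /* abort before the CPU starts if allocation failed */
	model_cpu_starting(cpu);
	model_cpu_online(cpu);
	return 0;
}
```

Freeing in the ONLINE-like step keeps the buffer alive only for the one-time measurement, which matches the description of it as a runway that is not needed afterwards.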

Palmer has dropped this patch for now, so I plan to respin with those
changes unless anyone has thoughts on this approach.
-Evan