[RFC PATCH] arm64: neon: Remove support for nested or hardirq kernel-mode NEON
Dave Martin
Dave.Martin at arm.com
Fri May 19 07:02:40 PDT 2017
On Fri, May 19, 2017 at 02:34:25PM +0100, Mark Rutland wrote:
> On Fri, May 19, 2017 at 02:13:21PM +0100, Dave Martin wrote:
> > On Fri, May 19, 2017 at 01:49:53PM +0100, Mark Rutland wrote:
> > > Hi,
> > >
> > > On Fri, May 19, 2017 at 12:26:39PM +0100, Dave Martin wrote:
> > > > +static bool __maybe_unused kernel_neon_allowed(void)
> > > > +{
> > > > +	/*
> > > > +	 * The per_cpu_ptr() is racy if called with preemption enabled.
> > > > +	 * This is not a bug: per_cpu(kernel_neon_busy) is only set
> > > > +	 * when preemption is disabled, so we cannot migrate to another
> > > > +	 * CPU while it is set, nor can we migrate to a CPU where it is set.
> > > > +	 * So, if we find it clear on some CPU then we're guaranteed to find it
> > > > +	 * clear on any CPU we could migrate to.
> > > > +	 *
> > > > +	 * If we are in between kernel_neon_begin()...kernel_neon_end(), the
> > > > +	 * flag will be set, but preemption is also disabled, so we can't
> > > > +	 * migrate to another CPU and spuriously see it become false.
> > > > +	 */
> > > > +	return !(in_irq() || in_nmi()) &&
> > > > +		!local_read(this_cpu_ptr(&kernel_neon_busy));
> > > > +}
> > >
> > > I think it would be better to use the this_cpu ops for this, rather than
> > > manually messing with the pointer.
> >
> > I had thought that the properties of local_t were important here, but
> > this_cpu ops seem equally appropriate.
> >
> > > Here, we can use raw_cpu_read(kernel_neon_busy), given the comment
> > > above.
> >
> > What's the difference? The raw_ variants aren't documented. Do these
> > not bother about atomicity between the address calculation and the read?
>
> Yup.
>
> Comparing raw_cpu_{x}, __this_cpu_{x}, and this_cpu_{x}:
>
> * raw_cpu_{x} don't have any preemption checks, and don't disable
>   preemption internally. Due to this, the address gen and operation can
>   occur on different CPUs.

Ah, I see. This is not fantastically intuitive...

> * __this_cpu_{x} have lockdep annotations to check that preemption is
>   disabled, but otherwise behave the same as raw_cpu_{x}. i.e. they
>   don't try to disable preemption internally.
>
> * this_cpu_{x} ensure that preemption is disabled around the address gen
>   and operation.
>
> Since preemption may be possible, but we believe this is fine, we have
> to use the raw form. We'll want to keep the comment explaining why.

Absolutely. Those seem natural fits then.

Cheers
---Dave
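
For illustration only, here is a small userspace model of the three accessor
flavours compared above, and of a kernel_neon_allowed() built on the raw
form. It is a sketch, not kernel code: every model_* name, current_cpu,
preempt_count, debug_warnings and in_hardirq_or_nmi are hypothetical
stand-ins invented for this example, and the per-CPU variable is just an
array indexed by the simulated CPU number. The behaviour of each flavour
follows the description quoted in this thread rather than the kernel source.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 2

/* Hypothetical simulated state; none of this is the kernel's own. */
static int current_cpu;                 /* CPU the simulated task runs on */
static int preempt_count;               /* > 0 means preemption is disabled */
static int debug_warnings;              /* times the debug check would fire */
static bool in_hardirq_or_nmi;          /* stands in for in_irq() || in_nmi() */
static bool kernel_neon_busy[NR_CPUS];  /* models the per-CPU busy flag */

static void preempt_disable(void) { preempt_count++; }
static void preempt_enable(void)  { preempt_count--; }

/* Models raw_cpu_read(): no preemption check, no preemption control. */
static bool model_raw_cpu_read(const bool *pcpu)
{
	return pcpu[current_cpu];
}

/* Models __this_cpu_read(): same access, plus a check that preemption is off. */
static bool model_checked_this_cpu_read(const bool *pcpu)
{
	if (preempt_count == 0)
		debug_warnings++;
	return pcpu[current_cpu];
}

/* Models this_cpu_read(): preemption is disabled around the whole access. */
static bool model_this_cpu_read(const bool *pcpu)
{
	bool val;

	preempt_disable();
	val = pcpu[current_cpu];
	preempt_enable();
	return val;
}

/* The flag is only ever set while preemption is disabled. */
static void model_kernel_neon_begin(void)
{
	preempt_disable();
	kernel_neon_busy[current_cpu] = true;
}

static void model_kernel_neon_end(void)
{
	assert(preempt_count > 0);
	kernel_neon_busy[current_cpu] = false;
	preempt_enable();
}

/*
 * May be called with preemption enabled. The raw read is acceptable here
 * because a set flag pins the task to its CPU, so a clear flag observed on
 * one CPU is also clear on any CPU the task could migrate to.
 */
static bool model_kernel_neon_allowed(void)
{
	return !in_hardirq_or_nmi && !model_raw_cpu_read(kernel_neon_busy);
}
```

In this model, model_kernel_neon_allowed() returns true while the flag is
clear and false between model_kernel_neon_begin() and
model_kernel_neon_end(). Calling model_checked_this_cpu_read() with
preemption enabled bumps debug_warnings, which is why the checked flavour
would be the wrong fit for a caller that is deliberately preemptible.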