[RFC] arm64: Enforce observed order for spinlock and data

Mon Oct 3 12:20:57 PDT 2016

On 2016-10-01 14:11, Mark Rutland wrote:
> Hi Brent,
> 
> Evidently my questions weren't sufficiently clear; even with your 
> answers it's
> not clear to me what precise issue you're attempting to solve. I've 
> tried to be
> more specific this time.
> 
> At a high-level, can you clarify whether you're attempting to solve is:
> 
> (a) a functional correctness issue (e.g. data corruption)
> (b) a performance issue
> 
> And whether this was seen in practice, or found through code 
> inspection?
> 
> On Sat, Oct 01, 2016 at 12:11:36PM -0400, bdegraaf at codeaurora.org 
> wrote:
>> On 2016-09-30 15:32, Mark Rutland wrote:
>> >On Fri, Sep 30, 2016 at 01:40:57PM -0400, Brent DeGraaf wrote:
>> >>+	 * so LSE needs an explicit barrier here as well.  Without this, the
>> >>+	 * changed contents of the area protected by the spinlock could be
>> >>+	 * observed prior to the lock.
>> >>+	 */
>> >
>> >By whom? We generally expect that if data is protected by a lock, you take
>> >the lock before accessing it. If you expect concurrent lockless readers,
>> >then there's a requirement on the writer side to explicitly provide the
>> >ordering it requires -- spinlocks are not expected to provide that.
>> 
>> More details are in my response to Robin, but there is an API arm64 
>> supports
>> in spinlock.h which is used by lockref to determine whether a lock is 
>> free or
>> not.  For that code to work properly without adding these barriers, 
>> that API
>> needs to take the lock.
> 
> Can you please provide a concrete example of a case where things go 
> wrong,
> citing the code (or access pattern) in question? e.g. as in the commit 
> messages
> for:
> 
>   8e86f0b409a44193 ("arm64: atomics: fix use of acquire + release for
> full barrier semantics)
>   859b7a0e89120505 ("mm/slub: fix lockups on PREEMPT && !SMP kernels")
> 
> (for the latter, I messed up the register -> var mapping in one 
> paragraph, but
> the style and reasoning is otherwise sound).
> 
> In the absence of a concrete example as above, it's very difficult to 
> reason
> about what the underlying issue is, and what a valid fix would be for 
> said
> issue.
> 
>> >What pattern of accesses are made by readers and writers such that there is
>> >a problem?
> 
> Please note here I was asking specifically w.r.t. the lockref code, 
> e.g. which
> loads could see stale data, and what does the code do based on this 
> value such
> that there is a problem.
> 
>> I added the barriers to the readers/writers because I do not know 
>> these are
>> not similarly abused.  There is a lot of driver code out there, and 
>> ensuring
>> order is the safest way to be sure we don't get burned by something 
>> similar
>> to the lockref access.
> 
> Making the architecture-provided primitives overly strong harms 
> performance and
> efficiency (in general), makes the code harder to maintain and optimise 
> in
> future, and only masks the issue (which could crop up on other 
> architectures,
> for instance).
> 
> Thanks,
> Mark.

Thinking about this, as the reader/writer code has no known "abuse" 
case, I'll
remove it from the patchset, then provide a v2 patchset with a detailed
explanation for the lockref problem using the commits you provided as an 
example,
as well as performance consideration.

Brent