[RFC PATCH v1 0/4] Reduce cost of ptep_get_lockless on arm64

David Hildenbrand david at redhat.com
Tue Mar 26 10:04:45 PDT 2024


>>>>
>>>> Likely, we just want to read "the real deal" on both sides of the pte_same()
>>>> handling.
>>>
>>> Sorry I'm not sure I understand? You mean read the full pte including
>>> access/dirty? That's the same as dropping the patch, right? Of course if we do
>>> that, we still have to keep pte_get_lockless() around for this case. In an ideal
>>> world we would convert everything over to ptep_get_lockless_norecency() and
>>> delete ptep_get_lockless() to remove the ugliness from arm64.
>>
>> Yes, agreed. Patch #3 does not look too crazy and it wouldn't really affect any
>> architecture.
>>
>> I do wonder if pte_same_norecency() should be defined per architecture and the
>> default would be pte_same(). So we could avoid the mkold etc on all other
>> architectures.
> 
> Wouldn't that break it's semantics? The "norecency" of
> ptep_get_lockless_norecency() means "recency information in the returned pte may
> be incorrect". But the "norecency" of pte_same_norecency() means "ignore the
> access and dirty bits when you do the comparison".

My idea was that ptep_get_lockless_norecency() would return the actual 
result on these architectures. So e.g., on x86, there would be no actual 
change in generated code.

But yes, the documentation of these functions would have to be improved.

Now I wonder if ptep_get_lockless_norecency() should actively clear 
dirty/accessed bits to more easily find any actual issues where the bits 
still matter ...

> 
> I think you could only do the optimization you describe if you required that
> pte_same_norecency() would only be given values returned by
> ptep_get_lockless_norecency() (or ptep_get_norecency()). Even then, its not
> quite the same; if a page is accessed between gets one will return true and the
> other false.

Right.

-- 
Cheers,

David / dhildenb




More information about the linux-arm-kernel mailing list