[PATCH] arm64: Implement ptep_set_access_flags() for hardware AF/DBM

Alexander Graf agraf at suse.de
Tue Jun 7 09:38:47 PDT 2016



On 07.06.16 18:30, Will Deacon wrote:
> Hi Alex,
> 
> Thanks for the bug report.
> 
> On Tue, Jun 07, 2016 at 04:13:03PM +0200, Alexander Graf wrote:
>> On 13.04.16 17:01, Catalin Marinas wrote:
>>> When hardware updates of the access and dirty states are enabled, the
>>> default ptep_set_access_flags() implementation based on calling
>>> set_pte_at() directly is potentially racy. This triggers the "racy dirty
>>> state clearing" warning in set_pte_at() because an existing writable PTE
>>> is overridden with a clean entry.
> 
> [...]
> 
>> This patch breaks swapping for me.
>>
>> I've hit weird issues where systems stopped working half-way, with the
>> kernel still being fine and user space applications just stopping to
>> respond.
>>
>> After some debugging we found out that it always happens when swapping
>> (to anything, backing storage doesn't matter). A quick bisect points to
>> this commit as culprit and indeed, if I disable CONFIG_ARM64_HW_AFDBM
>> the system works as expected.
> 
> [...]
> 
>> The back traces indicate that the page fault handler goes through and
>> the process just keeps banging on the same page fault over and over
>> again. I have not yet figured out what *exactly* is going wrong or why
>> this patch would actually give us that effect.
>>
>> I was able to fully reproduce the issue with current Linus tree (4.7-rc2+).
> 
> It looks like the following happens:
> 
>   1. We put down a PTE_WRITE | PTE_DIRTY | PTE_AF pte
>   2. Memory pressure -> pte_mkold(pte) -> clear PTE_AF
>   3. A read faults due to the missing access flag
>   4. ptep_set_access_flags is called with dirty = 0, due to the read fault
>   5. pte is then made PTE_WRITE | PTE_DIRTY | PTE_AF | PTE_RDONLY (!)
>   6. A write faults, but pte_write is true so we get stuck
> 
> Does the diff below fix the issue for you? If so, I'll write a proper
> patch.

With that patch on top of Linus master I don't see the issue anymore,
yes :).


Thanks for the quick reply!

Alex



More information about the linux-arm-kernel mailing list