Race condition observed between page migration and page fault handling on arm64 machines

David Hildenbrand david at redhat.com
Thu Aug 1 06:15:31 PDT 2024


>> What I am still missing is why this is (a) arm64 only; and (b) if this is
>> something we should really worry about. There are other reasons (e.g.,
>> speculative references) why migration could temporarily fail, does it happen
>> that often that it is really something we have to worry about?
> 
> The test fails consistently on arm64. It's my rough understanding that it's
> failing due to migration backing off because the fault handler has raised the
> ref count? (Dev, correct me if I'm wrong.)
> 
> So the real question is, is it a valid test in the first place? Should we just
> delete the test or do we need to strengthen the kernel's guarantees around
> migration success?

I think the test should retry the migration a number of times if it 
fails. But if the migration failure is persistent, the test should fail.

-- 
Cheers,

David / dhildenb




More information about the linux-arm-kernel mailing list