Race condition observed between page migration and page fault handling on arm64 machines
David Hildenbrand
david at redhat.com
Thu Aug 1 06:15:31 PDT 2024
>> What I am still missing is why this is (a) arm64 only; and (b) if this is
>> something we should really worry about. There are other reasons (e.g.,
>> speculative references) why migration could temporarily fail, does it happen
>> that often that it is really something we have to worry about?
>
> The test fails consistently on arm64. It's my rough understanding that it's
> failing due to migration backing off because the fault handler has raised the
> ref count? (Dev, correct me if I'm wrong).
>
> So the real question is, is it a valid test in the first place? Should we just
> delete the test or do we need to strengthen the kernel's guarantees around
> migration success?
I think the test should retry migration a number of times in case it
fails. But if it is a persistent migration failure, the test should fail.
--
Cheers,
David / dhildenb
More information about the linux-arm-kernel mailing list