[RESEND PATCH v5] arm64: Enable vmalloc-huge with ptdump
Dev Jain
dev.jain at arm.com
Wed Jul 30 21:30:15 PDT 2025
On 30/07/25 11:59 pm, Ryan Roberts wrote:
> On 30/07/2025 18:00, Catalin Marinas wrote:
>> On Wed, Jul 23, 2025 at 09:48:27PM +0530, Dev Jain wrote:
> [...]
>
>>> + * mmap_write_lock/unlock in T1 be called CS (the critical section).
>>> + *
>>> + * Claim: The CS of T1 will never operate on a freed PMD table.
>>> + *
>>> + * Proof:
>>> + *
>>> + * Case 1: The static branch is visible to T2.
>>> + *
>>> + * Case 1 (a): T1 acquires the lock before T2 can.
>>> + * T2 will block until T1 drops the lock, so pmd_free() will only be
>>> + * executed after T1 exits CS.
>> This assumes that there is some ordering between unlock and pmd_free()
>> (e.g. some poisoning of the old page). The unlock only gives us release
>> semantics, not acquire. It just happens that we have an atomic
>> dec-and-test down the __free_pages() path but I'm not convinced we
>> should rely on it unless free_pages() has clear semantics on ordering
>> related to prior memory writes.
> I can understand how pmd_free() could be re-ordered before the unlock, but
> surely it can't be reordered before the lock? I need to go unlearn everything I
> thought I understood about locking if that's the case...
>
You are correct; what Catalin is pointing out is that my reasoning has a hole.
There is no obvious ordering between unlock and free() directly. But:
mmap_write_unlock() happens before mmap_read_lock() returns ... (i)
mmap_read_lock() happens before pmd_free() ... (ii)
which lets us conclude that mmap_write_unlock() happens before pmd_free().
A more rigorous way to write this would be (for Case 1):
T2                                                            T1
pmd_clear()                                                   5. cmpxchg(enable static branch)
1. while (!cmpxchg(check if lock not taken in write mode));   6. smp_mb()
2. smp_mb();                                                  7. while (!cmpxchg(check if lock not taken))
3. smp_mb();                                                  8. smp_mb()
4. cmpxchg(release lock)                                      CS instructions
pmd_free()                                                    9. smp_mb()
                                                              10. cmpxchg(release lock)

where: (1,2) = mmap_read_lock(), (3,4) = mmap_read_unlock(),
       (5,6) = static_branch_enable(), (7,8) = mmap_write_lock(),
       (9,10) = mmap_write_unlock().
For Case 1(a), 7 succeeds before 1 can. So 1 will block until 10 completes; hence 10 is
observed before 1, and 1 is observed before pmd_free() due to the barriers. The conclusion
follows by transitivity.
More information about the linux-arm-kernel mailing list