arm64 MTE tag storage reuse - alternatives to MIGRATE_CMA

Tue Feb 20 08:16:26 PST 2024

>>>>> I believe this is a very good fit for tag storage reuse, because it allows
>>>>> tag storage to be allocated even in atomic contexts, which enables MTE in
>>>>> the kernel. As a bonus, all of the changes to MM from the current approach
>>>>> wouldn't be needed, as tag storage allocation can be handled entirely in
>>>>> set_ptes_at(), copy_*highpage() or arch_swap_restore().
>>>>>
>>>>> Is this a viable approach that would be upstreamable? Are there other
>>>>> solutions that I haven't considered? I'm very much open to any alternatives
>>>>> that would make tag storage reuse viable.
>>>>
>>>> As raised recently, I had similar ideas with something like virtio-mem in
>>>> the past (wanted to call it virtio-tmem back then), but didn't have time to
>>>> look into it yet.
>>>>
>>>> I considered both, using special device memory as "cleancache" backend, and
>>>> using it as backend storage for something similar to zswap. We would not
>>>> need a memmap/"struct page" for that special device memory, which reduces
>>>> memory overhead and makes "adding more memory" a more reliable operation.
>>>
>>> Hm... this might not work with tag storage memory, the kernel needs to
>>> perform cache maintenance on the memory when it transitions to and from
>>> storing tags and storing data, so the memory must be mapped by the kernel.
>>
>> The direct map will definitely be required I think (copy in/out data). But
>> memmap for tag memory will likely not be required. Of course, it depends how
>> to manage tag storage. Likely we have to store some metadata, hopefully we
>> can avoid the full memmap and just use something else.
> 
> So I guess instead of ZONE_DEVICE I should try to use arch_add_memory()
> directly? That has the limitation that it cannot be used by a driver
> (symbol not exported to modules).

You can certainly start with something simple, and we can work on
removing that memmap allocation later.

Maybe we have to expose new primitives in the context of such drivers. 
arch_add_memory() likely also doesn't do what you need.

I recall that we had a way of only messing with the direct map.

The last time I worked with that was in the context of memtrace
(arch/powerpc/platforms/powernv/memtrace.c).

There, we call arch_create_linear_mapping()/arch_remove_linear_mapping().

... and now my memory comes back: we never finished factoring out
arch_create_linear_mapping()/arch_remove_linear_mapping() so they would
be available on all architectures.

Your driver will be very arm64 specific, so doing it in an arm64-specific
way might be good enough initially. For example, the arm64 core code could
detect that special memory region and just statically prepare the direct
map, without exposing the memory to the buddy allocator or allocating a
memmap. That is similar to how we handle the crashkernel/kexec region IIRC
(although we likely do not have a direct map for that).

[I was also wondering if we could simply dynamically map/unmap when
required so you can just avoid creating the entire direct map; might not
be the best approach performance-wise, though]
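
To make the map-on-demand idea a bit more concrete, here is a minimal
userspace C model of it, not kernel code. It assumes tag storage is split
into fixed-size blocks, that a block is mapped on first use, and that it is
unmapped when the last user drops it. All names (tag_block, tag_block_get(),
tag_block_put(), NR_TAG_BLOCKS) are hypothetical, and flipping the "mapped"
flag only stands in for whatever would really create or tear down the kernel
mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical number of tag storage blocks in this model. */
#define NR_TAG_BLOCKS 8

/* Small per-block state, used here instead of per-page metadata. */
struct tag_block {
	unsigned int map_count;	/* users that currently need the block mapped */
	bool mapped;		/* models "block has a kernel mapping" */
};

static struct tag_block tag_blocks[NR_TAG_BLOCKS];
static unsigned int nr_map_ops;		/* simulated map operations */
static unsigned int nr_unmap_ops;	/* simulated unmap operations */

/* Take a reference; only the first user pays for creating the mapping. */
static int tag_block_get(size_t idx)
{
	if (idx >= NR_TAG_BLOCKS)
		return -1;
	if (tag_blocks[idx].map_count++ == 0) {
		/* Assumption: a real implementation would map the block here. */
		tag_blocks[idx].mapped = true;
		nr_map_ops++;
	}
	return 0;
}

/* Drop a reference; the last user tears the mapping down again. */
static void tag_block_put(size_t idx)
{
	assert(idx < NR_TAG_BLOCKS && tag_blocks[idx].map_count > 0);
	if (--tag_blocks[idx].map_count == 0) {
		/* Assumption: a real implementation would unmap the block here. */
		tag_blocks[idx].mapped = false;
		nr_unmap_ops++;
	}
}
```

In this model the map/unmap work happens only on the first get and the last
put of a block, which is where the performance concern would show up if
blocks are taken and released frequently.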

There are a bunch of details to be sorted out, but I don't consider the 
directmap/memmap side of things a big problem.

-- 
Cheers,

David / dhildenb