[PATCH RFC 00/37] Add support for arm64 MTE dynamic tag storage reuse

Thu Aug 24 04:25:41 PDT 2023

On 24.08.23 13:06, David Hildenbrand wrote:
> On 24.08.23 12:44, Catalin Marinas wrote:
>> On Thu, Aug 24, 2023 at 09:50:32AM +0200, David Hildenbrand wrote:
>>> After re-reading it twice, I still have no clue what your patch set is
>>> actually trying to achieve. Probably there is a way to describe how user
>>> space intends to interact with this feature, so we can see what value
>>> this actually has for user space -- and whether we are using the right
>>> APIs and allocators.
>>
>> I'll try with an alternative summary, hopefully it becomes clearer (I
>> think Alex is away until the end of the week, may not reply
>> immediately). If this still doesn't work, maybe we should try a
>> different implementation ;).
>>
>> The way MTE is implemented currently is to have a static carve-out of
>> the DRAM to store the allocation tags (a.k.a. memory colour). This is
>> what we call the tag storage. Every 16 bytes of memory has a 4-bit
>> tag, so 1/32 of the DRAM, roughly 3%, is used for tag storage. This is
>> done transparently by the hardware/interconnect (with firmware setup)
>> and normally hidden from the OS. So a checked memory access to location
>> X generates a tag fetch from location Y in the carve-out and this tag is
>> compared with the bits 59:56 in the pointer. The correspondence from X
>> to Y is linear (subject to a minimum block size to deal with some
>> address interleaving). The software doesn't need to know about this
>> correspondence as we have specific instructions like STG/LDG to location
>> X that lead to a tag store/load to Y.
>>
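
For illustration, the arithmetic in the paragraph above can be sketched
in plain C. This is only a user-space model under stated assumptions: the
function names, the base addresses and the purely linear layout are
hypothetical, and it ignores the minimum block size and the address
interleaving mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MTE_GRANULE_SIZE 16u /* bytes of data covered by one tag */
#define MTE_TAG_BITS      4u /* bits of tag per granule */

/* Bytes of tag storage needed to cover data_bytes of DRAM (1/32). */
static uint64_t tag_storage_bytes(uint64_t data_bytes)
{
	return data_bytes / MTE_GRANULE_SIZE * MTE_TAG_BITS / 8;
}

/*
 * Hypothetical linear X -> Y mapping: one byte of tag storage holds the
 * tags of two 16-byte granules, i.e. it covers 32 bytes of data.
 */
static uint64_t tag_byte_addr(uint64_t dram_base, uint64_t tag_base,
			      uint64_t x)
{
	assert(x >= dram_base);
	return tag_base + (x - dram_base) / 32;
}

/* Logical tag carried in bits 59:56 of a pointer. */
static unsigned int pointer_tag(uint64_t ptr)
{
	return (unsigned int)((ptr >> 56) & 0xf);
}

/* A checked access succeeds when the pointer tag equals the stored tag. */
static bool tag_matches(uint64_t ptr, unsigned int stored_tag)
{
	return pointer_tag(ptr) == (stored_tag & 0xf);
}
```

With these definitions, 4 GiB of DRAM needs 128 MiB of tag storage, which
is where the "roughly 3%" comes from.
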
>> Now, not all memory used by applications is tagged (mmap(PROT_MTE)).
>> For example, some large allocations may not use PROT_MTE at all or only
>> for the first and last page since initialising the tags takes time. The
>> side effect is that of this 3% of DRAM, only a part, say 1%, is
>> effectively used. Some people want the unused tag storage to be released
>> for normal data usage (i.e. given to the kernel page allocator).
>>
>> So the first complication is that a PROT_MTE page allocation at address
>> X will need to reserve the tag storage at location Y (and migrate any
>> data out of that tag storage page if it is in use).
>>
>> To make things worse, pages in the tag storage/carve-out range cannot
>> use PROT_MTE themselves on current hardware, so this adds the second
>> complication - a heterogeneous memory layout. The kernel needs to know
>> where to allocate a PROT_MTE page from or migrate a current page if it
>> becomes PROT_MTE (mprotect()) and the range it is in does not support
>> tagging.
>>
>> Some other complications are arm64-specific like cache coherency between
>> tags and data accesses. There is a draft architecture spec which will be
>> released soon, detailing how the hardware behaves.
>>
>> To your question about user APIs/ABIs, that's entirely transparent. As
>> with the current kernel (without this dynamic tag storage), a user only
>> needs to ask for PROT_MTE mappings to get tagged pages.
> 
> Thanks, that clarifies things a lot.
> 
> So it sounds like you might want to provide that tag memory using CMA.
> 
> That way, only movable allocations can end up on that CMA memory area,
> and you can allocate selected tag pages on demand (similar to the
> alloc_contig_range() use case).
> 
> That also solves the issue that such tag memory must not be longterm-pinned.
> 
> Regarding one complication: "The kernel needs to know where to allocate
> a PROT_MTE page from or migrate a current page if it becomes PROT_MTE
> (mprotect()) and the range it is in does not support tagging.", a
> simplified approach would be: if the page is in a MIGRATE_CMA pageblock,
> it doesn't support tagging. You have to migrate to a !CMA page (for
> example, not specifying GFP_MOVABLE as a quick way to achieve that).
> 

Okay, I now realize that this patch set effectively duplicates some CMA 
behavior using a new migrate-type. Yeah, that's probably not what we 
want just to identify if memory is taggable or not.

Maybe there is a way to just keep reusing most of CMA instead.

Another, simpler idea to get started would be to just intercept the first 
PROT_MTE request and allocate all of the CMA memory at that point. That 
way, systems that never use PROT_MTE get that additional 3% of memory.

You probably know better how common it is that only a handful of 
applications use PROT_MTE, such that there is still a significant 
portion of tag memory to be reused (and whether it's really worth 
optimizing for that scenario).

-- 
Cheers,

David / dhildenb