[PATCH v3 3/5] arm64: mte: implement CONFIG_ARM64_MTE_COMP

Yury Norov yury.norov at gmail.com
Wed Jul 19 14:06:24 PDT 2023


On Wed, Jul 19, 2023 at 04:00:14PM +0200, Alexander Potapenko wrote:
> > > + * 4. For the inline case, the following values are stored in the 8-byte handle:
> > > + *       largest_idx : i4
> > > + *      r_tags[0..5] : i4 x 6
> > > + *     r_sizes[0..4] : i7 x 5
> > > + *    (if N is less than 6, @r_tags and @r_sizes are padded up with zero values)
> > > + *
> > > + *    Because @largest_idx is <= 5, bit 63 of the handle is always 0 (so it can
> > > + *    be stored in the Xarray), and bits 62..60 cannot all be 1, so it can be
> > > + *    distinguished from a kernel pointer.
> >
> > I honestly tried to understand... For example, what the
> >         'r_sizes[0..4] : i7 x 5'
> > means? An array of 5 elements, 17 bits each? But it's alone greater
> > than size of pointer... Oh gosh...
> 
> iN (note that it is a small i, not a 1) is LLVM notation for an N-bit
> integer type.
> There's no such thing in C syntax, and describing the data layout
> using bitfields would be painful.
> Would it help if I add a comment explaining that iN is an N-bit
> integer? Or do you prefer something like
> 
>   r_sizes[0..4] : 5 x 7 bits
> 
> ?

Yes, that would help.
 
> >
> > Moreover, MTE tags are all 4-bits.
> >
> > It seems like text without pictures is above my mental abilities. Can you
> > please illustrate it? For example, from the #4, I (hopefully correctly)
> > realized that:
> >
> > Inline frame format:
> >    0                                                    60       62  63
> >  +---------------------------------------------------------------------+
> I think it's more natural to number bits from 63 to 0
> 
> >  |              No idea what happens here             |    Lidx    | X |
> >  +---------------------------------------------------------------------+
> >  63    : X      :  RAZ : Reserved for Xarray.
> >  60-62 : Lidx   : 0..5 : Largest element index.
> 
> There's some mismatch now between this picture, where Lidx is i3, and
> the implementation that treats it as an i4, merging bit 63 into it.
> Maybe we can avoid this by not splitting off bit 63?
> 
> >                      6 : Reserved
> >                      7 : Invalid handler
> 
> No, 7 means "treat this handle as an out-of-line one". It is still
> valid, but instead of tag data it contains a pointer.

So, it's invalid combination for _inline_ handler, right? Anyways, I'm
waiting for an updated docs, so it will hopefully bring some light.

> > > +
> > > +/* Transform tag ranges back into tags. */
> > > +void ea0_ranges_to_tags(u8 *r_tags, short *r_sizes, int r_len, u8 *tags)
> > > +{
> > > +     int i, j, pos = 0;
> > > +     u8 prev;
> > > +
> > > +     for (i = 0; i < r_len; i++) {
> > > +             for (j = 0; j < r_sizes[i]; j++) {
> > > +                     if (pos % 2)
> > > +                             tags[pos / 2] = (prev << 4) | r_tags[i];
> > > +                     else
> > > +                             prev = r_tags[i];
> > > +                     pos++;

This code flushes tags at every 2nd iteration. Is that true that
there's always an even number of iterations, i.e. rsizes is always
even, assuming r_len can be 1?

If not, it's possible to loose a tag, consider rlen == 1 and
rsizes[0] == 1.

If yes, you can simplify:

     for (i = 0; i < r_len; i++)
             for (j = 0; j < r_sizes[i]; j++)
                     tags[pos++] = (r_tags[i] << 4) | r_tags[i];

Anyways, in the test can you run all possible combinations?

> > > +             }
> > > +     }
> > > +}
> > > +EXPORT_SYMBOL_NS(ea0_ranges_to_tags, MTECOMP);
> >
> > Because I didn't understand the compressed frame format, not sure I
> > can understand this logic...
> Hope the above comments will help, if not - please let me know.
> 
> > > +
> > > +/* Translate @num_ranges into the allocation size needed to hold them. */
> > > +static int ea0_alloc_size(int num_ranges)
> > > +{
> > > +     if (num_ranges <= 6)
> > > +             return 8;
> > > +     if (num_ranges <= 11)
> > > +             return 16;
> > > +     if (num_ranges <= 23)
> > > +             return 32;
> > > +     if (num_ranges <= 46)
> > > +             return 64;
> > > +     return 128;
> > > +}
> > > +
> > > +/* Translate allocation size into maximum number of ranges that it can hold. */
> > > +static int ea0_size_to_ranges(int size)
> > > +{
> > > +     switch (size) {
> > > +     case 8:
> > > +             return 6;
> > > +     case 16:
> > > +             return 11;
> > > +     case 32:
> > > +             return 23;
> > > +     case 64:
> > > +             return 46;
> > > +     default:
> > > +             return 0;
> > > +     }
> > > +}
> >
> > I wonder if there's a math formula here? Can you explain where from
> > those numbers come?
> 
> I'll add a comment there.
> Basically, for the inline case it is the biggest N such as 4 + 4*N +
> 7*(N-1) <= 63
>
> (not 64, because Xarray steals one bit from us)
> 
> For the out-of-line case it is the biggest N such as 6+4*N + 7*(N-1)
> <= array size in bits (128, 256, or 512).

So it's: N <= (BIT_CAPACITY + NUM4 - NUM7) / (TAGSZ + SZ)

Doesn't look like a rocket science...

Why don't you code the formula instead of results? Are there any
difficulties? Things may change. For example, in next spec they
may bring 3- or 5-bit tags, and your compressor will crack loudly
with hardcoded numbers.



More information about the linux-arm-kernel mailing list