[PATCH v2 04/42] arm64/sve: Make access to FFR optional

Mark Rutland mark.rutland at arm.com
Tue Oct 19 07:39:11 PDT 2021


On Tue, Oct 19, 2021 at 11:14:47AM +0100, Will Deacon wrote:
> On Mon, Oct 18, 2021 at 08:08:20PM +0100, Mark Brown wrote:
> >  SYM_FUNC_START(sve_flush_live)
> > -	cbz		x0, 1f	// A VQ-1 of 0 is 128 bits so no extra Z state
> > +	cbz		x1, 1f	// A VQ-1 of 0 is 128 bits so no extra Z state
> >  	sve_flush_z
> > -1:	sve_flush_p_ffr
> > +1:	cbz		x0, 2f
> > +	sve_flush_p
> > +2:	sve_flush_ffr
> >  	ret

> > @@ -962,7 +962,7 @@ void do_sve_acc(unsigned int esr, struct pt_regs *regs)
> >  		unsigned long vq_minus_one =
> >  			sve_vq_from_vl(current->thread.sve_vl) - 1;
> >  		sve_set_vq(vq_minus_one);
> > -		sve_flush_live(vq_minus_one);
> > +		sve_flush_live(true, vq_minus_one);
> 
> What does the pcs say about passing bools in registers? Can we guarantee
> that false is a 64-bit zero?

Per usual rules, bits [63:8] can be arbitrary -- AAPCS64 leaves it to the callee
to extend values, with the upper bits being arbitrary, and it maps _Bool/bool
to unsigned char, which covers bits [7:0].

So a bool false in a register is not guaranteed to be a 64-bit zero. But
since it *is* guarnateed to be either 0 or 1, we can use TBZ/TBNZ
instead of CBZ/CBNZ. Either that, or extend it to a wider type in the
function prototype.

The test below shows clang and GCC both agree with that (though this old
GCC seems to do unnecessary zero extension as a caller):

| [mark at gravadlaks:~]% cat bool.c       
| #include <stdbool.h>
| 
| void callee_bool(bool b);
| 
| void callee_unsigned_int(unsigned int i);
| 
| void caller_unsigned_long(unsigned long l)
| {
|         unsigned long tmp = l & 0xffffffff;
| 
|         if (tmp)
|                 callee_unsigned_int(tmp);
|         else
|                 callee_bool(tmp);
| }
| 
| unsigned long bool_to_unsigned_long(bool b)
| {
|         return b;
| }
| [mark at gravadlaks:~]% gcc --version    
| gcc (Debian 8.3.0-6) 8.3.0
| Copyright (C) 2018 Free Software Foundation, Inc.
| This is free software; see the source for copying conditions.  There is NO
| warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
| 
| [mark at gravadlaks:~]% gcc -c bool.c -O3
| [mark at gravadlaks:~]% objdump -d bool.o
| 
| bool.o:     file format elf64-littleaarch64
| 
| 
| Disassembly of section .text:
| 
| 0000000000000000 <caller_unsigned_long>:
|    0:   34000040        cbz     w0, 8 <caller_unsigned_long+0x8>
|    4:   14000000        b       0 <callee_unsigned_int>
|    8:   52800000        mov     w0, #0x0                        // #0
|    c:   14000000        b       0 <callee_bool>
| 
| 0000000000000010 <bool_to_unsigned_long>:
|   10:   92401c00        and     x0, x0, #0xff
|   14:   d65f03c0        ret
| [mark at gravadlaks:~]% clang --version    
| clang version 7.0.1-8+deb10u2 (tags/RELEASE_701/final)
| Target: aarch64-unknown-linux-gnu
| Thread model: posix
| InstalledDir: /usr/bin
| [mark at gravadlaks:~]% clang -c bool.c -O3
| [mark at gravadlaks:~]% objdump -d bool.o  
| 
| bool.o:     file format elf64-littleaarch64
| 
| 
| Disassembly of section .text:
| 
| 0000000000000000 <caller_unsigned_long>:
|    0:   34000040        cbz     w0, 8 <caller_unsigned_long+0x8>
|    4:   14000000        b       0 <callee_unsigned_int>
|    8:   14000000        b       0 <callee_bool>
| 
| 000000000000000c <bool_to_unsigned_long>:
|    c:   92400000        and     x0, x0, #0x1
|   10:   d65f03c0        ret

Thanks,
Mark.



More information about the linux-arm-kernel mailing list