[PATCH RFC 0/3] API for 128-bit IO access

Yury Norov ynorov at caviumnetworks.com
Wed Jan 24 01:05:16 PST 2018


Hi all,

This series adds an API for 128-bit memory I/O access and enables it for ARM64.
The original motivation for the 128-bit API came from a new Cavium network
device driver. The hardware requires 128-bit accesses to work. See the
description in patch 3 for details.

Also, starting from ARMv8.4, the stp and ldp instructions become atomic, so an
API for 128-bit access would be helpful in core arm64 code as well.

This series is an RFC. I'd like to collect opinions on the idea and the
implementation details.
* I didn't implement all the 128-bit operations that exist for 64-bit
variables and other types (__swab128p etc). Do we need them all right now, or
can we add them when they are actually needed?
* The u128 name is already used in crypto code, so here I use __uint128_t,
which comes from GCC, for 128-bit types. Should I rename the existing type in
crypto and make the core code for 128-bit variables consistent with u64, u32
etc? (I think yes, but would like to ask the crypto people about it.)
* Some compilers don't support __uint128_t, so I protected all generic code
with the config option HAVE_128BIT_ACCESS. I think it's OK, but...
* For the 128-bit read/write functions I use the suffix 'o', which means
read/write the octet of bytes. Is this name OK?
* My mips-linux-gnu-gcc v6.3.0 doesn't support __uint128_t, and I
don't have another BE setup on hand, so the BE case is formally not tested.
The BE code generated for arm64 looks fine, though.
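
As an illustration for the first question above, a 128-bit byteswap can be
assembled from two 64-bit swaps. This is only a minimal userspace sketch,
assuming a compiler that provides __uint128_t and __builtin_bswap64 (GCC or
Clang on a 64-bit target). The name swab128_sketch() is hypothetical and is not
the helper introduced in patch 1.

```c
#include <stdint.h>

/* Hypothetical helper for illustration; not the code from this series. */
static inline __uint128_t swab128_sketch(__uint128_t x)
{
	uint64_t lo = (uint64_t)x;
	uint64_t hi = (uint64_t)(x >> 64);

	/*
	 * Reversing 16 bytes means reversing each 64-bit half and
	 * letting the two halves trade places.
	 */
	return ((__uint128_t)__builtin_bswap64(lo) << 64) |
	       __builtin_bswap64(hi);
}
```

Applying it twice returns the original value, so a single helper of this shape
could serve both the cpu-to-device and device-to-cpu directions.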

With all that, this example code:

static int __init 128bit_test(void)
{
	__uint128_t v;
	__uint128_t addr;
	__uint128_t val = (__uint128_t) 0x1234567890abc;

	val |= ((__uint128_t) 0xdeadbeaf) << 64;

	writeo(val, &addr);
	v = reado(&addr);

	/* Zero-pad the low half so its leading zeros are not lost. */
	pr_err("%llx%016llx\n", (u64) (val >> 64), (u64) val);
	pr_err("%llx%016llx\n", (u64) (v >> 64), (u64) v);
	return v != val;
}

generates this listing for arm64-le:

0000000000000000 <128bit_test>:
   0:	a9bb7bfd 	stp	x29, x30, [sp, #-80]!
   4:	910003fd 	mov	x29, sp
   8:	a90153f3 	stp	x19, x20, [sp, #16]
   c:	a9025bf5 	stp	x21, x22, [sp, #32]
  10:	f9001bf7 	str	x23, [sp, #48]
  14:	d5033e9f 	dsb	st
  18:	d2815797 	mov	x23, #0xabc                 	// #2748
  1c:	d297d5f6 	mov	x22, #0xbeaf                	// #48815
  20:	f2acf137 	movk	x23, #0x6789, lsl #16
  24:	f2bbd5b6 	movk	x22, #0xdead, lsl #16
  28:	f2c468b7 	movk	x23, #0x2345, lsl #32
  2c:	f2e00037 	movk	x23, #0x1, lsl #48
  30:	a9045bb7 	stp	x23, x22, [x29, #64]
  34:	a94453b3 	ldp	x19, x20, [x29, #64]
  38:	d5033d9f 	dsb	ld
  3c:	90000015 	adrp	x21, 0 <128bit_test>
  40:	910002b5 	add	x21, x21, #0x0
  44:	aa1703e2 	mov	x2, x23
  48:	aa1603e1 	mov	x1, x22
  4c:	aa1503e0 	mov	x0, x21
  50:	94000000 	bl	0 <printk>
  54:	aa1303e2 	mov	x2, x19
  58:	aa1403e1 	mov	x1, x20
  5c:	ca170273 	eor	x19, x19, x23
  60:	ca160294 	eor	x20, x20, x22
  64:	aa1503e0 	mov	x0, x21
  68:	aa140273 	orr	x19, x19, x20
  6c:	94000000 	bl	0 <printk>
  70:	f9401bf7 	ldr	x23, [sp, #48]
  74:	f100027f 	cmp	x19, #0x0
  78:	a94153f3 	ldp	x19, x20, [sp, #16]
  7c:	1a9f07e0 	cset	w0, ne  // ne = any
  80:	a9425bf5 	ldp	x21, x22, [sp, #32]
  84:	a8c57bfd 	ldp	x29, x30, [sp], #80
  88:	d65f03c0 	ret

And for arm64-be:

0000000000000000 <128bit_test>:
   0:	a9bb7bfd 	stp	x29, x30, [sp, #-80]!
   4:	910003fd 	mov	x29, sp
   8:	a90153f3 	stp	x19, x20, [sp, #16]
   c:	a9025bf5 	stp	x21, x22, [sp, #32]
  10:	f9001bf7 	str	x23, [sp, #48]
  14:	d5033e9f 	dsb	st
  18:	d2802001 	mov	x1, #0x100                 	// #256
  1c:	d2d5bbc0 	mov	x0, #0xadde00000000        	// #191168994344960
  20:	f2a8a461 	movk	x1, #0x4523, lsl #16
  24:	f2f5f7c0 	movk	x0, #0xafbe, lsl #48
  28:	f2d12ce1 	movk	x1, #0x8967, lsl #32
  2c:	f2f78141 	movk	x1, #0xbc0a, lsl #48
  30:	a90407a0 	stp	x0, x1, [x29, #64]
  34:	a94453b3 	ldp	x19, x20, [x29, #64]
  38:	dac00e73 	rev	x19, x19
  3c:	dac00e94 	rev	x20, x20
  40:	d5033d9f 	dsb	ld
  44:	d2815796 	mov	x22, #0xabc                 	// #2748
  48:	90000015 	adrp	x21, 0 <128bit_test>
  4c:	f2acf136 	movk	x22, #0x6789, lsl #16
  50:	910002b5 	add	x21, x21, #0x0
  54:	f2c468b6 	movk	x22, #0x2345, lsl #32
  58:	d297d5f7 	mov	x23, #0xbeaf                	// #48815
  5c:	f2e00036 	movk	x22, #0x1, lsl #48
  60:	f2bbd5b7 	movk	x23, #0xdead, lsl #16
  64:	aa1603e2 	mov	x2, x22
  68:	aa1703e1 	mov	x1, x23
  6c:	aa1503e0 	mov	x0, x21
  70:	94000000 	bl	0 <printk>
  74:	aa1403e2 	mov	x2, x20
  78:	aa1303e1 	mov	x1, x19
  7c:	ca160294 	eor	x20, x20, x22
  80:	ca170273 	eor	x19, x19, x23
  84:	aa1503e0 	mov	x0, x21
  88:	aa140273 	orr	x19, x19, x20
  8c:	94000000 	bl	0 <printk>
  90:	f9401bf7 	ldr	x23, [sp, #48]
  94:	f100027f 	cmp	x19, #0x0
  98:	a94153f3 	ldp	x19, x20, [sp, #16]
  9c:	1a9f07e0 	cset	w0, ne  // ne = any
  a0:	a9425bf5 	ldp	x21, x22, [sp, #32]
  a4:	a8c57bfd 	ldp	x29, x30, [sp], #80
  a8:	d65f03c0 	ret

I tested the LE kernel with this, and it works OK for me. The BE version adds
a few extra instructions to swap bytes, but the generated code looks reasonable.
Where byteswapping is not needed, it can be avoided by using __raw_reado() and
__raw_writeo().
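
The difference between the two accessor flavours can be sketched in userspace
as follows. This assumes that reado() follows the readl()/readq() convention of
treating the device as little-endian. The names raw_reado_sketch() and
reado_sketch() are hypothetical. The sketch illustrates byte order only: it
reads byte by byte and does not model the single 128-bit access that the real
accessors are meant to provide.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for __raw_reado(): copy 16 bytes, no conversion. */
static inline __uint128_t raw_reado_sketch(const void *addr)
{
	__uint128_t v;

	memcpy(&v, addr, sizeof(v));
	return v;
}

/*
 * Hypothetical stand-in for reado(): interpret the 16 bytes as a
 * little-endian value, whatever the byte order of the CPU is.
 */
static inline __uint128_t reado_sketch(const void *addr)
{
	const unsigned char *p = addr;
	__uint128_t v = 0;
	int i;

	/* The byte at the highest address is the most significant one. */
	for (i = 15; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}
```

On a little-endian CPU both functions return the same value. On a big-endian
CPU they differ by a full 16-byte swap, which is the extra work visible in the
BE listing.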

Yury Norov (3):
  UAPI: Introduce 128-bit types and byteswap operations
  asm-generic/io.h: API for 128-bit I/O accessors
  arm64: enable 128-bit memory read/write support

 arch/Kconfig                                 |   7 ++
 arch/arm64/include/asm/io.h                  |  31 ++++++
 include/asm-generic/io.h                     | 147 +++++++++++++++++++++++++++
 include/linux/byteorder/generic.h            |   4 +
 include/uapi/asm-generic/int-ll64.h          |   8 ++
 include/uapi/linux/byteorder/big_endian.h    |   2 +
 include/uapi/linux/byteorder/little_endian.h |   4 +
 include/uapi/linux/swab.h                    |  22 ++++
 include/uapi/linux/types.h                   |   4 +
 9 files changed, 229 insertions(+)

-- 
2.11.0



