[PATCH 1/2] RISC-V: Probe for unaligned access speed

David Laight David.Laight at ACULAB.COM
Sun Jun 25 14:42:07 PDT 2023


From: Evan Green
> Sent: 23 June 2023 23:20
> 
> Rather than deferring misaligned access speed determinations to a vendor
> function, let's probe them and find out how fast they are. If we
> determine that a misaligned word access is faster than N byte accesses,
> mark the hardware's misaligned access as "fast".
> 
> Fix the documentation as well to reflect this bar. Previously the only
> SoC that returned "fast" was the THead C906. The change to the
> documentation is more a clarification, since the C906 is fast in the
> sense of the corrected documentation.
> 
> Signed-off-by: Evan Green <evan at rivosinc.com>
> ---
...
> diff --git a/Documentation/riscv/hwprobe.rst b/Documentation/riscv/hwprobe.rst
> index 19165ebd82ba..710325751766 100644
> --- a/Documentation/riscv/hwprobe.rst
> +++ b/Documentation/riscv/hwprobe.rst
> @@ -88,12 +88,12 @@ The following keys are defined:
>      always extremely slow.
> 
>    * :c:macro:`RISCV_HWPROBE_MISALIGNED_SLOW`: Misaligned accesses are supported
> -    in hardware, but are slower than the cooresponding aligned accesses
> -    sequences.
> +    in hardware, but are slower than N byte accesses, where N is the native
> +    word size.
> 
>    * :c:macro:`RISCV_HWPROBE_MISALIGNED_FAST`: Misaligned accesses are supported
> -    in hardware and are faster than the cooresponding aligned accesses
> -    sequences.
> +    in hardware and are faster than N byte accesses, where N is the native
> +    word size.

I think I'd just say 'faster/slower than using byte accesses' (i.e. no N).

There are two obvious FAST cases:
1) the misaligned access takes an extra clock - worth aligning copies.
2) the misaligned access is pretty much as fast as an aligned one.

Even if you find it hard to distinguish them you should probably
allow for them both.
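
A rough sketch of what allowing for both might look like, given some
per-access timings from a probe. The enum, the function name and the
10% threshold are all made up for illustration; none of them are in
the patch or in the hwprobe ABI.

```c
#include <stdint.h>

/* Hypothetical three-way result; NOT the RISCV_HWPROBE_MISALIGNED_* values. */
enum misaligned_speed {
	MISALIGNED_SPEED_SLOW,		/* no faster than using byte accesses */
	MISALIGNED_SPEED_FAST,		/* beats byte accesses, aligning still pays */
	MISALIGNED_SPEED_AS_ALIGNED,	/* pretty much as fast as an aligned access */
};

/*
 * Classify from three measured costs in any consistent unit (e.g. cycles):
 *   aligned_word    - one aligned word access
 *   misaligned_word - one misaligned word access
 *   byte_sequence   - the same word moved using byte accesses
 *
 * Assumption: "as fast as aligned" means within 10% of the aligned cost.
 */
enum misaligned_speed classify_misaligned(uint64_t aligned_word,
					  uint64_t misaligned_word,
					  uint64_t byte_sequence)
{
	if (misaligned_word >= byte_sequence)
		return MISALIGNED_SPEED_SLOW;

	/* misaligned <= 1.1 * aligned, kept in integer arithmetic */
	if (misaligned_word * 10 <= aligned_word * 11)
		return MISALIGNED_SPEED_AS_ALIGNED;

	return MISALIGNED_SPEED_FAST;
}
```

With that, a caller could still bother to align copies in the FAST
case and skip the alignment code entirely in the AS_ALIGNED case.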

x86 (on non-Atom Intel CPUs) is definitely in the latter camp.
Misaligned copies are measurably slower - but not enough to
ever worry about.
I think that misaligned transfers get split into 8 byte accesses
(pretty irrelevant in the kernel) and then accesses that cross
cache line boundaries are split on the boundary.
With pipelined writes and two reads/clock it doesn't often
make a measurable difference.
That is definitely what I see for uncached accesses to PCIe space.
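
For reference, the check for whether a given access straddles a cache
line (and so would get split on the boundary) is just the following.
The 64 byte line size is an assumption, as is the function name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed line size; query the real value on the target if it matters. */
#define ASSUMED_CACHE_LINE_SIZE 64u

/*
 * Return true if the access [addr, addr + len) touches more than one
 * cache line, i.e. its first and last bytes fall in different lines.
 */
bool crosses_cache_line(uintptr_t addr, size_t len)
{
	if (len == 0)
		return false;

	return (addr / ASSUMED_CACHE_LINE_SIZE) !=
	       ((addr + len - 1) / ASSUMED_CACHE_LINE_SIZE);
}
```

So an 8 byte access at offset 56 stays within one line, while the same
access at offset 60 spans two.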

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)



