[PATCH v4] riscv: Fixed misaligned memory access. Fixed pointer comparison.
David Laight
David.Laight at ACULAB.COM
Sat Jan 29 05:41:52 PST 2022
From: michael at michaelkloos.com
...
> [v4]
>
> I could not resist implementing the optimization I mentioned in
> my v3 notes. I have implemented rolling the data over in a cpu
> register in the misaligned fixup copy loops. Now only one load
> from memory is required per iteration of the loop.
I nearly commented...
...
> + /*
> + * Fix Misalignment Copy Loop.
> + * load_val1 = load_ptr[0];
> + * while (store_ptr != store_ptr_end) {
> + * load_val0 = load_val1;
> + * load_val1 = load_ptr[1];
> + * *store_ptr = (load_val0 >> {a6}) | (load_val1 << {a7});
> + * load_ptr++;
> + * store_ptr++;
> + * }
> + */
> + REG_L t0, 0x000(a3)
> + 1:
> + beq t3, t6, 2f
> + mv t1, t0
> + REG_L t0, SZREG(a3)
> + srl t1, t1, a6
> + sll t2, t0, a7
> + or t1, t1, t2
> + REG_S t1, 0x000(t3)
> + addi a3, a3, SZREG
> + addi t3, t3, SZREG
> + j 1b
No point jumping back to a conditional branch that jumps out again.
Make the bottom of the loop:
	bne	t3, t6, 1b
and move the '1:' label down one instruction, below the 'beq'.
(Or can the 'beq' at the top ever be taken at all - there is likely to
be an earlier test for zero-length copies.)
> + 2:
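
That is, something like this (untested sketch, same register
assignments as the patch):

	REG_L	t0, 0x000(a3)
	beq	t3, t6, 2f	/* zero-length guard, if it can happen at all */
1:
	mv	t1, t0
	REG_L	t0, SZREG(a3)
	srl	t1, t1, a6
	sll	t2, t0, a7
	or	t1, t1, t2
	REG_S	t1, 0x000(t3)
	addi	a3, a3, SZREG
	addi	t3, t3, SZREG
	bne	t3, t6, 1b	/* one taken branch per iteration, no 'j' */
2: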
I also suspect it is worth unrolling the loop once.
That saves the 'mv t1, t0' and one 'addi' for each word transferred.
I think someone mentioned that there is a delay of a few clocks before
the data from a memory read (REG_L) is actually available.
On an in-order cpu this is likely to be a full pipeline stall.
So move the 'addi' up between the 'REG_L' and 'sll' instructions.
(The corresponding store offset will then need to be -SZREG to match.)
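
Putting both of those together, the inner loop might look something
like this (untested sketch; it assumes the remaining length is a
multiple of 2*SZREG, so an odd trailing word would need a separate
fix-up, and t0 holds the previously loaded word on entry as before):

1:
	REG_L	t1, SZREG(a3)		/* next source word */
	addi	t3, t3, 2*SZREG		/* hoisted to hide the load-use latency */
	srl	t2, t0, a6
	sll	t0, t1, a7		/* t0 is dead here, reuse it as a temp */
	or	t2, t2, t0
	REG_S	t2, -2*SZREG(t3)	/* offsets adjusted for the early addi */
	REG_L	t0, 2*SZREG(a3)		/* word after that, the next carry */
	addi	a3, a3, 2*SZREG		/* also hoisted; the load used old a3 */
	srl	t2, t1, a6
	sll	t1, t0, a7		/* t1 is dead here, reuse it as a temp */
	or	t2, t2, t1
	REG_S	t2, -SZREG(t3)
	bne	t3, t6, 1b

That is two loads, two stores and one taken branch per two words, and
each loaded value has two unrelated instructions after it before it
is used.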
David