[PATCH] ARM: io: avoid writeback addressing modes for __raw_ accessors
Nicolas Pitre
nico at fluxnic.net
Thu Aug 16 23:43:01 EDT 2012
On Tue, 14 Aug 2012, Will Deacon wrote:
> Data aborts taken to hyp mode do not provide a valid instruction
> syndrome field in the HSR if the faulting instruction is a memory
> access using a writeback addressing mode.
>
> For hypervisors emulating MMIO accesses to virtual peripherals, taking
> such an exception requires disassembling the faulting instruction in
> order to determine the behaviour of the access. Since this requires
> manually walking the two stages of translation, the world must be
> stopped to prevent races against page aging in the guest, where the
> first-stage translation is invalidated after the hypervisor has
> translated to an IPA and the physical page is reused for something else.
>
> This patch avoids taking this heavy performance penalty when running
> Linux as a guest by ensuring that our I/O accessors do not make use of
> writeback addressing modes.
How often does this happen? I don't really see writeback as a common
pattern for IO access.
What does happen quite a lot, though, is pre-indexed addressing. For
example, let's take this code which is fairly typical of driver code:
#include <linux/io.h>

#define HW_REG1 0x10
#define HW_REG2 0x14
#define HW_REG3 0x18
#define HW_REG4 0x30

int hw_init(void __iomem *ioaddr)
{
	writel(0, ioaddr + HW_REG1);
	writel(-1, ioaddr + HW_REG2);
	writel(readl(ioaddr + HW_REG3) | 0xff, ioaddr + HW_REG4);
	return 0;
}
Right now this produces this:
hw_init:
	mov	r3, r0
	mvn	r2, #0
	mov	r0, #0
	str	r0, [r3, #16]
	str	r2, [r3, #20]
	ldr	r2, [r3, #24]
	orr	r2, r2, #255
	str	r2, [r3, #48]
	bx	lr
With your patch applied this becomes:
hw_init:
	add	r2, r0, #16
	mov	r3, #0
	str	r3, [r2]
	mvn	r3, #0
	add	r2, r0, #20
	str	r3, [r2]
	add	r3, r0, #24
	ldr	r3, [r3]
	orr	r3, r3, #255
	add	r0, r0, #48
	str	r3, [r0]
	mov	r0, #0
	bx	lr
This basically turned every IO access into two instructions instead of
one, and increased register pressure as well.
So, is the performance claim something that you've actually measured
with a real system, or was it only theoretical?
Nicolas