Accessing the preload engine from userspace

Henry Gomersall henry.gomersall at smartacoustics.co.uk
Wed Feb 24 06:19:49 PST 2016


I'm working on some performance-critical numerical code running on a
667 MHz Cortex-A9 (in a Xilinx Zynq) that is currently cache limited.
I can achieve ~1 NEON multiply per memory access on small datasets (~530
MFlops), dropping to far less as the data size grows (~180 MFlops in the
limit).

The 180 MFlops figure above is with PLD instructions (which definitely
help), but I'm still nowhere near the theoretical bandwidth of the main
memory (DDR2 @ 533 MHz). Given that the problem is essentially a series
of sequential memory accesses that are fully defined a priori, I was
wondering if I could speed things up with the L2 preload engine.
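
To put rough numbers on "nowhere near", here is a back-of-envelope
sketch. It assumes single-precision (4-byte) elements, one element
fetched per multiply, and a 32-bit DDR bus; none of these are measured,
and whether "533MHz" means 533 MT/s or a 533 MHz clock (1066 MT/s) is
left open. Under those assumptions 180 MFlops corresponds to about
720 MB/s, against a peak of 2132 MB/s or 4264 MB/s. The function names
are made up for illustration.

```c
#include <stdint.h>

/* Peak bus throughput in MB/s: mega-transfers per second times
 * bytes moved per transfer (bus width is an assumption here). */
static uint32_t peak_mb_per_s(uint32_t mega_transfers_per_s, uint32_t bus_bytes)
{
    return mega_transfers_per_s * bus_bytes;
}

/* Data rate in MB/s implied by a flop rate, ASSUMING every flop
 * consumes one freshly loaded element of elem_bytes bytes. */
static uint32_t implied_mb_per_s(uint32_t mflops, uint32_t elem_bytes)
{
    return mflops * elem_bytes;
}
```

For example, implied_mb_per_s(180, 4) can be compared against
peak_mb_per_s(533, 4) or peak_mb_per_s(1066, 4), depending on which
reading of the memory speed applies.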

I've experimented somewhat with this. I wrote a small kernel driver
that sets the bit in the PLEUAR register enabling userspace access to
the PLE (and also adjusts the PLEPCR register to reduce the number of
cycles between PLE operations), and I then perform the PLE operation
from userspace (something like
"MCRR p15, 0, %[vect_addr], %[vect_config], c11;").
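
For reference, a minimal sketch of how that operation might be wrapped
in C with GCC inline assembly. The function name ple_program_channel is
hypothetical, the operand order simply follows the instruction quoted
above, and the encoding of the two words is left to the Cortex-A9 TRM.
On builds that do not target 32-bit ARM the helper only reports that it
is unavailable.

```c
#include <stdint.h>

/* Hypothetical helper: hand one descriptor to the PLE.
 * Returns 0 once the instruction has been issued, or -1 when the
 * code was not built for 32-bit ARM. */
static inline int ple_program_channel(uint32_t vect_addr, uint32_t vect_config)
{
#if defined(__arm__)
    __asm__ volatile("mcrr p15, 0, %[vect_addr], %[vect_config], c11"
                     : /* no outputs */
                     : [vect_addr] "r"(vect_addr),
                       [vect_config] "r"(vect_config)
                     : "memory");
    return 0;
#else
    (void)vect_addr;
    (void)vect_config;
    return -1;
#endif
}
```

A return value of 0 says only that the instruction was issued, not that
the PLE accepted or completed the transfer.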

Various things are apparent from this:
1) The program runs successfully about 20% of the time; the rest of the
time it fails with an "Illegal instruction" almost immediately, on the
first attempt to set up the PLE.
2) The run times are not affected in the slightest. If this is expected,
I'm very interested to know whether it's ever possible to operate on
data at the full memory bandwidth.
3) I can monitor the 16-entry FIFO and observe its free space decreasing
as the instructions are added and then reverting to empty.

My observations would suggest I'm not doing what I think I'm doing, or
at least, something is thwarting my attempts to do what I want to do.

Am I missing something fundamental here? Am I treading all over the
kernel's carefully protected space?

Are the virtual addresses different as viewed by the PLE compared to the
running process?

Why might the illegal instruction occur?

Any pointers or useful insights would be gratefully received.

Cheers,

Henry
