SPI: performance regression when using the common message queuing infrastructure

Wed Jul 6 02:50:28 PDT 2016

Hi Mark,

Recently, Heiko reported to us a performance regression with Atmel SPI
controllers. He noticed the issue on a sam9g15ek board, and I was also able to
reproduce it on a sama5d36ek board.

We found out that the performance regression was introduced in 3.14 by commit:
8090d6d1a415d3ae1a7208995decfab8f60f4f36
spi: atmel: Refactor spi-atmel to use SPI framework queue

For the test, I connected a Spansion S25FL512 memory on the SPI1 controller of
a sama5d36ek board. Then with an oscilloscope I monitored the chip-select, clock
and MOSI signals on the SPI bus.

1 - Reading 512 bytes from the memory

# dd if=/dev/mtd6 bs=512 count=1 of=/dev/null

With the oscilloscope, I measured the time between the chip-select falling
before the Read Status command (05h) and the chip-select rising after all data
had been read by the 4-byte address Fast Read 1-1-1 command (13h).

3.14 vanilla                      : 305 µs
3.14 commit 8090d6d1a415 reverted : 242 µs   -21%

2 - Reading 1000 x 1024 bytes from the memory

# dd if=/dev/mtd6 bs=1024 count=1000 of=/dev/null

Again using the oscilloscope, I measured the time needed to read all the data.

3.14 vanilla                      : 435 ms
3.14 commit 8090d6d1a415 reverted : 361 ms   -17%

Indeed, the oscilloscope shows that more time is spent between messages and
between transfers.

Commit 8090d6d1a415 replaced the tasklet used to manage the SPI
message/transfer queue with a workqueue provided by the SPI framework.

Support for this (optional) workqueue was introduced by commit:
ffbbdd21329f3e15eeca6df2d4bc11c04d9d91c0
spi: create a message queuing infrastructure

Although the commit message claims that this common infrastructure is optional,
the patch also marks the .transfer() hook as deprecated, suggesting that
drivers should implement the new .transfer_one_message() hook instead.

This is the reason why commit 8090d6d1a415 was submitted. However, we lost
quite a lot of performance moving from our tasklet to the generic workqueue.

So, would you recommend that we keep our current generic implementation
relying on the SPI framework workqueue, or go back to a custom implementation
using a tasklet?
If we keep the current implementation, is there a way to improve performance
so that we get back to something close to what we had before?

We saw in commit ffbbdd21329f that we can change the scheduling policy of the
workqueue thread to SCHED_FIFO by setting master->rt.

Best regards,

Cyrille