libertas: GSPI device patch performance

Mon Mar 30 22:05:25 EDT 2009

On Mon, Mar 30, 2009 at 9:21 AM, Dan Williams <dcbw at redhat.com> wrote:
> On Sun, 2009-03-29 at 13:47 +0800, Mok Keith wrote:
>> Hi all,
>>
>> For the patch for GSPI devices, do we really need to create another
>> kernel thread for sending spi command. Since doing so, we need to
>> memcpy all the data from host to card, and it reduces performance. Can
>> we just send the command directly in host_to_card function, I believe
>> the comment that if_spi_host_to_card can't sleep is wrong. Since
>> hw_host_to_card is called by a kernel thread (lbs_thread) in main.c.
>
> It depends on how if_spi_host_to_card() is really implemented by the
> platform SPI driver whether or not it needs to sleep; the mainloop is
> holding a spinlock with interrupts disabled.

Generally speaking, we have a GPIO-based IRQ that informs the host
that one or more of the following have occurred:
- the card has data for the host
- the card has an event for the host
- the card is ready for a command from the host (which also means the
last command, if any, has been handled)
- the card is ready for data from the host

We need to know that the card is ready for a command or for data
before we can write our buffer out, so we queue up outgoing command
or data buffers.  On the RX side, we just want to handle the
bottom half of the IRQ.  Right now, this is implemented by the one
driver thread that does work based on what the card and libertas core
are ready to do.
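
To make the above concrete, here is a rough userspace sketch of one
pass of such a thread.  This is not the actual if_spi code: the
CAUSE_* bits, struct work_state and service_card() are all made-up
names for illustration, and the real interrupt cause layout comes from
the card, not from this enum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cause bits, one per condition listed above. */
enum {
    CAUSE_RX_DATA   = 1u << 0, /* card has data for the host */
    CAUSE_EVENT     = 1u << 1, /* card has an event for the host */
    CAUSE_CMD_READY = 1u << 2, /* card can accept a command */
    CAUSE_TX_READY  = 1u << 3, /* card can accept data */
};

/* Hypothetical per-device state: what is queued, what has been done. */
struct work_state {
    bool cmd_queued;   /* a command buffer is waiting to go out */
    bool data_queued;  /* a data buffer is waiting to go out */
    unsigned rx_handled, events_handled, cmds_sent, data_sent;
};

/*
 * One pass of the driver thread.  RX and events are handled whenever
 * the card reports them; queued TX buffers go out only when the card
 * says it is ready.  Returns the number of actions taken.
 */
static unsigned service_card(struct work_state *st, uint32_t cause)
{
    unsigned done = 0;

    assert(st != NULL);

    if (cause & CAUSE_RX_DATA) {
        st->rx_handled++;
        done++;
    }
    if (cause & CAUSE_EVENT) {
        st->events_handled++;
        done++;
    }
    if ((cause & CAUSE_CMD_READY) && st->cmd_queued) {
        st->cmd_queued = false;
        st->cmds_sent++;
        done++;
    }
    if ((cause & CAUSE_TX_READY) && st->data_queued) {
        st->data_queued = false;
        st->data_sent++;
        done++;
    }
    return done;
}
```

The point is only that outgoing buffers stay queued until the matching
"ready" bit shows up, which is why host_to_card currently queues
instead of writing straight to the bus.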

> The SPI controller may well have alignment restrictions that require
> copying and realigning the data on certain boundaries; there's no
> guarantee that priv->tx_pending_buf will be aligned on those boundaries.
> It'll be aligned to _something_ since it's not a member of a packed
> struct, but that's dependent on the compiler flags at build-time rather
> than the actual hardware requirements.

I wonder if we can resolve this now, following up on the previous
discussion about alignment.  Aligning the tx_pending_buf and avoiding
a memcpy would be nice, although it unfortunately depends on what the
host SPI controller needs.
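
One possible shape for this, as a sketch only: copy into a bounce
buffer only when the source pointer does not meet the controller's
alignment, and pass the buffer through untouched otherwise.  The
helper names and the "align" parameter below are my own assumptions;
where the required alignment would come from (platform data, the SPI
master driver, etc.) is exactly the open question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if ptr is aligned to `align` bytes (must be a power of two). */
static int buf_is_aligned(const void *ptr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)ptr & (align - 1)) == 0;
}

/*
 * Return a pointer the controller can use for a transfer of `len` bytes:
 * `src` itself when it is already aligned (no copy), otherwise `bounce`
 * after copying the payload into it.  `bounce` must itself be aligned
 * and at least `len` bytes long.
 */
static const void *tx_buf_for_controller(const void *src, size_t len,
                                         void *bounce, size_t bounce_len,
                                         size_t align)
{
    if (buf_is_aligned(src, align))
        return src;

    assert(buf_is_aligned(bounce, align) && len <= bounce_len);
    memcpy(bounce, src, len);
    return bounce;
}
```

With something like this the memcpy only happens on controllers that
need it, and only for buffers that are actually misaligned.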

> All that said, I don't see anything offhand that would prevent
> tx_pending_buf being used directly by those interface drivers that need
> it.  The device's queue should be blocked until the card has finished
> processing the packet in lbs_send_tx_feedback().  Need to verify that in
> all the corner cases though (like error conditions) that the main stack
> won't overwrite priv->tx_pending_buf before the card is done with it.
>
> Second, are you sure the memcpy is the bottleneck?  ie, is that memcpy
> the bottleneck, or are there other bottlenecks in the driver or your SPI
> controller code that are causing problems?

We have some evidence that the driver thread in question is a
bottleneck, at least on Blackfin.  I'm investigating further but the
memcpy is a definite candidate.  Keith -- can you elaborate on your
findings?  What is your test setup?  Are you using an ARM or a
Blackfin CPU?

Thanks,

  -Andrey

>> For interrupt event, we can just do a scheduled work to read the event cause.
>> Keith Mok
>>
>> _______________________________________________
>> libertas-dev mailing list
>> libertas-dev at lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/libertas-dev