libertas: GSPI device patch performance

Keith Mok ek9852 at gmail.com
Wed Apr 1 07:28:29 EDT 2009


Hi Andrey,

> I wonder if we can resolve this now, following up on the previous
> discussion about alignment.  Aligning the tx_pending_buf and avoiding
> a memcpy would be nice, although it unfortunately depends on what the
> host SPI controller needs.
>   
I think that the SPI driver should check the transfer buffer address 
passed to it.
If it is aligned for DMA use, then use it directly; otherwise, use PIO to 
transfer the first few bytes until the address is aligned, then switch 
back to DMA.
I think it is the role of the SPI driver to make sure the address is 
aligned before enabling DMA transfers, not of every particular driver 
that calls it.
(At least the pxa2xx_spi driver does check this.)

> We have some evidence that the driver thread in question is a
> bottleneck, at least on Blackfin.  I'm investigating further but the
> memcpy is a definite candidate.  Keith -- can you elaborate on your
> findings?  What is your test setup?  Are you using an ARM or a
> Blackfin CPU?
>   
We are working on a custom prototype board using ARM, not publicly 
available yet.
We have no solid figures to prove that the memcpy is the bottleneck; 
this is sometimes difficult to measure.
In any case, we believe the memcpy is a candidate for removal.

Thanks
Keith



Andrey Yurovsky wrote:
> On Mon, Mar 30, 2009 at 9:21 AM, Dan Williams <dcbw at redhat.com> wrote:
>   
>> On Sun, 2009-03-29 at 13:47 +0800, Mok Keith wrote:
>>     
>>> Hi all,
>>>
>>> For the patch for GSPI devices, do we really need to create another
>>> kernel thread for sending SPI commands? Doing so requires a memcpy of
>>> all the data from host to card, which reduces performance. Can't we
>>> just send the command directly in the host_to_card function? I believe
>>> the comment that if_spi_host_to_card can't sleep is wrong, since
>>> hw_host_to_card is called by a kernel thread (lbs_thread) in main.c.
>>>       
>> It depends on how if_spi_host_to_card() is really implemented by the
>> platform SPI driver whether or not it needs to sleep; the mainloop is
>> holding a spinlock with interrupts disabled.
>>     
>
> Generally speaking, we have a GPIO-based IRQ that informs the host
> that one or more of the following have occurred:
> - the card has data for the host
> - the card has an event for the host
> - the card is ready for a command from the host (also, the last
> command, if any, is handled)
> - the card is ready for data from the host
>
> We need to know that the card is ready for a command or for data
> before we can write our buffer out, we therefore queue up outgoing
> command or data buffers.  On the RX side, we just want to handle the
> bottom half of the IRQ.  Right now, this is implemented by the one
> driver thread that does work based on what the card and libertas core
> are ready to do.
>
>   
>> The SPI controller may well have alignment restrictions that require
>> copying and realigning the data on certain boundaries; there's no
>> guarantee that priv->tx_pending_buf will be aligned on those boundaries.
>> It'll be aligned to _something_ since it's not a member of a packed
>> struct, but that's dependent on the compiler flags at build-time rather
>> than the actual hardware requirements.
>>     
>
> I wonder if we can resolve this now, following up on the previous
> discussion about alignment.  Aligning the tx_pending_buf and avoiding
> a memcpy would be nice, although it unfortunately depends on what the
> host SPI controller needs.
>
>   
>> All that said, I don't see anything offhand that would prevent
>> tx_pending_buf being used directly by those interface drivers that need
>> it.  The device's queue should be blocked until the card has finished
>> processing the packet in lbs_send_tx_feedback().  Need to verify that in
>> all the corner cases though (like error conditions) that the main stack
>> won't overwrite priv->tx_pending_buf before the card is done with it.
>>
>> Second, are you sure the memcpy is the bottleneck?  ie, is that memcpy
>> the bottleneck, or are there other bottlenecks in the driver or your SPI
>> controller code that are causing problems?
>>     
>
> We have some evidence that the driver thread in question is a
> bottleneck, at least on Blackfin.  I'm investigating further but the
> memcpy is a definite candidate.  Keith -- can you elaborate on your
> findings?  What is your test setup?  Are you using an ARM or a
> Blackfin CPU?
>
> Thanks,
>
>   -Andrey
>
>   
>>> For an interrupt event, we can just schedule work to read the event cause.
>>> Keith Mok
>>>
>>> _______________________________________________
>>> libertas-dev mailing list
>>> libertas-dev at lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/libertas-dev
>>>       
>>
>>     



