No subject


Fri Nov 6 13:01:15 EST 2009


if (bits > 8) {
   this16
} else {
   that8
}
so no loss in splitting it. You need two copies of the read and write
loops in the read_write routine, though it's only 8 extra lines of
code for a 1% CPU increase.
I don't have strong feelings about this either way.
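
For what it's worth, here is roughly what I mean by splitting the loops, as a user-space sketch rather than the driver code. The fake_fifo struct, FIFO_DEPTH and fill_tx_fifo() are names made up for illustration; the real routine writes to the controller's data register instead of an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the hardware TX FIFO (not the real register). */
#define FIFO_DEPTH 8

struct fake_fifo {
	uint16_t slot[FIFO_DEPTH];
	size_t count;
};

/*
 * Push data from buf into the FIFO until it is full or buf is exhausted.
 * The word size is tested once, outside the loops, so each loop body
 * stays free of the per-word (bits > 8) check.
 * Returns the number of bytes consumed from buf.
 */
static size_t fill_tx_fifo(struct fake_fifo *f, const void *buf,
			   size_t len, unsigned int bits)
{
	size_t done = 0;

	assert(bits >= 4 && bits <= 16);

	if (bits > 8) {
		/* 16-bit copy of the loop: two bytes per FIFO entry */
		const uint16_t *p = buf;

		while (f->count < FIFO_DEPTH && done + 2 <= len) {
			f->slot[f->count++] = *p++;
			done += 2;
		}
	} else {
		/* 8-bit copy of the loop: one byte per FIFO entry */
		const uint8_t *p = buf;

		while (f->count < FIFO_DEPTH && done < len) {
			f->slot[f->count++] = *p++;
			done++;
		}
	}
	return done;
}
```

The read side would be split the same way, which is where the extra lines come from.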

> Yes.  Too bad we can't also combine multiple messages that might be in
>  the queue...

I wonder if it might be easier, if there are several parts to a
message, to just cook up  one big new transfer, copy the data into it
and just do that. It would mean copying every data block of course
since in the MMC/SD case they are 512+2+1.
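
A minimal sketch of that idea, in plain user-space C. struct part and merge_parts() are hypothetical names for illustration and are not the kernel's spi_transfer API; in the driver the allocation and the RX side would need more care.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one part of a message. */
struct part {
	const uint8_t *tx_buf;
	size_t len;
};

/*
 * Copy every part into one freshly allocated buffer so the whole message
 * can go out as a single transfer. Stores the combined length in *total.
 * Returns the buffer (caller frees it), or NULL if the message is empty
 * or the allocation fails.
 */
static uint8_t *merge_parts(const struct part *parts, size_t nparts,
			    size_t *total)
{
	size_t sum = 0, off = 0, i;
	uint8_t *big;

	assert(parts != NULL && total != NULL);

	for (i = 0; i < nparts; i++)
		sum += parts[i].len;

	*total = sum;
	if (sum == 0)
		return NULL;

	big = malloc(sum);
	if (!big)
		return NULL;

	/* This is the per-block copy that the merge costs. */
	for (i = 0; i < nparts; i++) {
		memcpy(big + off, parts[i].tx_buf, parts[i].len);
		off += parts[i].len;
	}
	return big;
}
```

For the 512+2+1 case this gives one 515-byte transfer in place of three.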

>  But, if this gets too complicated it might not be worth holding off
>  getting it merged

Possibly. It depends on the timing of the merge windows. The current
code works with some devices.

>  > It now occurs to me that another step in CPU efficiency could be had
>  > by abusing the receive-fifo-is-half-full interrupt to signal the
>  > completion of a transfer. This would only work for transfers of four
>  > or more words and would need some careful jiggery-pokery at end of
>  > transfer to turn the tx-fifo-is-half-empty interrupt enable off and
>  > ensure that exactly four words would end up in the RX FIFO.
>
> That might work but I think it could break horribly.  The "jiggery-pokery"
>  could end up being pretty messy.

Yes. It is looking nasty and it would make a mess of the place where
the continuity stuff needs to happen to achieve actual functionality
improvements.

To get some idea of the potential wins I've been instrumenting the
code today to see how many interrupts it takes to receive the last 4
words.
At 3.7MHz I'm seeing up to 3 interrupts waiting for the final draining
to happen, with an average of 1.36 (1 would be the perfect figure).
At 400kHz it takes up to 26, with an average of 9.92.

Me, I only really care about the SD card case, where half a dozen
useless interrupts per 512-byte block are not going to affect CPU
usage much. In fact, making the interrupt routine more complex may
even leave it more CPU-hungry in the end.

On the other hand, slow clock-speed devices are using needlessly high
CPU, especially if the transfer sizes are small, as is typical with
command-response devices.
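
As a back-of-envelope check on why the slow clock hurts, here is the time it takes to shift the last few words. This assumes 8-bit words sent back to back with no gaps between them, which is an idealisation, and shift_time_ns() is just a name made up for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Nanoseconds needed to clock `words` words of `bits` bits each at `hz`,
 * assuming back-to-back words with no inter-word gaps (rounded down).
 */
static uint64_t shift_time_ns(uint32_t words, uint32_t bits, uint32_t hz)
{
	assert(hz != 0);
	return ((uint64_t)words * bits * 1000000000ULL) / hz;
}
```

Under those assumptions shift_time_ns(4, 8, 400000) is 80000 (80us) and shift_time_ns(4, 8, 3700000) is 8648 (under 9us), so at 400kHz there is far more time for the handler to be re-entered with nothing new to read.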

>  But, the tx-fifo-is-half-empty might be what is needed to handle the
>  multiple transfer merging.  We know the driver is finished transferring
>  the data to the fifo when (espi->tx == t->len).  At this point we are
>  just waiting for the last (fifo_level) bytes to come in on the rx fifo,
>  this could be anything from 1 to 8 bytes.

Another round of instrumentation shows that yes, at 3.7MHz, the data
read on each interrupt is between 1 and 8 words. I'm surprised it gets
to a full 8 - maybe when the interrupt is requested while some other
IRQ (ether? clock tick?) is already in progress? I'm instrumenting in a
way that shouldn't change the timing characteristics of the transfer
(i.e. a tiny printk only after the end of the transfer).
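
The bookkeeping under discussion, as a rough user-space model and not the driver itself. It counts in words, where the quoted condition counts bytes, and struct xfer_state, pump() and words_in_flight() are hypothetical names; the 8-entry FIFO depth matches the 1 to 8 range above.

```c
#include <assert.h>
#include <stddef.h>

#define FIFO_DEPTH 8

/* Hypothetical per-transfer state, all counted in words. */
struct xfer_state {
	size_t len;  /* total words in the transfer */
	size_t tx;   /* words written to the TX FIFO so far */
	size_t rx;   /* words read back from the RX FIFO so far */
};

/* Words sent but not yet read back; never more than FIFO_DEPTH. */
static size_t words_in_flight(const struct xfer_state *s)
{
	return s->tx - s->rx;
}

/*
 * One interrupt's worth of work: take `avail` words out of the RX FIFO,
 * then top up TX without letting more than FIFO_DEPTH words be in flight.
 * Returns 1 once every word has been received, 0 otherwise.
 */
static int pump(struct xfer_state *s, size_t avail)
{
	assert(s->rx + avail <= s->tx);  /* cannot receive more than was sent */

	s->rx += avail;
	while (s->tx < s->len && words_in_flight(s) < FIFO_DEPTH)
		s->tx++;

	return s->rx == s->len;
}
```

Once tx == len the loop in pump() does nothing and every further call is pure draining, which is exactly the phase where the extra interrupts show up.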

>  > It's a horrible thought, and I suspect that the DMA engine is the real answer.
>
> The DMA engine might help but it's not here yet....

Yes. What I mean is that the prospect of that happening somewhere in
the future reduces the importance of doing hairy jiggery pokery with
the interrupt version.

     M


