USB mass storage and ARM cache coherency

Russell King - ARM Linux linux at arm.linux.org.uk
Thu Mar 4 04:31:17 EST 2010


On Thu, Mar 04, 2010 at 07:54:57AM +0100, Wolfgang Mües wrote:
> ... and this is what *I* don't understand in this discussion. Obviously a 
> flush() in PIO drivers is a clean and quick solution to the problem. And how 
> much execution time will it cost - given the fact that if there is NO flush, 
> the flush operation will not be avoided, only delayed (up to the time the data 
> cache is doing the flush himself). If the data cache is doing the flush BEFORE 
> the data is used in userspace (this includes the most common case of reading 
> large files from the device), there will be no performance impact.

You're assuming that every page is used in the same way.  Here are some
examples where this is wrong:

1. A page is faulted in for an application, and it is a text page.
   - the data read in to the page needs to be visible to the instruction
     stream, so on Harvard architecture machines, this may require cache
     maintenance on both the D and I caches.

2. A page is faulted in for an application's data page.
   - data may be written to the kernel mapping, which may alias with the
     eventual userspace address.  These aliases need to be dealt with, to
     make the data visible to the user mapping of the page.

3. A page may be read in response to an application issuing a read(2) call.
   - the data is read from the kernel mapping, and isn't mapped into a
     userspace address.

So, in case (3), flushing the I and D caches could be completely wasteful
- consider if this file is a 600MB MPEG video file which is being read by
a video player.  There's no need to flush the I cache because MPEG data
will never be executed.  There's no need to flush the D cache because
there isn't a user mapping of that data yet, and therefore there aren't
any aliases.

In case (2), it would be wasteful to flush the I cache - the application
isn't going to execute the data.

In case (1), everything is required to ensure that the instruction stream
can see the instructions.
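
To put the three cases side by side, here is a rough sketch in plain C of
which maintenance each one needs.  It is not kernel code: the enum, the
struct and the function name are all made up for illustration, and it
takes the "may require" in case (1) as "does require".

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; none of these are kernel APIs. */
enum page_use {
    USE_TEXT_FAULT,    /* case 1: page faulted in as program text */
    USE_DATA_FAULT,    /* case 2: page faulted in as application data */
    USE_READ_SYSCALL   /* case 3: page filled to satisfy read(2) */
};

struct cache_work {
    bool dcache;  /* deal with kernel/user aliases in the D cache */
    bool icache;  /* make the new contents visible to instruction fetch */
};

/* Return the cache maintenance that each kind of page use calls for. */
static struct cache_work required_maintenance(enum page_use use)
{
    struct cache_work w = { false, false };

    switch (use) {
    case USE_TEXT_FAULT:
        w.dcache = true;   /* data must leave the D cache ... */
        w.icache = true;   /* ... and stale I cache lines must go */
        break;
    case USE_DATA_FAULT:
        w.dcache = true;   /* aliases matter, execution does not */
        break;
    case USE_READ_SYSCALL:
        break;             /* no user mapping yet, nothing to do */
    }
    return w;
}
```

A driver that flushes unconditionally behaves as if every page were
USE_TEXT_FAULT, which is exactly the assumption being questioned.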

So, the PG_arch_1 'delayed flush' is not only about delaying flushes until
they're required, it's about eliminating those which are not required to
give additional system performance - maybe to the point where you can
serve MP3 files via NFS with a low enough latency that your player isn't
regularly starved of data because of all the needless flushing going on.
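
The idea can be modelled in a few lines of userspace C.  This is a toy
under stated assumptions, not the kernel implementation: struct toy_page,
its flush_pending field (standing in for the PG_arch_1 flag) and the
toy_* functions are hypothetical names, and the "flushes" are only
counters.

```c
#include <stdbool.h>

/* Toy userspace model; every name here is hypothetical, not a kernel API. */
struct toy_page {
    bool flush_pending;            /* stands in for the PG_arch_1 page flag */
};

struct flush_stats {
    unsigned long dcache_flushes;  /* D cache maintenance actually performed */
    unsigned long icache_flushes;  /* I cache maintenance actually performed */
};

/* Data was written through the kernel mapping: only note that
 * maintenance is owed, do not perform it yet. */
static void toy_pio_fill(struct toy_page *pg)
{
    pg->flush_pending = true;
}

/* read(2) copies the data out through the kernel mapping, so there is
 * no user alias to fix up and nothing is flushed. */
static void toy_read_syscall(const struct toy_page *pg)
{
    (void)pg;
}

/* The page is about to gain a user mapping: do the deferred work now,
 * and touch the I cache only if the mapping is executable. */
static void toy_map_to_user(struct toy_page *pg, bool executable,
                            struct flush_stats *st)
{
    if (!pg->flush_pending)
        return;                    /* nothing owed, nothing done */
    st->dcache_flushes++;
    if (executable)
        st->icache_flushes++;
    pg->flush_pending = false;
}
```

In this model a page that is filled and then only read via read(2) never
increments either counter, which is the "eliminated" flush; a page that is
later mapped pays once, at map time, and pays for the I cache only when it
is mapped executable.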



More information about the linux-arm-kernel mailing list