shared memory problem on ARM v5TE using threads

Nicolas Pitre nico at fluxnic.net
Mon Dec 7 12:33:20 EST 2009


On Mon, 7 Dec 2009, Russell King - ARM Linux wrote:

> On Mon, Dec 07, 2009 at 10:37:35AM -0500, Nicolas Pitre wrote:
> > On Mon, 7 Dec 2009, Russell King - ARM Linux wrote:
> > 
> > > On Mon, Dec 07, 2009 at 02:55:52PM +0200, Ronen Shitrit wrote:
> > > > > Russell King - ARM Linux wrote:
> > > > > > On Mon, Dec 07, 2009 at 01:31:41PM +0200, saeed bishara wrote:
> > > > > [...]
> > > > > >>> If there's no problem with C=0 B=1 mappings on Kirkwood, I've no idea
> > > > > >>> what's going on, and I don't have any suggestion on what to try next.
> > > > > >>>
> > > > > >>> The log shows that the kernel is doing the right thing: when we detect
> > > > > >>> two mappings for the same page in the same MM space, we clean and
> > > > > >>> invalidate any existing cacheable mappings visible in the MM space
> > > > > >>> (both L1 and L2), and switch all visible mappings to C=0 B=1 mappings.
> > > > > >>> This makes the area non-cacheable.
> > > > > >> What about the PTE of the MM space of the write process? If it
> > > > > >> remains C=1 B=1, then its data will be in the L2, and as the L2 is
> > > > > >> not flushed on context switch, that explains this behavior.
> > > > > > 
> > > > > > That's probably the issue, and it means that _all_ shared writable
> > > > > > mappings on your processor will be broken.
> > > > > 
> > > > > Hmm.. I also tried the test program with CACHE_FEROCEON_L2
> > > > > deactivated, same result ...
> > > > > 
> > > > > > Oh dear, that really is bad news.
> > > > > 
> > > > > Indeed.
> > > > > 
> > > > > > There are two solutions to this which I can currently think of:
> > > > > > 1. flush the L2 cache on every context switch
> > > > > 
> > > > > To clarify, the test program runs fine if I start 4 processes, each
> > > > > with only one read thread. In this case everything works as expected.
> > > > > The mess begins only if one reader process starts more than one read
> > > > > thread ...
> > > > > 
> > > > That also matches the theory:
> > > > When using different processes, the shared area will stay C=1 B=1.
> > > > On each context switch the L1 will be flushed, and since the L2 is
> > > > PIPT the next process will get the correct data...
> > > 
> > > Hang on - if L2 is PIPT, then there shouldn't be a problem provided it's
> > > searched with C=0 B=1 mappings.  Is that the case?
> > 
> > I don't have the time to properly wrap my brain around the current issue 
> > at the moment.  However there are 3 facts to account for:
> > 
> > 1) Only 2 ARMv5 CPU variants with L2 cache exist: Feroceon and XSC3.
> >    However this issue should affect both equally.
> > 
> > 2) L2 cache is PIPT in both cases.
> > 
> > 3) From commit 08e445bd6a which fixed a similar issue on Feroceon 
> >    and XSC3:
> > 
> >     Ideally, we would make L1 uncacheable and L2 cacheable as L2 is PIPT. But
> >     Feroceon does not support that combination, and the TEX=5 C=0 B=0 encoding
> >     for XSc3 doesn't appear to work in practice.
> 
> Sigh, why do people create this kind of hardware brokenness.
> 
> It seems the original commit (08e445bd6a) only partly addresses the problem;
> it's broken in so many other ways, as is highlighted by this test case.
> Was it originally created for Xscale3 or Feroceon?  Was the problem actually
> found to exist on Xscale3 and Feroceon?

It fixed a test case that was discovered on XSC3 and turned out to be 
valid on Feroceon as well.  I probably have the source for it somewhere.  
The case was multiple mmap() of the same memory area within the same 
process.  I think (but that needs confirmation) that this fixed a 
real-life db4 issue as well.
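
I don't have it at hand, but the gist of that test case was something
like the following sketch (file name and size are made up, this is not
the original source):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/tmp/alias-test", O_RDWR | O_CREAT | O_TRUNC, 0600);
	char *a, *b;

	ftruncate(fd, 4096);

	/* two mappings of the same file page in the same process */
	a = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	b = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	/* write through one mapping, read back through the other;
	 * with the aliasing bug the second mapping saw stale data */
	strcpy(a, "hello");
	printf("a=\"%s\" b=\"%s\"\n", a, b);

	return 0;
}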

> Any read or write via another cacheable mapping will result in the L2
> being loaded with data.  One instance is as shown in the original poster's
> test program - where a shared writable mapping exists in another process.
> 
> Another case would be having a shared writable mapping, and using read()/
> write() on the mapped file.  This is normally taken care of with
> flush_dcache_page(), but this does not do any L2 cache maintenance on
> Feroceon.

I thought those were already handled by making L1 uncacheable (and L2 
cleaned) as soon as a second user of a shared mapping was 
encountered.
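
For the record, here's roughly how I picture the failing scenario from
the descriptions in this thread (only a sketch, not the original
poster's program -- file name, sizes and thread count are invented):
one process keeps writing through a shared mapping while a second
process reads it through more than one thread.

#include <fcntl.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile unsigned int *counter;

static void *reader(void *arg)
{
	unsigned int last = 0, now;
	int i;

	/* each reader watches the counter the other process increments;
	 * seeing it go backwards means we read stale data */
	for (i = 0; i < 10000000; i++) {
		now = *counter;
		if (now < last)
			printf("thread %ld: %u after %u\n", (long)arg, now, last);
		last = now;
	}
	return NULL;
}

int main(void)
{
	int fd = open("/tmp/shm-test", O_RDWR | O_CREAT | O_TRUNC, 0600);
	pthread_t t1, t2;
	pid_t writer;

	ftruncate(fd, 4096);
	counter = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	writer = fork();
	if (writer == 0) {
		/* writer process: increments through its own mapping */
		for (;;)
			(*counter)++;
	}

	/* reader process: a single thread is fine, two or more are not */
	pthread_create(&t1, NULL, reader, (void *)1L);
	pthread_create(&t2, NULL, reader, (void *)2L);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);

	kill(writer, SIGKILL);
	return 0;
}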

> Another case is any kind of mmap() of the same file - in other words, it
> doesn't have to be another shared mmap to bring data into the L2 cache.

But that case is fine, no?  L2 being PIPT you get the same cached data 
for both mappings, and a write will COW the page.
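
I.e. something like this fragment (assuming fd is an open descriptor on
the file; just to illustrate what I mean):

	char *shr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	char *prv = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);

	/* before any write, both mappings are backed by the same physical
	 * page, so the PIPT L2 holds one copy that serves both of them */

	prv[0] = 'x';	/* COW: the private mapping now has its own page */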

> Now, at first thought, if we disable the cache for all shared writable
> mappings in addition to what we're already doing, does this solve the
> problem?  Well, it means that the writes will bypass the caches and hit
> the RAM directly.  The reads from the other shared mappings will read
> direct from the RAM.
> 
> A private mapping of the same page will still use that same page, and it
> will not be marked uncacheable.  Accesses to it will draw data into the
> L2 cache.

Hmmm...

> PIO kernel mode accesses will also use the cached copy, and that _is_
> a problem - it means when we update the backing file on disk, we'll
> write out the L2 cached data rather than what really should be written
> out - the updated data from the writable shared mappings.
> 
> So it seems that at least these affected CPUs need flush_dcache_page()
> to also do L2 cache maintenance.  I don't think that's enough to cover
> all cases though - it probably also needs to do L2 cache maintenance
> in all the other flush_cache_* functions as well.

/me starts to feel the headache
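
If flush_dcache_page() has to grow L2 maintenance, I'd expect it to boil
down to something like this sketch.  It relies on the outer cache hooks
that cache-feroceon-l2.c registers -- I haven't checked that they are
safe to call from all the contexts flush_dcache_page() runs in, so take
it only as an illustration of the idea:

#include <linux/mm.h>
#include <asm/cacheflush.h>

/*
 * Sketch only: after the usual L1 maintenance done by flush_dcache_page(),
 * also clean+invalidate the corresponding physical range from the PIPT
 * L2 through the outer cache hooks (on Kirkwood these end up in the
 * Feroceon L2 range operations).
 */
static inline void flush_l2_page(struct page *page)
{
	unsigned long paddr = page_to_phys(page);

	outer_flush_range(paddr, paddr + PAGE_SIZE);
}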

> This is something that should be benchmarked on the affected CPUs and
> compared with the unmodified code with L2 cache disabled.
> 
> 
> As a side note, I'm currently concerned that the sequence:
> 
> 	mmap(MAP_SHARED);
> 	write to shared mapping;
> 	msync(MS_SYNC);
> 
> may not result in the written data hitting the disk (due to missing a
> cache flush) but as yet I'm unable to prove it.  Since I now get lost
> reading the Linux VFS/MM code, I can't prove this by code inspection.
> 
> Checking for this isn't going to be easy - (a) munmapping the region
> will cause the data to hit RAM, (b) any context switch will cause the
> data to hit RAM, (c) merely reading back the file via read() will
> trigger flush_dcache_page()...  Need some way to externally monitor
> what gets written to the storage device...
> 
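
FWIW, the user side of that check is easy enough to write; the hard
part is, as you say, observing what actually reaches the storage
device.  A sketch (the path is made up):

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/mnt/test/file", O_RDWR);
	char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

	/* dirty the shared mapping through the CPU cache only */
	memset(p, 0xaa, 4096);

	/* MS_SYNC should push the new data to the backing file; the
	 * question is whether a missing cache flush lets stale data
	 * reach the disk instead */
	msync(p, 4096, MS_SYNC);

	/* deliberately no munmap() or read() afterwards, since either
	 * would flush the data to RAM and hide the problem; just sit
	 * here so the device can be inspected externally */
	pause();
	return 0;
}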


Nicolas


