shared memory problem on ARM v5TE using threads

Nicolas Pitre nico at fluxnic.net
Mon Dec 7 10:37:35 EST 2009


On Mon, 7 Dec 2009, Russell King - ARM Linux wrote:

> On Mon, Dec 07, 2009 at 02:55:52PM +0200, Ronen Shitrit wrote:
> > > Russell King - ARM Linux wrote:
> > > > On Mon, Dec 07, 2009 at 01:31:41PM +0200, saeed bishara wrote:
> > > [...]
> > > >>> If there's no problem with C=0 B=1 mappings on Kirkwood, I've no idea
> > > >>> what's going on, and I don't have any suggestion on what to try next.
> > > >>>
> > > >>> The log shows that the kernel is doing the right thing: when we detect
> > > >>> two mappings for the same page in the same MM space, we clean and
> > > >>> invalidate any existing cacheable mappings visible in the MM space
> > > >>> (both L1 and L2), and switch all visible mappings to C=0 B=1 mappings.
> > > >>> This makes the area non-cacheable.
> > > >> What about the PTE in the MM space of the writer process? If it remains
> > > >> C=1 B=1, then its data will be in the L2, and as the L2 is not
> > > >> flushed on context switch, that explains this behavior.
> > > > 
> > > > That's probably the issue, and it means that _all_ shared writable
> > > > mappings on your processor will be broken.
> > > 
> > > Hmm... I also tried the testprg with CACHE_FEROCEON_L2 deactivated,
> > > same result ...
> > > 
> > > > Oh dear, that really is bad news.
> > > 
> > > Indeed.
> > > 
> > > > There are two solutions to this which I can currently think of:
> > > > 1. flush the L2 cache on every context switch
> > > 
> > > To clarify, the testprg runs fine if I start 4 processes, each with
> > > only one read thread. In this case all works as expected. The mess
> > > begins only if one read process starts more than one read thread ...
> > > 
> > That also matches the theory:
> > When using different processes, the shared area will stay C=1 B=1.
> > On each context switch L1 will be flushed, and
> > since L2 is PIPT, the next process will get the correct data...
> 
> Hang on - if L2 is PIPT, then there shouldn't be a problem provided it's
> searched with C=0 B=1 mappings.  Is that the case?

I don't have the time to properly wrap my brain around the current issue 
at the moment.  However there are 3 facts to account for:

1) Only 2 ARMv5 CPU variants with L2 cache exist: Feroceon and XSC3.
   This issue should affect both equally.

2) L2 cache is PIPT in both cases.

3) From commit 08e445bd6a, which fixed a similar issue on Feroceon
   and XSC3:

    Ideally, we would make L1 uncacheable and L2 cacheable as L2 is PIPT. But
    Feroceon does not support that combination, and the TEX=5 C=0 B=0 encoding
    for XSc3 doesn't appear to work in practice.
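
For reference, the C=1 B=1 to C=0 B=1 switch discussed in this thread comes
down to two bits in the second-level page descriptor. Below is a minimal
sketch of that bit manipulation only, assuming the ARMv5 small-page layout
with B at bit 2 and C at bit 3. The helper names pte_is_c0_b1() and
pte_make_c0_b1() are hypothetical and are not kernel functions; the cache
clean/invalidate and TLB maintenance that must accompany such a change are
not shown.

```c
#include <assert.h>
#include <stdint.h>

/* C and B bits of an ARMv5 second-level (small page) descriptor. */
#define PTE_BUFFERABLE (UINT32_C(1) << 2)   /* B bit */
#define PTE_CACHEABLE  (UINT32_C(1) << 3)   /* C bit */

/* Hypothetical helper: true if the descriptor is C=0 B=1. */
static int pte_is_c0_b1(uint32_t pte)
{
    return (pte & (PTE_CACHEABLE | PTE_BUFFERABLE)) == PTE_BUFFERABLE;
}

/*
 * Hypothetical helper: turn any C/B combination into C=0 B=1,
 * leaving every other bit of the descriptor untouched.
 * A real implementation would also have to clean and invalidate
 * the caches and flush the TLB entry; that is omitted here.
 */
static uint32_t pte_make_c0_b1(uint32_t pte)
{
    pte &= ~PTE_CACHEABLE;
    pte |= PTE_BUFFERABLE;
    assert(pte_is_c0_b1(pte));
    return pte;
}
```

With these definitions, a descriptor whose low bits are 0xe (C=1 B=1) comes
back with low bits 0x6 (C=0 B=1), and the address bits are unchanged.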

Hope this helps.


Nicolas


