[PATCH v2] arm64: errata: Add NXP iMX8QM workaround for A53 Cache coherency issue

Mon Jun 12 02:33:03 PDT 2023

On Thu, Jun 08, 2023 at 09:22:54PM +0300, Ivan T. Ivanov wrote:
> On 06-08 16:32, Mark Rutland wrote:
> > On Thu, Jun 08, 2023 at 06:05:54PM +0300, Ivan T. Ivanov wrote:
> > > On 06-08 15:16, Mark Rutland wrote:
> > > > On Thu, Jun 08, 2023 at 04:39:29PM +0300, Ivan T. Ivanov wrote:
> > > > > On 06-02 11:34, Will Deacon wrote:
> > > > > > On Thu, Apr 20, 2023 at 02:29:52PM +0300, Ivan T. Ivanov wrote:
> > 
> > > > > > > diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> > > > > > > index 4a79ba100799..265b6334291b 100644
> > > > > > > --- a/arch/arm64/kernel/traps.c
> > > > > > > +++ b/arch/arm64/kernel/traps.c
> > > > > > > @@ -556,6 +556,11 @@ static void user_cache_maint_handler(unsigned long esr, struct pt_regs *regs)
> > > > > > >  		__user_cache_maint("dc civac", address, ret);
> > > > > > >  		break;
> > > > > > >  	case ESR_ELx_SYS64_ISS_CRM_IC_IVAU:	/* IC IVAU */
> > > > > > > +		if (cpus_have_final_cap(ARM64_WORKAROUND_NXP_ERR050104)) {
> > > > > > > +			asm volatile("ic ialluis");
> > > > > > 
> > > > > > Hmm, one oddity here is that you can pass a faulting address and not see
> > > > > > the fault. It looks like that's already IMP DEF, so it's probably ok, but
> > > > > > might be worth a comment.
> > > > > 
> > > > > I am not sure what should be expected behavior, but I could
> > > > > add comment, sure.
> > > > 
> > > > Another option is to make this:
> > > > 
> > > > 	 case ESR_ELx_SYS64_ISS_CRM_IC_IVAU:     /* IC IVAU */
> > > > 	 	__user_cache_maint("ic ivau", address, ret)
> > > > 		if (cpus_have_final_cap(ARM64_WORKAROUND_NXP_ERR050104) && !ret)
> > > > 			asm volatile("ic ialluis");
> > > > 		break;
> > > > 
> > > > Which'll ensure that if the regular IC IVAU faults we'll handle that, and if
> > > > not we'll do the IC IALLUIS.
> > > > 
> > > > I think that looks a bit cleaner, too.
> > > 
> > > What I am afraid is that "ic ivau address" will do cache invalidation of a 
> > > random address, because of wrong address wiring. So the end result will not
> > > be what should be expected.
> > 
> > I don't follow:
> > 
> >  - The faulting logic is entirely local (and has to happen before the
> >    broadcast), so the faulting logic shouldn't be affected by the erratum and
> >    will use the correct address.
> 
> Perhaps is all me really not understanding PoU, so please 
> bear with me :-). This SoC hasi two CPU clusters.
> 
> 4x Cortex A53
> • 32KB-I and 32KB-D cache
> • 1MB Shared L2 Cache
> 
> 2x Cortex A72
> • 48KB-I and 32KB-D cache
> • 1MB Shared L2 Cache
> 
> If it is all local then how far this broadcast will go? Up to the
> L2 cache? But then errata says that: ".. maintenance operations 
> exchanged between the A53 and A72 core clusters.. " are affected.
> So in case of IC IAVU there is no interaction between clusters?

I'm saying that the *faulting logic* is local, not the broadcast.

IIUC the erratum is due to the wiring between clusters losing some bits. The
*recipient* of a broadcast DVM message (which is what TLBI or IC instructions
will generate) will receive a bogus address (and other context) to use for the
invalidate.

The CPU which executes the `IC IVAU <Xt>` instruction will check the address in
`<Xt>` using its local MMU to determine whether the access should fault
*before* sending the broadcast DVM message, and the recipients will not perform
any MMU check (since e.g. they could be running a different process with a
different VA space anyway).

The MMU check is local to the CPU, and doesn't depend on any broadcast; that
should be unaffected by the erratum. If that is affected then the erratum
description is wrong and we have a bigger set of problems.

> >  - We're going to do an IC IALLUIS anyway, so we're going to do invalidation of
> >    *every* address. So randomly invalidating a random address at the same time
> >    isn't going to cause a functiona problem because we're going to invaldiate
> >    it anyway.
> 
> Yes, but my point was that this random address might generate a translation
> error where there shouldn't be one.

As above, I don't believe that it can. The CPU executing the 'IC IVAU' checks
its local MMU without broadcast, and so this should not be affected. All
recipients perform no check and cannot generate a translation fault as a result
of receiving a broadcast DVM message for I$ invalidation.

> > > > > > Finally, how come you don't need to upgrade I-cache invalidation by-VA
> > > > > > in the kernel? It looks like you're only handling operations trapped
> > > > > > from EL0.
> > > > > 
> > > > > Hm, I was thinking that __tlbi() is taking care for this or you mean 
> > > > > something else, like locations in assembler.h?
> > > > 
> > > > The __tlbi macro handles only TLBI instructions.
> > > > 
> > > > The trap handler above *only* handles IC instructions trapped from userspace;
> > > 
> > > Yep, I get this.
> > > 
> > > > we have IC IVAU instructions elsewhere in the kernel (e.g.
> > > > arch/arm64/mm/cache.S).
> > > 
> > > But I have missed this one :-)
> > 
> > Looking again, those all use the common invalidate_icache_by_line macro in
> > arch/arm64/include/asm/assembler.h
> > 
> > Lukily that seems to be all:
> > 
> > | [mark at lakrids:~/src/linux]% git grep -iw ivau -- arch/arm64 
> > | arch/arm64/include/asm/assembler.h:     ic      ivau, \tmp2                     // invalidate I line PoU
> > | arch/arm64/kernel/traps.c:      case ESR_ELx_SYS64_ISS_CRM_IC_IVAU:     /* IC IVAU */
> > | arch/arm64/kernel/traps.c:              __user_cache_maint("ic ivau", address, ret);
> > 
> > > I think that this is working because these are used only in operations witch
> > > work up to the Point of Unification, thus not messing up with caches of the
> > > rest of PE.
> > 
> > I think you have a misunderstanding of the architecture, because that doesn't
> > make sense. All IC IVAU operations operate to the Point of Unification. That's
> > what the 'U' in 'IVAU' stands for. The Point of Unification is the point in the
> > memory system where instruction fetches and data fetches see the same thing.
> >  
> > If the VA passed to IC IVAU isn't correctly broadcast, it's broken regardless
> > of where that IC IVAU is executed from, because it will leave stale
> > instructions in the I-caches of other PEs.
> 
> I am totally confused :-) If this is true, and I am not saying it is not,
> then we can't expect that IC IVAU will generate correct translation might
> fault, no?

As above, I don't believe this can affect how IC IVAU will generate translation
faults.

Thanks,
Mark.