[PATCH v2] arm64: errata: Add NXP iMX8QM workaround for A53 Cache coherency issue

Ivan T. Ivanov iivanov at suse.de
Mon Jun 12 03:22:37 PDT 2023


On 06-12 10:33, Mark Rutland wrote:
> On Thu, Jun 08, 2023 at 09:22:54PM +0300, Ivan T. Ivanov wrote:
> > On 06-08 16:32, Mark Rutland wrote:
> > > On Thu, Jun 08, 2023 at 06:05:54PM +0300, Ivan T. Ivanov wrote:
> > > > On 06-08 15:16, Mark Rutland wrote:
> > > > > On Thu, Jun 08, 2023 at 04:39:29PM +0300, Ivan T. Ivanov wrote:
> > > > > > On 06-02 11:34, Will Deacon wrote:
> > > > > > > On Thu, Apr 20, 2023 at 02:29:52PM +0300, Ivan T. Ivanov wrote:
> > > 
> > > > > > > > diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> > > > > > > > index 4a79ba100799..265b6334291b 100644
> > > > > > > > --- a/arch/arm64/kernel/traps.c
> > > > > > > > +++ b/arch/arm64/kernel/traps.c
> > > > > > > > @@ -556,6 +556,11 @@ static void user_cache_maint_handler(unsigned long esr, struct pt_regs *regs)
> > > > > > > >  		__user_cache_maint("dc civac", address, ret);
> > > > > > > >  		break;
> > > > > > > >  	case ESR_ELx_SYS64_ISS_CRM_IC_IVAU:	/* IC IVAU */
> > > > > > > > +		if (cpus_have_final_cap(ARM64_WORKAROUND_NXP_ERR050104)) {
> > > > > > > > +			asm volatile("ic ialluis");
> > > > > > > 
> > > > > > > Hmm, one oddity here is that you can pass a faulting address and not see
> > > > > > > the fault. It looks like that's already IMP DEF, so it's probably ok, but
> > > > > > > might be worth a comment.
> > > > > > 
> > > > > > I am not sure what should be expected behavior, but I could
> > > > > > add comment, sure.
> > > > > 
> > > > > Another option is to make this:
> > > > > 
> > > > > 	 case ESR_ELx_SYS64_ISS_CRM_IC_IVAU:     /* IC IVAU */
> > > > > 	 	__user_cache_maint("ic ivau", address, ret)
> > > > > 		if (cpus_have_final_cap(ARM64_WORKAROUND_NXP_ERR050104) && !ret)
> > > > > 			asm volatile("ic ialluis");
> > > > > 		break;
> > > > > 
> > > > > Which'll ensure that if the regular IC IVAU faults we'll handle that, and if
> > > > > not we'll do the IC IALLUIS.
> > > > > 
> > > > > I think that looks a bit cleaner, too.
> > > > 
> > > > What I am afraid is that "ic ivau address" will do cache invalidation of a 
> > > > random address, because of wrong address wiring. So the end result will not
> > > > be what should be expected.
> > > 
> > > I don't follow:
> > > 
> > >  - The faulting logic is entirely local (and has to happen before the
> > >    broadcast), so the faulting logic shouldn't be affected by the erratum and
> > >    will use the correct address.
> > 
> > Perhaps is all me really not understanding PoU, so please 
> > bear with me :-). This SoC hasi two CPU clusters.
> > 
> > 4x Cortex A53
> > • 32KB-I and 32KB-D cache
> > • 1MB Shared L2 Cache
> > 
> > 2x Cortex A72
> > • 48KB-I and 32KB-D cache
> > • 1MB Shared L2 Cache
> > 
> > If it is all local then how far this broadcast will go? Up to the
> > L2 cache? But then errata says that: ".. maintenance operations 
> > exchanged between the A53 and A72 core clusters.. " are affected.
> > So in case of IC IAVU there is no interaction between clusters?
> 
> I'm saying that the *faulting logic* is local, not the broadcast.
> 
> IIUC the erratum is due to the wiring between clusters losing some bits. The
> *recipient* of a broadcast DVM message (which is what TLBI or IC instructions
> will generate) will receive a bogus address (and other context) to use for the
> invalidate.
> 
> The CPU which executes the `IC IVAU <Xt>` instruction will check the address in
> `<Xt>` using its local MMU to determine whether the access should fault
> *before* sending the broadcast DVM message, and the recipients will not perform
> any MMU check (since e.g. they could be running a different process with a
> different VA space anyway).
> 
> The MMU check is local to the CPU, and doesn't depend on any broadcast; that
> should be unaffected by the erratum. If that is affected then the erratum
> description is wrong and we have a bigger set of problems.
> 

Thanks, I think I get it. Now, what will be preferred way to fix IC
TLBI instructions for this errata? Using static_key for TLBI and
alternatives for IC instruction or something else?

Keep trapping userspace IC instruction seems ok to me. Perhaps
alternative for IC in invalidate_icache_by_line macro? About
TLBI, it will be really invasive if static_key is used, I think.
Yes, there is over invalidation, but still overall performance of
SoC is almost doubled because of enabled 2 CA72 :-)

Another point is, should I keep hunk in arm-smmu.c, because
cpus_have_final_cap() is not generally available for drivers?

Regards,
Ivan




More information about the linux-arm-kernel mailing list