[PATCH v2] arm64: errata: Add NXP iMX8QM workaround for A53 Cache coherency issue
Ivan T. Ivanov
iivanov at suse.de
Mon Jun 12 03:22:37 PDT 2023
On 06-12 10:33, Mark Rutland wrote:
> On Thu, Jun 08, 2023 at 09:22:54PM +0300, Ivan T. Ivanov wrote:
> > On 06-08 16:32, Mark Rutland wrote:
> > > On Thu, Jun 08, 2023 at 06:05:54PM +0300, Ivan T. Ivanov wrote:
> > > > On 06-08 15:16, Mark Rutland wrote:
> > > > > On Thu, Jun 08, 2023 at 04:39:29PM +0300, Ivan T. Ivanov wrote:
> > > > > > On 06-02 11:34, Will Deacon wrote:
> > > > > > > On Thu, Apr 20, 2023 at 02:29:52PM +0300, Ivan T. Ivanov wrote:
> > >
> > > > > > > > diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> > > > > > > > index 4a79ba100799..265b6334291b 100644
> > > > > > > > --- a/arch/arm64/kernel/traps.c
> > > > > > > > +++ b/arch/arm64/kernel/traps.c
> > > > > > > > @@ -556,6 +556,11 @@ static void user_cache_maint_handler(unsigned long esr, struct pt_regs *regs)
> > > > > > > > __user_cache_maint("dc civac", address, ret);
> > > > > > > > break;
> > > > > > > > case ESR_ELx_SYS64_ISS_CRM_IC_IVAU: /* IC IVAU */
> > > > > > > > + if (cpus_have_final_cap(ARM64_WORKAROUND_NXP_ERR050104)) {
> > > > > > > > + asm volatile("ic ialluis");
> > > > > > >
> > > > > > > Hmm, one oddity here is that you can pass a faulting address and not see
> > > > > > > the fault. It looks like that's already IMP DEF, so it's probably ok, but
> > > > > > > might be worth a comment.
> > > > > >
> > > > > > I am not sure what should be expected behavior, but I could
> > > > > > add comment, sure.
> > > > >
> > > > > Another option is to make this:
> > > > >
> > > > > case ESR_ELx_SYS64_ISS_CRM_IC_IVAU: /* IC IVAU */
> > > > > __user_cache_maint("ic ivau", address, ret)
> > > > > if (cpus_have_final_cap(ARM64_WORKAROUND_NXP_ERR050104) && !ret)
> > > > > asm volatile("ic ialluis");
> > > > > break;
> > > > >
> > > > > Which'll ensure that if the regular IC IVAU faults we'll handle that, and if
> > > > > not we'll do the IC IALLUIS.
> > > > >
> > > > > I think that looks a bit cleaner, too.
> > > >
> > > > What I am afraid is that "ic ivau address" will do cache invalidation of a
> > > > random address, because of wrong address wiring. So the end result will not
> > > > be what should be expected.
> > >
> > > I don't follow:
> > >
> > > - The faulting logic is entirely local (and has to happen before the
> > > broadcast), so the faulting logic shouldn't be affected by the erratum and
> > > will use the correct address.
> >
> > Perhaps is all me really not understanding PoU, so please
> > bear with me :-). This SoC hasi two CPU clusters.
> >
> > 4x Cortex A53
> > • 32KB-I and 32KB-D cache
> > • 1MB Shared L2 Cache
> >
> > 2x Cortex A72
> > • 48KB-I and 32KB-D cache
> > • 1MB Shared L2 Cache
> >
> > If it is all local then how far this broadcast will go? Up to the
> > L2 cache? But then errata says that: ".. maintenance operations
> > exchanged between the A53 and A72 core clusters.. " are affected.
> > So in case of IC IAVU there is no interaction between clusters?
>
> I'm saying that the *faulting logic* is local, not the broadcast.
>
> IIUC the erratum is due to the wiring between clusters losing some bits. The
> *recipient* of a broadcast DVM message (which is what TLBI or IC instructions
> will generate) will receive a bogus address (and other context) to use for the
> invalidate.
>
> The CPU which executes the `IC IVAU <Xt>` instruction will check the address in
> `<Xt>` using its local MMU to determine whether the access should fault
> *before* sending the broadcast DVM message, and the recipients will not perform
> any MMU check (since e.g. they could be running a different process with a
> different VA space anyway).
>
> The MMU check is local to the CPU, and doesn't depend on any broadcast; that
> should be unaffected by the erratum. If that is affected then the erratum
> description is wrong and we have a bigger set of problems.
>
Thanks, I think I get it. Now, what will be preferred way to fix IC
TLBI instructions for this errata? Using static_key for TLBI and
alternatives for IC instruction or something else?
Keep trapping userspace IC instruction seems ok to me. Perhaps
alternative for IC in invalidate_icache_by_line macro? About
TLBI, it will be really invasive if static_key is used, I think.
Yes, there is over invalidation, but still overall performance of
SoC is almost doubled because of enabled 2 CA72 :-)
Another point is, should I keep hunk in arm-smmu.c, because
cpus_have_final_cap() is not generally available for drivers?
Regards,
Ivan
More information about the linux-arm-kernel
mailing list