[RESEND RFC PATCH v1 2/5] arm64: Add BBM Level 2 cpu feature
Marc Zyngier
maz at kernel.org
Thu Dec 12 06:26:59 PST 2024
On Thu, 12 Dec 2024 10:55:45 +0000,
Ryan Roberts <ryan.roberts at arm.com> wrote:
>
> On 12/12/2024 08:25, Marc Zyngier wrote:
> >> +
> >> + local_flush_tlb_all();
> >
> > The elephant in the room: if TLBs are in such a sorry state, what
> > guarantees we can make it this far?
>
> I'll leave Miko to respond to your other comments, but I wanted to address this
> one, since it's pretty fundamental. We went around this loop internally and
> concluded that what we are doing is architecturally sound.
>
> The expectation is that a conflict abort can only be generated as a result of
> the change in patch 4 (and patch 5). That change makes it possible for the TLB
> to end up with a multihit. But crucially that can only happen for user space
> memory because that change only operates on user memory. And while the TLB may
> detect the conflict at any time, the conflict abort is only permitted to be
> reported when an architectural access is prevented by the conflict. So we never
> do anything that would allow a conflict for a kernel memory access and a user
> memory conflict abort can never be triggered as a result of accessing kernel memory.
>
> Copy/pasting comment from AlexC on the topic, which explains it better than I can:
>
> """
> The intent is certainly that in cases where the hardware detects a TLB conflict
> abort, it is only permitted to report it (by generating an exception) if it
> applies to an access that is being attempted architecturally. ... that property
> can be built from the following two properties:
>
> 1. The TLB conflict can only be reported as an Instruction Abort or a Data Abort
>
> 2. Those two exception types must be reported synchronously and precisely.
> """
I totally agree with this. The issue is that nothing says that the
abort is in any way related to userspace.
> >
> > I honestly don't think you can reliably handle a TLB Conflict abort in
> > the same translation regime as the original fault, given that we don't
> > know the scope of that fault. You are probably making an educated
> > guess that it is good enough on the CPUs you know of, but I don't see
> > anything in the architecture that indicates the "blast radius" of a
> > TLB conflict.
>
> OK, so I'm claiming that the blast radius is limited to the region of memory
> that we are operating on in contpte_collapse() in patch 4. Perhaps we need to go
> re-read the ARM and come back with the specific statements that led us to that
> conclusion?
But we don't know for sure what caused this conflict by the time we
arrive in the handler. It could equally be because we have a glaring
bug somewhere on the kernel side, even if you are *now* only concerned
with userspace.
If anything, this should absolutely check for FAR_EL1 and assert that
this is indeed caused by such change.
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
More information about the linux-arm-kernel
mailing list