[PATCH 2/3] KVM: arm64: Handle S1PTW translation with TCR_HA set as a write
Ricardo Koller
ricarkol at google.com
Thu Dec 22 17:00:03 PST 2022
On Thu, Dec 22, 2022 at 08:58:40PM +0000, Oliver Upton wrote:
> On Thu, Dec 22, 2022 at 09:01:15AM +0000, Marc Zyngier wrote:
> > On Wed, 21 Dec 2022 17:46:24 +0000, Oliver Upton <oliver.upton at linux.dev> wrote:
> > > - When UFFD is in use, translation faults are reported to userspace as
> > > writes when from a RW memslot and reads when from an RO memslot.
> >
> > Not quite: translation faults are reported as reads if TCR_EL1.HA
> > isn't set, and as writes if it is. Ignoring TCR_EL1.HD for a moment,
> > this matches exactly the behaviour of the page-table walker, which
> > will update the S1 PTs only if this bit is set.
>
> My bad, yes you're right. I conflated the use case here with the
> architectural state.
>
> I'm probably being way too pedantic, but I just wanted to make sure we
> agree about the ensuing subtlety. More below:
>
> > Or is it what userfaultfd does on its own? That'd be confusing...
> >
> > >
> > > - S1 page table memory is spuriously marked as dirty, as we presume a
> > > write immediately follows the translation fault. That isn't entirely
> > > senseless, as it would mean both the target page and the S1 PT that
> > > maps it are both old. This is nothing new I suppose, just weird.
> >
> > s/old/young/ ?
> >
> > I think you're confusing the PT access with the access that caused the
> > PT access (I'll have that printed on a t-shirt, thank you very much).
>
> I'd buy it!
>
> > Here, we're not considering the cause of the PT access anymore. If
> > TCR_EL1.HA is set, the S1 PT page will be marked as accessed even on a
> > read, and only that page.
>
> I think this is where the disconnect might be. TCR_EL1.HA == 1 suggests
> a write could possibly follow, but I don't think it requires it. The
> page table walker must first load the S1 PTE before writing to it.
>
> From AArch64.S1Translate() (DDI0487H.a):
>
> (fault, descaddress, walkstate, descriptor) = AArch64.S1Walk(fault, walkparams, va, regime,
> ss, acctype, iswrite, ispriv);
>
> [...]
>
> new_desc = descriptor;
> if walkparams.ha == '1' && AArch64.FaultAllowsSetAccessFlag(fault) then
> // Set descriptor AF bit
> new_desc<10> = '1';
>
> [...]
>
> // Either the access flag was clear or AP<2> is set
> if new_desc != descriptor then
> if regime == Regime_EL10 && EL2Enabled() then
> s1aarch64 = TRUE;
> s2fs1walk = TRUE;
> aligned = TRUE;
> iswrite = TRUE;
> (s2fault, descupdateaddress) = AArch64.S2Translate(fault, descaddress, s1aarch64,
> ss, s2fs1walk, AccType_ATOMICRW,
> aligned, iswrite, ispriv);
>
> if s2fault.statuscode != Fault_None then
> return (s2fault, AddressDescriptor UNKNOWN);
> else
> descupdateaddress = descaddress;
>
> (fault, mem_desc) = AArch64.MemSwapTableDesc(fault, descriptor, new_desc,
> walkparams.ee, descupdateaddress)
>
> Buried in AArch64.S1Walk() is a stage-2 walk for a read to fetch the
> descriptor. The second stage-2 walk for write is conditioned on having
> already fetched the stage-1 descriptor and determining the AF needs
> to be set.
>
> Relating back to UFFD: if we expect KVM to do exactly what hardware
> does, UFFD should see an attempted read when the first walk fails
> because of an S2 translation fault. Based on this patch, though, we'd
> promote it to a write if TCR_EL1.HA == 1.
>
> This has the additional nuance of marking the S1 PT's IPA as dirty, even
> though it might not actually have been written to. Having said that,
> the false positive rate should be negligible given that S1 PTs ought to
> account for a small amount of guest memory.
Another false positive is TCR_EL1.HA == 1 and having the AF bit set in
the PTE. This results on a write, when I don't think it should.
>
> Like I said before, I'm probably being unnecessarily pedantic :) It just
> seems to me that the view we're giving userspace of S1PTW aborts isn't
> exactly architectural and I want to make sure that is explicitly
> intentional.
>
> --
> Thanks,
> Oliver
More information about the linux-arm-kernel
mailing list