[PATCH] KVM: arm64: Adjust range correctly during host stage-2 faults

Quentin Perret qperret at google.com
Thu Mar 5 05:13:40 PST 2026


On Thursday 05 Mar 2026 at 10:55:42 (+0000), Marc Zyngier wrote:
> On Wed, 04 Mar 2026 18:55:04 +0000,
> Marc Zyngier <maz at kernel.org> wrote:
> > 
> > On Wed, 25 Jun 2025 11:55:48 +0100,
> > Quentin Perret <qperret at google.com> wrote:
> > > 
> > > host_stage2_adjust_range() tries to find the largest block mapping that
> > > fits within a memory or mmio region (represented by a kvm_mem_range in
> > > this function) during host stage-2 faults under pKVM. To do so, it walks
> > > the host stage-2 page-table, finds the faulting PTE and its level, and
> > > then progressively increments the level until it finds a granule of the
> > > appropriate size. However, the condition in the loop implementing the
> > > above is broken as it checks kvm_level_supports_block_mapping() for the
> > > next level instead of the current, so pKVM may attempt to map a region
> > > larger than can be covered with a single block.
> > > 
> > > This is not a security problem and is quite rare in practice (the
> > > kvm_mem_range check usually forces host_stage2_adjust_range() to choose a
> > > smaller granule), but this is clearly not the expected behaviour.
> > > 
> > > Refactor the loop to fix the bug and improve readability.
> > > 
> > > Fixes: c4f0935e4d95 ("KVM: arm64: Optimize host memory aborts")
> > > Signed-off-by: Quentin Perret <qperret at google.com>
> > 
> > This patch prevents my O6 board from booting in protected mode as of
> > e728e705802fe. Reverting it on top of 7.0-rc2 makes the box work again.
> > 
> > I haven't quite worked out why though. The hack below makes it work,
> > but implies that we can get ranges that are smaller than a page.  That
> > feels unlikely, but I'm not sure we can rule it out (the kernel page
> > size could be pretty large anyway).
> 
> Having spent a bit of time on this, I'm pretty sure this is the cause
> of the issue. The memblock tables are as such:
> 
> maz at cosmic-debris:~/vminstall$ sudo cat /sys/kernel/debug/memblock/memory
>    0: 0x0000000080000000..0x00000000843fffff    0 NOMAP
>    1: 0x0000000084400000..0x00000000845fffff    0 NONE
>    2: 0x0000000085000000..0x000000009fffffff    0 NONE
>    3: 0x00000000a0000000..0x00000000a7ffffff    0 NOMAP
>    4: 0x00000000a8000000..0x00000000fffbffff    0 NONE
>    5: 0x00000000fffc0000..0x00000000fffeffff    0 NOMAP
>    6: 0x00000000ffff0000..0x00000000ffffdfff    0 NONE
>    7: 0x00000000ffffe000..0x00000000ffffffff    0 NOMAP
>    8: 0x0000000100000000..0x00000007fe4effff    0 NONE
>    9: 0x00000007fe4f0000..0x00000007fedeffff    0 NOMAP
>   10: 0x00000007fedf0000..0x00000007ffffffff    0 NONE
>   11: 0x0000008000000000..0x000000807a290fff    0 NONE
>   12: 0x000000807a291000..0x000000807a2927b2    0 NOMAP
>   13: 0x000000807a2927b3..0x000000807fffffff    0 NONE

Ouch, these last few are 'interesting', oh well :-)
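
To make the problem with entries 12 and 13 concrete, here is a rough
standalone userspace sketch (not kernel code). It assumes 4KiB pages, and
every ex_* name is made up for illustration. It only checks whether a
whole page containing a given address can sit inside one region:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4KiB pages. The kernel page size on the board may differ. */
#define EX_PAGE_SIZE UINT64_C(0x1000)

/* Hypothetical stand-in for a memblock region; 'end' is exclusive. */
struct ex_region {
	uint64_t start;
	uint64_t end;
};

/* Entries 12 and 13 from the dump above, converted to exclusive ends. */
static const struct ex_region ex_region_12 = { 0x807a291000ULL, 0x807a2927b3ULL };
static const struct ex_region ex_region_13 = { 0x807a2927b3ULL, 0x8080000000ULL };

/* True if both boundaries of the region fall on a page boundary. */
static bool ex_region_page_aligned(const struct ex_region *r)
{
	return !(r->start & (EX_PAGE_SIZE - 1)) && !(r->end & (EX_PAGE_SIZE - 1));
}

/* True if the whole page containing 'addr' lies inside the region. */
static bool ex_page_fits(const struct ex_region *r, uint64_t addr)
{
	uint64_t page = addr & ~(EX_PAGE_SIZE - 1);

	return r->start <= page && page + EX_PAGE_SIZE <= r->end;
}
```

With these numbers, the page at 0x000000807a292000 fits in neither
region, since the boundary at 0x000000807a2927b3 cuts through it.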

> Any access to page 0x000000807a292000 is going to blow up in your
> face, because there is no way you can map this and still respect the
> memblock boundary. Same thing for any region that is smaller than
> PAGE_SIZE, or not aligned on PAGE_SIZE. Which is even more annoying.
> 
> I'm starting to think that my hack is not that idiotic in the end...

Yes, I can't think of anything better TBH. We've already asserted that
we don't have an annotated PTE here, and at the last level we're
guaranteed not to accidentally map a neighbouring private region, so yes
we should just proceed with a page-aligned mapping there.
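
Roughly what I have in mind, as a standalone userspace sketch rather than
the actual patch. It assumes a 4KiB granule (block mappings at levels 1
and 2, pages at level 3), and all sketch_* names are hypothetical, not
the real pKVM helpers. The loop checks the current level, and if nothing
fits, even at the last level, it still returns the page-aligned range:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumptions: 4KiB granule, last level is 3. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_LAST_LEVEL 3

/* Hypothetical stand-in for kvm_mem_range; 'end' is exclusive. */
struct sketch_range {
	uint64_t start;
	uint64_t end;
};

/* Size mapped by a single entry at 'level' (4KiB, 2MiB, 1GiB, ...). */
static uint64_t sketch_granule(int level)
{
	return UINT64_C(1) << (SKETCH_PAGE_SHIFT + 9 * (SKETCH_LAST_LEVEL - level));
}

/* Assumption: no block mappings at level 0 with a 4KiB granule. */
static bool sketch_level_supports_block(int level)
{
	return level >= 1;
}

/*
 * Starting from the level of the faulting entry, return the largest
 * aligned range around 'addr' that fits within 'region'. The caller is
 * assumed to pass 0 <= level <= SKETCH_LAST_LEVEL.
 */
static struct sketch_range sketch_adjust_range(struct sketch_range region,
					       uint64_t addr, int level)
{
	struct sketch_range cur;
	uint64_t granule;

	for (; level <= SKETCH_LAST_LEVEL; level++) {
		/* Check the current level, not the next one. */
		if (!sketch_level_supports_block(level))
			continue;

		granule = sketch_granule(level);
		cur.start = addr & ~(granule - 1);
		cur.end = cur.start + granule;

		if (region.start <= cur.start && cur.end <= region.end)
			return cur;
	}

	/* Nothing fits, not even a page: map the page-aligned range anyway. */
	granule = sketch_granule(SKETCH_LAST_LEVEL);
	cur.start = addr & ~(granule - 1);
	cur.end = cur.start + granule;
	return cur;
}
```

For a fault inside entry 12 of the memblock dump this falls through to
the last level and returns a single 4KiB page, while a fault in a large,
well-aligned region still gets a block mapping.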

Want me to post a proper patch or do you already have one in stock?

Thanks!
Quentin



More information about the linux-arm-kernel mailing list