[PATCH 07/14] KVM: arm64: Restrict host access to the ITS tables
Sebastian Ene
sebastianene at google.com
Fri Apr 10 06:52:54 PDT 2026
On Mon, Mar 16, 2026 at 04:13:59PM +0000, Fuad Tabba wrote:
Hello Fuad,
> Hi Sebastian,
>
> On Tue, 10 Mar 2026 at 12:49, Sebastian Ene <sebastianene at google.com> wrote:
> >
> > Setup shadow structures for ITS indirect tables held in
> > the GITS_BASER<n> registers.
> > Make the last level of the Device Table and vPE Table
> > inacessible to the host.
>
> inacessible -> inaccessible
Applied fix, thanks.
> > In a direct layout configuration, donate the table to
> > the hypervisor since the software is not expected to
> > program them directly.
>
> This commit message is too brief and doesn't fully explain the
> problem, the impact, and the mechanism of the solution. It also
> appears to contradict the actual code changes.
>
> For example, could you elaborate why must the last level of indirect
> tables be inaccessible?
For device table, a malicious host can write the ITT address that points
to hyp memory and then use MAPTI to write over that memory.
>
> Can you also please explain the mechanism? You are parsing
> GITS_BASER_INDIRECT to determine if a shadow Level 1 table must be
> shared with the host, while unconditionally donating the original
> physical tables. You also explicitly exclude Collection tables. The
> msg should briefly justify why Collection tables are safe to leave
> accessible to the host.
>
> There is also a contradiction in the message. You state "In a direct
> layout configuration, donate the table...". However, your code donates
> the original hardware table unconditionally on every iteration of the
> loop, regardless of whether GITS_BASER_INDIRECT is set. Please ensure
> the commit log accurately reflects the code implementation.
>
I see no contradition, I only need to shadow the first layer of the
indirect tables. Shadowing implies donation and sharing:
because we are donating the original tables from host -> hyp and we sharing the
host view of the tables with the hypervisor (which is a copy).
> Maybe you could say that the problem is Host DMA attacks via ITS table
> manipulation. Whereas the mechanism is to unconditionally donate
This has nothing to do with Host DMA attacks, it is just the AP that can
write to memory.
> hardware tables to EL2. For indirect Device/vPE tables, share a L1
> shadow table with the host and strictly donate the L2 pages to prevent
> the host from writing malicious L2 pointers.
>
> >
> > Signed-off-by: Sebastian Ene <sebastianene at google.com>
> > ---
> > arch/arm64/kvm/hyp/nvhe/its_emulate.c | 143 ++++++++++++++++++++++++++
> > 1 file changed, 143 insertions(+)
> >
> > diff --git a/arch/arm64/kvm/hyp/nvhe/its_emulate.c b/arch/arm64/kvm/hyp/nvhe/its_emulate.c
> > index 4a3ccc90a1a9..865a5d6353ed 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/its_emulate.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/its_emulate.c
> > @@ -141,6 +141,145 @@ static struct pkvm_protected_reg *get_region(phys_addr_t dev_addr)
> > return NULL;
> > }
> >
> > +static int pkvm_host_unmap_last_level(void *shadow, size_t num_pages, u32 psz)
> > +{
> > + u64 *table = shadow;
> > + int ret, i, end = (num_pages << PAGE_SHIFT) / sizeof(table);
> > + phys_addr_t table_addr;
>
> RCT, mixing initialized variables and uninitialized variables, plus
> variables of conceptually different "types" in the same declaration.
>
> Please use sizeof(*table): sizeof(table) evaluates to the size of the
> pointer (8 bytes), NOT the size of the array element. In this case,
> this happens to be the same, but it's still wrong.
>
> Maybe the following is clearer:
> + int end = num_pages * (PAGE_SIZE / sizeof(*table));
>
>
Will use the suggestion and do the same for the
pkvm_host_map_last_level.
> > +
> > + for (i = 0; i < end; i++) {
> > + if (!(table[i] & GITS_BASER_VALID))
> > + continue;
> > +
> > + table_addr = table[i] & PHYS_MASK;
> > + ret = __pkvm_host_donate_hyp(hyp_phys_to_pfn(table_addr), psz >> PAGE_SHIFT);
>
> The ITS-configured page size and the host page size could be
> different, but the number of pages to donate for Level 2 tables is
> calculated based on psz (the ITS).
>
> If the ITS hardware is configured for 4KB pages, but the host kernel
> is using (e.g.,) 64KB pages, psz >> PAGE_SHIFT evaluates to 0.
I need to revisit this, thanks for pointing out.
>
> You need to account for mismatched page sizes, perhaps by using
> DIV_ROUND_UP(psz, PAGE_SIZE) (or something similar) to ensure the
> containing host page is donated.
>
> > + if (ret)
> > + goto err_donate;
> > + }
> > +
> > + return 0;
> > +err_donate:
> > + for (i = i - 1; i >= 0; i--) {
>
> Please use the while (i--) idiom for rollback loops.
>
>
> > + if (!(table[i] & GITS_BASER_VALID))
> > + continue;
> > +
> > + table_addr = table[i] & PHYS_MASK;
> > + __pkvm_hyp_donate_host(hyp_phys_to_pfn(table_addr), psz >> PAGE_SHIFT);
>
> Please wrap this in WARN_ON(...). If donating back to the host fails
> during a rollback, we have a fatal page leak that needs to be loudly
> flagged, similar to how you handle it in pkvm_unshare_shadow_table.
>
>
> > + }
> > + return ret;
> > +}
> > +
> > +static int pkvm_share_shadow_table(void *shadow, u64 nr_pages)
> > +{
> > + u64 i, ret, start_pfn = hyp_virt_to_pfn(shadow);
>
> Same comment as before with RCT and the mixing of declarations.
>
>
> > +
> > + for (i = 0; i < nr_pages; i++) {
> > + ret = __pkvm_host_share_hyp(start_pfn + i);
> > + if (ret)
> > + goto unshare;
> > + }
> > +
> > + ret = hyp_pin_shared_mem(shadow, shadow + (nr_pages << PAGE_SHIFT));
> > + if (ret)
> > + goto unshare;
> > +
> > + return ret;
> > +unshare:
>
> Please use the while (i--) idiom for rollback loops.
>
> Also, please use consistent naming conventions for the labels. Here
> you call it unshare, and earlier it was err_donate.
>
>
> > + for (i = i - 1; i >= 0; i--)
> > + __pkvm_host_unshare_hyp(start_pfn + i);
> > + return ret;
> > +}
> > +
> > +static void pkvm_unshare_shadow_table(void *shadow, u64 nr_pages)
> > +{
> > + u64 i, start_pfn = hyp_virt_to_pfn(shadow);
> > +
> > + hyp_unpin_shared_mem(shadow, shadow + (nr_pages << PAGE_SHIFT));
> > +
> > + for (i = 0; i < nr_pages; i++)
> > + WARN_ON(__pkvm_host_unshare_hyp(start_pfn + i));
> > +}
> > +
> > +static void pkvm_host_map_last_level(void *shadow, size_t num_pages, u32 psz)
> > +{
> > + u64 *table;
>
> RCT, and you forgot to initialize table:
> + u64 *table = shadow;
Fixed this, thanks. I never ended up on this code path during testing,
maybe I should create a test for it to trigger it.
>
> > + int i, end = (num_pages << PAGE_SHIFT) / sizeof(table);
>
> Same sizeof(table) pointer-size bug as above.
>
>
> > + phys_addr_t table_addr;
> > +
> > + for (i = 0; i < end; i++) {
> > + if (!(table[i] & GITS_BASER_VALID))
> > + continue;
> > +
> > + table_addr = table[i] & ~GITS_BASER_VALID;
>
> Inconsistent masking logic, since in pkvm_host_unmap_last_level you
> correctly used PHYS_MASK to extract the address, but here in the
> rollback path you use ~GITS_BASER_VALID.
>
> While both currently work because the upper bits and lower bits (below
> the page size) are defined as RES0 in the GIC spec, ~GITS_BASER_VALID
> is architecturally fragile. If a future hardware revision repurposes
> the upper RES0 bits [62:52] for new attributes (e.g., memory
> encryption flags), ~GITS_BASER_VALID will leak those attribute bits
> into the physical address calculation.
>
> Since PHYS_MASK correctly handles the address extraction across all
> page sizes (relying on the lower bits being RES0) and safely masks off
> future upper attribute bits, please standardize on using table_addr =
> table[i] & PHYS_MASK; for both functions.
>
>
Fixed the inconsistency and used PHYS_MASK everywhere.
> > + WARN_ON(__pkvm_hyp_donate_host(hyp_phys_to_pfn(table_addr), psz >> PAGE_SHIFT));
> > + }
> > +}
> > +
> > +static int pkvm_setup_its_shadow_baser(struct its_shadow_tables *shadow)
> > +{
> > + int i, ret;
> > + u64 baser_val, num_pages, type;
> > + void *base, *host_base;
> > +
> > + for (i = 0; i < GITS_BASER_NR_REGS; i++) {
> > + baser_val = shadow->tables[i].val;
> > + if (!(baser_val & GITS_BASER_VALID))
> > + continue;
> > +
> > + base = kern_hyp_va(shadow->tables[i].base);
> > + num_pages = (1 << shadow->tables[i].order);
> > +
> > + ret = __pkvm_host_donate_hyp(hyp_virt_to_pfn(base), num_pages);
> > + if (ret)
> > + goto err_donate;
> > +
> > + if (baser_val & GITS_BASER_INDIRECT) {
> > + host_base = kern_hyp_va(shadow->tables[i].shadow);
> > + ret = pkvm_share_shadow_table(host_base, num_pages);
> > + if (ret)
> > + goto err_with_donation;
> > +
> > + type = GITS_BASER_TYPE(baser_val);
> > + if (type == GITS_BASER_TYPE_COLLECTION)
> > + continue;
> > +
> > + ret = pkvm_host_unmap_last_level(base, num_pages,
> > + shadow->tables[i].psz);
> > + if (ret)
> > + goto err_with_share;
> > + }
> > + }
> > +
> > + return 0;
> > +err_with_share:
> > + pkvm_unshare_shadow_table(host_base, num_pages);
> > +err_with_donation:
> > + __pkvm_hyp_donate_host(hyp_virt_to_pfn(base), num_pages);
> > +err_donate:
> > + for (i = i - 1; i >= 0; i--) {
>
> Please use the while (i--) idiom for rollback loops.
>
>
> > + baser_val = shadow->tables[i].val;
> > + if (!(baser_val & GITS_BASER_VALID))
> > + continue;
> > +
> > + base = kern_hyp_va(shadow->tables[i].base);
> > + num_pages = (1 << shadow->tables[i].order);
> > +
> > + WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(base), num_pages));
>
> The sequence of rollback operations here creates a TOCTOU vulnerability.
>
There is a different problem here related to functionality rather than
this: donating the base to the host first and then iterating over it
will make the hypervisor explode. I fixed this.
> - First, you donate base (the Level 1 indirect table) back to the host.
> - Then, you pass base into pkvm_host_map_last_level().
> - Finally, pkvm_host_map_last_level() reads table[i] out of base to
> determine which Level 2 pages to donate back to the host.
>
> Because the host regains ownership of base _first_, it can be running
> concurrently on another CPU. A malicious host can overwrite the Level
> 1 table with pointers to arbitrary hypervisor-owned memory. The
> hypervisor will then read those malicious pointers and dutifully grant
> the host access to its own secure memory.
>
> The order of operations needs to be reversed: you must read base to
> roll back the L2 pages, unshare the shadow table, and *only then*
> donate base back to the host.
>
> Also, num_pages = (1 << shadow->tables[i].order); calculates a 32-bit
> signed integer because the literal 1 is a signed 32-bit int. If order
> is 31, this evaluates to a negative number. If order is 32 or higher,
> this is undefined behavior. Because num_pages is declared as a u64,
> you should use the standard kernel macro BIT_ULL().
>
> Here's my suggested fix (not tested). Reorder the operations to safely
> rollback L2 before donating L1, use the standard `while (i--)` loop,
> and fix the page calculation:
>
> + while (i--) {
> + baser_val = shadow->tables[i].val;
> + if (!(baser_val & GITS_BASER_VALID))
> + continue;
> +
> + base = kern_hyp_va(shadow->tables[i].base);
> + num_pages = BIT_ULL(shadow->tables[i].order);
> +
> + if (baser_val & GITS_BASER_INDIRECT) {
> + host_base = kern_hyp_va(shadow->tables[i].shadow);
> +
> + type = GITS_BASER_TYPE(baser_val);
> + if (type != GITS_BASER_TYPE_COLLECTION)
> + pkvm_host_map_last_level(base, num_pages,
> + shadow->tables[i].psz);
> +
> + pkvm_unshare_shadow_table(host_base, num_pages);
> + }
> +
> + WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(base), num_pages));
> + }
>
>
>
> > + if (baser_val & GITS_BASER_INDIRECT) {
> > + host_base = kern_hyp_va(shadow->tables[i].shadow);
> > + pkvm_unshare_shadow_table(host_base, num_pages);
> > +
> > + type = GITS_BASER_TYPE(baser_val);
> > + if (type == GITS_BASER_TYPE_COLLECTION)
> > + continue;
> > +
> > + pkvm_host_map_last_level(base, num_pages, shadow->tables[i].psz);
> > + }
> > + }
>
> You have duplicated the entire table decoding logic (calculating base,
> num_pages, checking INDIRECT...) down here in the rollback path.
> Consider abstracting "setup one table" and "teardown one table" into
> helper functions to make pkvm_setup_its_shadow_baser more readable and
> less prone to copy-pasta errors.
>
> Cheers,
> /fuad
>
Thanks,
Sebastian
>
> > +
> > + return ret;
> > +}
> > +
> > static int pkvm_setup_its_shadow_cmdq(struct its_shadow_tables *shadow)
> > {
> > int ret, i, num_pages;
> > @@ -205,6 +344,10 @@ int pkvm_init_gic_its_emulation(phys_addr_t dev_addr, void *host_priv_state,
> > if (ret)
> > goto err_with_shadow;
> >
> > + ret = pkvm_setup_its_shadow_baser(shadow);
> > + if (ret)
> > + goto err_with_shadow;
> > +
> > its_reg->priv = priv_state;
> >
> > hyp_spin_lock_init(&priv_state->its_lock);
> > --
> > 2.53.0.473.g4a7958ca14-goog
> >
More information about the linux-arm-kernel
mailing list