[PATCH 33/89] KVM: arm64: Handle guest stage-2 page-tables entirely at EL2
Huang, Shaoqin
shaoqin.huang at intel.com
Tue Jun 7 18:16:56 PDT 2022
On 6/1/2022 12:45 AM, Will Deacon wrote:
> On Fri, May 20, 2022 at 05:03:29PM +0100, Alexandru Elisei wrote:
>> On Thu, May 19, 2022 at 02:41:08PM +0100, Will Deacon wrote:
>>> Now that EL2 is able to manage guest stage-2 page-tables, avoid
>>> allocating a separate MMU structure in the host and instead introduce a
>>> new fault handler which responds to guest stage-2 faults by sharing
>>> GUP-pinned pages with the guest via a hypercall. These pages are
>>> recovered (and unpinned) on guest teardown via the page reclaim
>>> hypercall.
>>>
>>> Signed-off-by: Will Deacon <will at kernel.org>
>>> ---
>> [..]
>>> +static int pkvm_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>> +			  unsigned long hva)
>>> +{
>>> +	struct kvm_hyp_memcache *hyp_memcache = &vcpu->arch.pkvm_memcache;
>>> +	struct mm_struct *mm = current->mm;
>>> +	unsigned int flags = FOLL_HWPOISON | FOLL_LONGTERM | FOLL_WRITE;
>>> +	struct kvm_pinned_page *ppage;
>>> +	struct kvm *kvm = vcpu->kvm;
>>> +	struct page *page;
>>> +	u64 pfn;
>>> +	int ret;
>>> +
>>> +	ret = topup_hyp_memcache(hyp_memcache, kvm_mmu_cache_min_pages(kvm));
>>> +	if (ret)
>>> +		return -ENOMEM;
>>> +
>>> +	ppage = kmalloc(sizeof(*ppage), GFP_KERNEL_ACCOUNT);
>>> +	if (!ppage)
>>> +		return -ENOMEM;
>>> +
>>> +	ret = account_locked_vm(mm, 1, true);
>>> +	if (ret)
>>> +		goto free_ppage;
>>> +
>>> +	mmap_read_lock(mm);
>>> +	ret = pin_user_pages(hva, 1, flags, &page, NULL);
>>
>> When I implemented memory pinning via GUP for the KVM SPE series, I
>> discovered that the pages were regularly unmapped at stage 2 because of
>> automatic numa balancing, as change_prot_numa() ends up calling
>> mmu_notifier_invalidate_range_start().
>>
>> I was curious how you managed to avoid that, I don't know my way around
>> pKVM and can't seem to find where that's implemented.
>
> With this series, we don't take any notice of the MMU notifiers at EL2
> so the stage-2 remains intact. The GUP pin will prevent the page from
> being migrated as the rmap walker won't be able to drop the mapcount.
>
> It's functional, but we'd definitely like to do better in the long term.
> The fd-based approach that I mentioned in the cover letter gets us some of
> the way there for protected guests ("private memory"), but non-protected
> guests running under pKVM are proving to be pretty challenging (we need to
> deal with things like sharing the zero page...).
>
> Will
My understanding is that, with pin_user_pages(), the pages used by
guests (both protected and non-protected) stay pinned for a long time
and will not be swapped out or migrated, so there is no need to care
about the MMU notifiers. Is that right?
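
To make the question concrete, here is a toy user-space model of how I
picture it. This is not kernel code: struct toy_page and all toy_*
helpers are made up for illustration, and the rule in toy_can_migrate()
is my simplified assumption of the real check. The only thing borrowed
from the kernel is the idea that a pin adds a large bias to the page
reference count (GUP_PIN_COUNTING_BIAS), so the count no longer matches
what the mappings alone would explain.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; only what this sketch needs. */
struct toy_page {
	int refcount;	/* all references, including pins */
	int mapcount;	/* number of page-table mappings */
};

/* Same value as the kernel's GUP_PIN_COUNTING_BIAS (1 << 10). */
#define TOY_PIN_BIAS 1024

static void toy_pin(struct toy_page *p)
{
	p->refcount += TOY_PIN_BIAS;
}

static void toy_unpin(struct toy_page *p)
{
	p->refcount -= TOY_PIN_BIAS;
}

/*
 * Simplified assumption: the page may only be moved when every
 * reference is explained by a mapping, i.e. nobody else holds it.
 */
static bool toy_can_migrate(const struct toy_page *p)
{
	return p->refcount == p->mapcount;
}

/* Walk through pin -> blocked migration -> unpin; returns 0 on success. */
int toy_selfcheck(void)
{
	struct toy_page p = { .refcount = 1, .mapcount = 1 };

	assert(toy_can_migrate(&p));	/* unpinned: movable */
	toy_pin(&p);
	assert(!toy_can_migrate(&p));	/* pinned: extra references block it */
	toy_unpin(&p);
	assert(toy_can_migrate(&p));	/* reclaimed at teardown: movable again */
	return 0;
}
```

If that picture is roughly right, the page stays put between the pin in
pkvm_mem_abort() and the unpin at guest teardown, regardless of what the
MMU notifiers report.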