[PATCH 1/2] ARM: kvm: define PAGE_S2_DEVICE as read-only by default

Sun Sep 14 15:57:59 PDT 2014

On 14 September 2014 11:43, Ard Biesheuvel <ard.biesheuvel at linaro.org> wrote:
> On 14 September 2014 11:09, Marc Zyngier <maz at misterjones.org> wrote:
>> On 2014-09-14 05:49, Ard Biesheuvel wrote:
>>>
>>> On 13 September 2014 19:06, Christoffer Dall
>>> <christoffer.dall at linaro.org> wrote:
>>>>
>>>> On Sat, Sep 13, 2014 at 01:15:45PM +0200, Ard Biesheuvel wrote:
>>>>>
>>>>> On 13 September 2014 12:41, Marc Zyngier <marc.zyngier at arm.com> wrote:
>>>>> > Hi Ard,
>>>>> >
>>>>> > On 2014-09-13 11:17, Ard Biesheuvel wrote:
>>>>> >>
>>>>> >> Now that we support read-only memslots, we need to make sure that
>>>>> >> pass-through device mappings are not mapped writable if the guest
>>>>> >> has requested them to be read-only. The existing implementation
>>>>> >> already honours this by calling kvm_set_s2pte_writable() on the new
>>>>> >> pte in case of writable mappings, so all we need to do is define
>>>>> >> the default pgprot_t value used for devices to be PTE_S2_RDONLY.
>>>>> >>
>>>>> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel at linaro.org>
>>>>> >
>>>>> >
>>>>> > I feel very uncomfortable with this change. Why would we map a device
>>>>> > RO? Is
>>>>> > that only for completeness sake?
>>>>> >
>>>>>
>>>>> We would map a device RO so that QEMU (or whatever is managing KVM)
>>>>> can emulate the writes. I don't have a clear cut use case, to be
>>>>> honest, but setting up a writable mapping for a memslot that was
>>>>> explicitly set up as read-only seems wrong in any case.
>>>>
>>>>
>>>> Agreed, if it doesn't ever make sense to do so, then we should return an
>>>> error to user space if userspace attempts such a configuration.  The
>>>> current code is just weird.
>>>>
>>>>>
>>>>> Note that the particular problem I was seeing was primarily caused by
>>>>> kvm_is_mmio_pfn()'s false positive on the zero page, but it unveiled
>>>>> this particular issue as well.
>>>>>
>>>>> > Note that we also use PAGE_S2_DEVICE for things that are not mapped
>>>>> > through
>>>>> > a memslot, such as the GIC.
>>>>> >
>>>>>
>>>>> Yes, and I realize now that this breaks it.
>>>>> My apologies: I have an additional patch locally that sets up MMIO
>>>>> ranges in one go instead of faulting them in one page at a time as we
>>>>> do now, and there the read-write case is handled correctly in
>>>>> kvm_phys_addr_ioremap(). However, I thought it was better to send
>>>>> these out separately first, but apparently not.
>>>>
>>>>
>>>> I think it is better to change this separately, and then add the ioremap
>>>> stuff.  However, you need to change all places that call PAGE_S2_DEVICE
>>>> and expect a RDWR memory region, this happens to be only
>>>> kvm_phys_addr_ioremap() for now.
>>>>
>>>>>
>>>>> So if we can agree on whether or not MMIO backed mappings should be
>>>>> read-write even if the memslot says no, I will follow up with a proper
>>>>> series if there are still changes required.
>>>>>
>>>>
>>>> I guess it could be theoretically useful to have read-only device memory
>>>> regions, and I can't think of why it would violate the architecture.
>>>>
>>>
>>> We have to handle it either way. But after reading D4.5.3 (Table
>>> D4-40) of the ARM ARM, I am wondering why we needed patch b88657674d39
>>> "ARM: KVM: user_mem_abort: support stage 2 MMIO page mapping" in the
>>> first place, and we could just revert that patch and everything would
>>> work as expected. (In short, D4.5.3 says that MT_DEVICE trumps
>>> MT_NORMAL, so if the stage 1 translation is MT_DEVICE, it doesn't
>>> matter what memtype the S2 translation specifies)
>>
>>
>> We've been there before:
>> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/004420.html
>>
>
> Ah right. So why did those patches not make it in?
>

Never mind. I read the whole thread this time.

So, in summary, there is a concern that a malicious guest may request
a cachable mapping for a device range, in an attempt to manipulate the
VGIC or other device memory of another VM.
I think that concern only applies to writable mappings, so perhaps we
should just change

if (kvm_is_mmio_pfn(pfn))

to

if (kvm_is_mmio_pfn(pfn) && writable)

and be done with it (which is coincidentally the very first naive fix
I suggested for the issue i was seeing)
That way, we never map read-only MMIO regions writable, and rely on
the MT_DEVICE trumps MT_NORMAL rule to ensure the guest reads to those
regions are uncached.
(Wouldn't hurt to add a comment to explain it, I suppose)

-- 
Ard.

>>>> That said, I don't have any more clear use cases in mind, and we
>>>> definitely shouldn't just silently ignore the read-only flag from user
>>>> space and make the region writeable.  If we don't want to allow this
>>>> behavior, we can return an error in kvm_arch_create_memslot(), which
>>>> will cause the KVM_CREATE_USER_MEMORY_REGION ioctl to return -ENOMEM.
>>>>
>>>
>>> Well, I am not sure how easy it is to find out beforehand (i.e., at
>>> ioctl time) what the nature of the backing is, and you have to deal
>>> with hva_to_pfn() potentially returning a VM_PFNMAP pfn or
>>> PageReserved pages anyway.
>>> So just mapping everything as MT_NORMAL actually seems like the sanest
>>> thing to do, so imo we should revert the patch. The only other
>>> question I had is whether it would be better to map a MMIO region in
>>> one go, but we can discuss that separately.
>>
>>
>> Aside from the MT_NORMAL thing, the only saving we'd get by dynamically
>> maping MMIO regions would be the page tables. Not very useful in my opinion.
>>
>
> OK, so you agree faulting it in entirely upon the first abort is a
> sane thing to do then.
> So 1 patch to change that and 1 to revert the PAGE_S2_DEVICE thing then?