[PATCH 1/2] ARM: kvm: define PAGE_S2_DEVICE as read-only by default

Marc Zyngier maz at misterjones.org
Sun Sep 14 02:09:17 PDT 2014

On 2014-09-14 05:49, Ard Biesheuvel wrote:
> On 13 September 2014 19:06, Christoffer Dall
> <christoffer.dall at linaro.org> wrote:
>> On Sat, Sep 13, 2014 at 01:15:45PM +0200, Ard Biesheuvel wrote:
>>> On 13 September 2014 12:41, Marc Zyngier <marc.zyngier at arm.com> 
>>> wrote:
>>> > Hi Ard,
>>> >
>>> > On 2014-09-13 11:17, Ard Biesheuvel wrote:
>>> >>
>>> >> Now that we support read-only memslots, we need to make sure 
>>> that
>>> >> pass-through device mappings are not mapped writable if the 
>>> guest
>>> >> has requested them to be read-only. The existing implementation
>>> >> already honours this by calling kvm_set_s2pte_writable() on the 
>>> new
>>> >> pte in case of writable mappings, so all we need to do is define
>>> >> the default pgprot_t value used for devices to be PTE_S2_RDONLY.
>>> >>
>>> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel at linaro.org>
>>> >
>>> >
>>> > I feel very uncomfortable with this change. Why would we map a 
>>> device RO? Is
>>> > that only for completeness sake?
>>> >
>>> We would map a device RO so that QEMU (or whatever is managing KVM)
>>> can emulate the writes. I don't have a clear cut use case, to be
>>> honest, but setting up a writable mapping for a memslot that was
>>> explicitly set up as read-only seems wrong in any case.
>> Agreed, if it doesn't ever make sense to do so, then we should 
>> return an
>> error to user space if userspace attempts such a configuration.  The
>> current code is just weird.
>>> Note that the particular problem I was seeing was primarily caused 
>>> by
>>> kvm_is_mmio_pfn()'s false positive on the zero page, but it 
>>> unveiled
>>> this particular issue as well.
>>> > Note that we also use PAGE_S2_DEVICE for things that are not 
>>> mapped through
>>> > a memslot, such as the GIC.
>>> >
>>> Yes, and I realize now that this breaks it.
>>> My apologies: I have an additional patch locally that sets up MMIO
>>> ranges in one go instead of faulting them in one page at a time as 
>>> we
>>> do now, and there the read-write case is handled correctly in
>>> kvm_phys_addr_ioremap(). However, I thought it was better to send
>>> these out separately first, but apparently not.
>> I think it is better to change this separately, and then add the 
>> ioremap
>> stuff.  However, you need to change all places that call 
>> and expect a RDWR memory region, this happens to be only
>> kvm_phys_addr_ioremap() for now.
>>> So if we can agree on whether or not MMIO backed mappings should be
>>> read-write even if the memslot says no, I will follow up with a 
>>> proper
>>> series if there are still changes required.
>> I guess it could be theoretically useful to have read-only device 
>> memory
>> regions, and I can't think of why it would violate the architecture.
> We have to handle it either way. But after reading D4.5.3 (Table
> D4-40) of the ARM ARM, I am wondering why we needed patch 
> b88657674d39
> "ARM: KVM: user_mem_abort: support stage 2 MMIO page mapping" in the
> first place, and we could just revert that patch and everything would
> work as expected. (In short, D4.5.3 says that MT_DEVICE trumps
> MT_NORMAL, so if the stage 1 translation is MT_DEVICE, it doesn't
> matter what memtype the S2 translation specifies)

We've been there before:

>> That said, I don't have any more clear use cases in mind, and we
>> definitely shouldn't just silently ignore the read-only flag from 
>> user
>> space and make the region writeable.  If we don't want to allow this
>> behavior, we can return an error in kvm_arch_create_memslot(), which
>> will cause the KVM_CREATE_USER_MEMORY_REGION ioctl to return 
> Well, I am not sure how easy it is to find out beforehand (i.e., at
> ioctl time) what the nature of the backing is, and you have to deal
> with hva_to_pfn() potentially returning a VM_PFNMAP pfn or
> PageReserved pages anyway.
> So just mapping everything as MT_NORMAL actually seems like the 
> sanest
> thing to do, so imo we should revert the patch. The only other
> question I had is whether it would be better to map a MMIO region in
> one go, but we can discuss that separately.

Aside from the MT_NORMAL thing, the only saving we'd get by dynamically 
maping MMIO regions would be the page tables. Not very useful in my 


Who you jivin' with that Cosmik Debris?

More information about the linux-arm-kernel mailing list