[PATCH kvmtool 04/21] mmio: Extend handling to include ioport emulation
Alexandru Elisei
alexandru.elisei at arm.com
Mon Feb 22 10:50:08 EST 2021
Hi Andre,
On 2/17/21 5:43 PM, Andre Przywara wrote:
> On Thu, 11 Feb 2021 16:10:16 +0000
> Alexandru Elisei <alexandru.elisei at arm.com> wrote:
>
> Hi,
>
>> On 12/10/20 2:28 PM, Andre Przywara wrote:
>>> In their core functionality MMIO and I/O port traps are not really
>>> different, yet we still have two totally separate code paths for
>>> handling them. Devices need to decide on one conduit or need to provide
>>> different handler functions for each of them.
>>>
>>> Extend the existing MMIO emulation to also cover ioport handlers.
>>> This just adds another RB tree root for holding the I/O port handlers,
>>> but otherwise uses the same tree population and lookup code.
>> Maybe I'm missing something, but why two trees? Is it valid to have an overlap
>> between IO port and MMIO emulation? Or was it done to make the removal of ioport
>> emulation easier?
> So I thought about it as well, but figured it's easier this way:
> - x86 allows overlap, PIO is a totally separate address space from
> memory/MMIO. Early x86 CPUs had pins to indicate a PIO bus cycle, but
> using the same address and data pins otherwise. In practise there
> might be no overlap when it comes to *MMIO* traps vs PIO on x86
> (there is DRAM only at the lowest 64K of the IBM PC memory map),
> but not sure we should rely on this.
> - For non-x86 this would indeed be non-overlapping, but this would need
> to be translated at init time then? And then we can't move those
> anymore, I guess? So I found it cleaner to keep this separate, and do
> the translation at trap time.
> - As a consequence we would need to have a bit indicating the address
> space. I haven't actually tried this, but my understanding is that
> this would spoil the whole rb_tree functions, since they rely on a
> linear addressing scheme, and adding another bit there would be at
> least cumbersome?
>
> At the end I decided to go for separate trees, as also this was less
> change.
>
> I agree that it would be nice to have one tree, from a design point of
> view, but I am afraid that would require more changes.
> If need be, I think we can always unify them later on, on top of this
> series?
Definitely later, I forgot that x86 uses special instructions to access IO ports,
which means that port addresses can overlap with memory addresses. Let's keep 2
trees for now and we can decide later if we should unify them for the other
architectures.
>
>> If it's not valid to have that overlap, then I think having one tree for both
>> would better. Struct mmio_mapping would have to be augmented with a flags field
>> that holds the same flags given to kvm__register_iotrap to differentiate between
>> the two slightly different emulations. Saving the IOTRAP_COALESCE flag would also
>> make it trivial to call KVM_UNREGISTER_COALESCED_MMIO in kvm__deregister_iotrap,
>> which we currently don't do.
>>
>>> "ioport" or "mmio" just become a flag in the registration function.
>>> Provide wrappers to not break existing users, and allow an easy
>>> transition for the existing ioport handlers.
>>>
>>> This also means that ioport handlers now can use the same emulation
>>> callback prototype as MMIO handlers, which means we have to migrate them
>>> over. To allow a smooth transition, we hook up the new I/O emulate
>>> function to the end of the existing ioport emulation code.
>> I'm sorry, but I don't understand that last sentence. Do you mean that the ioport
>> emulation code has been modified to use kvm__emulate_pio() as a fallback for when
>> the port is not found in the ioport_tree?
> I meant that for the transition period we have all of traditional MMIO,
> traditional PIO, *and* just transformed PIO.
>
> That means there are still PIO devices registered through ioport.c's
> ioport__register(), *and* PIO devices registered through mmio.c's
> kvm__register_pio(). Which means they end up in two separate PIO trees.
> And only the traditional kvm__emulate_io() from ioport.c is called upon
> a trap, so it needs to check both trees, which it does by calling into
> kvm__emulate_pio(), shall a search in the local tree fail.
Thank you for the explanation, now it makes sense.
>
> Or did you mean something else?
>
>>> Signed-off-by: Andre Przywara <andre.przywara at arm.com>
>>> ---
>>> include/kvm/kvm.h | 42 +++++++++++++++++++++++++++++----
>>> ioport.c | 4 ++--
>>> mmio.c | 59 +++++++++++++++++++++++++++++++++++++++--------
>>> 3 files changed, 89 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
>>> index ee99c28e..14f9d58b 100644
>>> --- a/include/kvm/kvm.h
>>> +++ b/include/kvm/kvm.h
>>> @@ -27,10 +27,16 @@
>>> #define PAGE_SIZE (sysconf(_SC_PAGE_SIZE))
>>> #endif
>>>
>>> +#define IOTRAP_BUS_MASK 0xf
>> It's not immediately obvious what this mask does. It turns out it's used to mask
>> the enum flags defined in the header devices.h, header which is not included in
>> this file.
>>
>> The flag names we pass to kvm__register_iotrap() are slightly inconsistent
>> (DEVICE_BUS_PCI, DEVICE_BUS_MMIO and IOTRAP_COALESCE), where DEVICE_BUS_{PCI,
>> MMIO} come from devices.h as an enum. I was wondering if I'm missing something and
>> there is a particular reason why we don't define our own flags for that here
>> (something like IOTRAP_PIO and IOTRAP_MMIO).
> I am not sure why this would be needed?
> We already define and use DEVICE_BUS_x elsewhere, so why not re-use it?
To check if we should coalesce the MMIO regions we check the IOTRAP_COALESCE bit,
but to check if we should use the mmio or io tree we use a mask over the first 4
bits and compare that to DEVICE_BUS_xxx. I find that confusing: the
IOTRAP_BUS_MASK and IOTRAP_COALESCE are defined here, but there is no evidence
where the other flags are coming from, and the header file devices.h isn't even
included.
>
>> If we do decide to keep the flags from devices.h, I think it would be worth it to
>> have a compile time check (with BUILD_BUG_ON) that IOTRAP_BUS_MASK is >=
>> DEVICES_BUS_MAX, which would also be a good indication of where those flags are
>> coming from.
> Well, if that makes you happy, I am not sure we gain another 13 bus
> types anytime soon, though ;-)
I was actually suggesting that more for documenting the code. If we compare
IOTRAP_BUS_MASK with DEVICE_BUS_MAX, then that will give us a hint where we're
expecting the flags to be defined.
But I had another look at the code and it seems that DEVICE_BUS_MMIO = 1 and
DEVICE_BUS_IOPORT = 2, which means that we don't need the mask and we can check
the bits instead (and IOTRAP_COALESCE can be redefined to be 1U << 2). I think
dropping the mask and replacing it with testing individual bits would make the
code easier to follow, what do you think?
>
>>> +#define IOTRAP_COALESCE (1U << 4)
>>> +
>>> #define DEFINE_KVM_EXT(ext) \
>>> .name = #ext, \
>>> .code = ext
>>>
>>> +struct kvm_cpu;
>>> +typedef void (*mmio_handler_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data,
>>> + u32 len, u8 is_write, void *ptr);
>>> typedef void (*fdt_irq_fn)(void *fdt, u8 irq, enum irq_type);
>>>
>>> enum {
>>> @@ -113,6 +119,8 @@ void kvm__irq_line(struct kvm *kvm, int irq, int level);
>>> void kvm__irq_trigger(struct kvm *kvm, int irq);
>>> bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction, int size, u32 count);
>>> bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 len, u8 is_write);
>>> +bool kvm__emulate_pio(struct kvm_cpu *vcpu, u16 port, void *data,
>>> + int direction, int size, u32 count);
>>> int kvm__destroy_mem(struct kvm *kvm, u64 guest_phys, u64 size, void *userspace_addr);
>>> int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size, void *userspace_addr,
>>> enum kvm_mem_type type);
>>> @@ -136,10 +144,36 @@ static inline int kvm__reserve_mem(struct kvm *kvm, u64 guest_phys, u64 size)
>>> KVM_MEM_TYPE_RESERVED);
>>> }
>>>
>>> -int __must_check kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool coalesce,
>>> - void (*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr),
>>> - void *ptr);
>>> -bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr);
>>> +int __must_check kvm__register_iotrap(struct kvm *kvm, u64 phys_addr, u64 len,
>>> + mmio_handler_fn mmio_fn, void *ptr,
>>> + unsigned int flags);
>>> +
>>> +static inline
>>> +int __must_check kvm__register_mmio(struct kvm *kvm, u64 phys_addr,
>>> + u64 phys_addr_len, bool coalesce,
>>> + mmio_handler_fn mmio_fn, void *ptr)
>>> +{
>>> + return kvm__register_iotrap(kvm, phys_addr, phys_addr_len, mmio_fn, ptr,
>>> + DEVICE_BUS_MMIO | (coalesce ? IOTRAP_COALESCE : 0));
>>> +}
>>> +static inline
>>> +int __must_check kvm__register_pio(struct kvm *kvm, u16 port, u16 len,
>>> + mmio_handler_fn mmio_fn, void *ptr)
>>> +{
>>> + return kvm__register_iotrap(kvm, port, len, mmio_fn, ptr,
>>> + DEVICE_BUS_IOPORT);
>>> +}
>>> +
>>> +bool kvm__deregister_iotrap(struct kvm *kvm, u64 phys_addr, unsigned int flags);
>>> +static inline bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr)
>>> +{
>>> + return kvm__deregister_iotrap(kvm, phys_addr, DEVICE_BUS_MMIO);
>>> +}
>>> +static inline bool kvm__deregister_pio(struct kvm *kvm, u16 port)
>>> +{
>>> + return kvm__deregister_iotrap(kvm, port, DEVICE_BUS_IOPORT);
>>> +}
>>> +
>>> void kvm__reboot(struct kvm *kvm);
>>> void kvm__pause(struct kvm *kvm);
>>> void kvm__continue(struct kvm *kvm);
>>> diff --git a/ioport.c b/ioport.c
>>> index b98836d3..204d8103 100644
>>> --- a/ioport.c
>>> +++ b/ioport.c
>>> @@ -147,7 +147,8 @@ bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction,
>>>
>>> entry = ioport_get(&ioport_tree, port);
>>> if (!entry)
>>> - goto out;
>>> + return kvm__emulate_pio(vcpu, port, data, direction,
>>> + size, count);
>> I have to admit this gave me pause because this patch doesn't add any users for
>> kvm__register_pio() (although with this change the behaviour of kvm__emulate_io()
>> remains exactly the same). Do you think this change would fit better in patch #7,
>> where the first user for kvm__register_pio() is added, or do you prefer it here?
> I think it logically belongs here, as we introduce the
> kvm__emulate_pio() function here as well. Otherwise this function would
> have no caller. As it is now, it's just a "coincidence" that no one
> actually called kvm__register_pio() so far. This also makes the other
> patches movable and replaceable: this patch prepares the stage, the
> follow-up patches just fill it.
Sure, makes sense.
>
>>>
>>> ops = entry->ops;
>>>
>>> @@ -162,7 +163,6 @@ bool kvm__emulate_io(struct kvm_cpu *vcpu, u16
>>> port, void *data, int direction,
>>> ioport_put(&ioport_tree, entry);
>>>
>>> -out:
>>> if (ret)
>>> return true;
>>>
>>> diff --git a/mmio.c b/mmio.c
>>> index cd141cd3..4cce1901 100644
>>> --- a/mmio.c
>>> +++ b/mmio.c
>>> @@ -19,13 +19,14 @@ static DEFINE_MUTEX(mmio_lock);
>>>
>>> struct mmio_mapping {
>>> struct rb_int_node node;
>>> - void (*mmio_fn)(struct kvm_cpu
>>> *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr);
>>> + mmio_handler_fn mmio_fn;
>>> void *ptr;
>>> u32 refcount;
>>> bool remove;
>>> };
>>>
>>> static struct rb_root mmio_tree = RB_ROOT;
>>> +static struct rb_root pio_tree = RB_ROOT;
>>>
>>> static struct mmio_mapping *mmio_search(struct rb_root *root, u64
>>> addr, u64 len) {
>>> @@ -103,9 +104,9 @@ static void mmio_put(struct kvm *kvm, struct
>>> rb_root *root, struct mmio_mapping mutex_unlock(&mmio_lock);
>>> }
>>>
>>> -int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64
>>> phys_addr_len, bool coalesce,
>>> - void (*mmio_fn)(struct kvm_cpu *vcpu, u64
>>> addr, u8 *data, u32 len, u8 is_write, void *ptr),
>>> - void *ptr)
>>> +int kvm__register_iotrap(struct kvm *kvm, u64 phys_addr, u64
>>> phys_addr_len,
>>> + mmio_handler_fn mmio_fn, void *ptr,
>>> + unsigned int flags)
>>> {
>>> struct mmio_mapping *mmio;
>>> struct kvm_coalesced_mmio_zone zone;
>>> @@ -127,7 +128,7 @@ int kvm__register_mmio(struct kvm *kvm, u64
>>> phys_addr, u64 phys_addr_len, bool c .remove = false,
>>> };
>>>
>>> - if (coalesce) {
>>> + if (flags & IOTRAP_COALESCE) {
>> There is no such flag being used in ioport.c, is it valid to have the
>> flags DEVICE_BUS_IOPORT and IOTRAP_COALESCE set at the same time?
> Well, yes and no: Yes, as this maps to MMIO on non-x86, so
> theoretically could use the flag. No, as no one registering a trap
> handler through kvm__register_pio() would ever have the chance to set
> this flag.
> I can check for the registration being for the MMIO bus before entering
> the "if" branch, if that is what you mean?
Yes, that's what I mean, please return an error if a device tries to register an
I/O port and sets the COALESCE flag, since that's forbidden for architectures with
true I/O ports (like x86). I realize it might look peculiar since we don't have
any devices that do that, but I think it can be useful for catching errors when
writing new devices.
Thanks,
Alex
>
>>> zone = (struct kvm_coalesced_mmio_zone) {
>>> .addr = phys_addr,
>>> .size = phys_addr_len,
>>> @@ -139,18 +140,27 @@ int kvm__register_mmio(struct kvm *kvm, u64
>>> phys_addr, u64 phys_addr_len, bool c }
>>> }
>>> mutex_lock(&mmio_lock);
>>> - ret = mmio_insert(&mmio_tree, mmio);
>>> + if ((flags & IOTRAP_BUS_MASK) == DEVICE_BUS_IOPORT)
>>> + ret = mmio_insert(&pio_tree, mmio);
>>> + else
>>> + ret = mmio_insert(&mmio_tree, mmio);
>>> mutex_unlock(&mmio_lock);
>>>
>>> return ret;
>>> }
>>>
>>> -bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr)
>>> +bool kvm__deregister_iotrap(struct kvm *kvm, u64 phys_addr,
>>> unsigned int flags) {
>>> struct mmio_mapping *mmio;
>>> + struct rb_root *tree;
>>> +
>>> + if ((flags & IOTRAP_BUS_MASK) == DEVICE_BUS_IOPORT)
>>> + tree = &pio_tree;
>>> + else
>>> + tree = &mmio_tree;
>>>
>>> mutex_lock(&mmio_lock);
>>> - mmio = mmio_search_single(&mmio_tree, phys_addr);
>>> + mmio = mmio_search_single(tree, phys_addr);
>>> if (mmio == NULL) {
>>> mutex_unlock(&mmio_lock);
>>> return false;
>>> @@ -167,7 +177,7 @@ bool kvm__deregister_mmio(struct kvm *kvm, u64
>>> phys_addr)
>>> * called mmio_put(). This will trigger use-after-free
>>> errors on VCPU0. */
>>> if (mmio->refcount == 0)
>>> - mmio_deregister(kvm, &mmio_tree, mmio);
>>> + mmio_deregister(kvm, tree, mmio);
>>> else
>>> mmio->remove = true;
>>> mutex_unlock(&mmio_lock);
>>> @@ -175,7 +185,8 @@ bool kvm__deregister_mmio(struct kvm *kvm, u64
>>> phys_addr) return true;
>>> }
>>>
>>> -bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8
>>> *data, u32 len, u8 is_write) +bool kvm__emulate_mmio(struct kvm_cpu
>>> *vcpu, u64 phys_addr, u8 *data,
>>> + u32 len, u8 is_write)
>> I don't think style changes should be part of this patch, the patch
>> is large enough as it is.
> I see, I just figured it's not worth a separate patch either.
>
> Cheers,
> Andre
>
>>> {
>>> struct mmio_mapping *mmio;
>>>
>>> @@ -194,3 +205,31 @@ bool kvm__emulate_mmio(struct kvm_cpu *vcpu,
>>> u64 phys_addr, u8 *data, u32 len, u out:
>>> return true;
>>> }
>>> +
>>> +bool kvm__emulate_pio(struct kvm_cpu *vcpu, u16 port, void *data,
>>> + int direction, int size, u32 count)
>>> +{
>>> + struct mmio_mapping *mmio;
>>> + bool is_write = direction == KVM_EXIT_IO_OUT;
>>> +
>>> + mmio = mmio_get(&pio_tree, port, size);
>>> + if (!mmio) {
>>> + if (vcpu->kvm->cfg.ioport_debug) {
>>> + fprintf(stderr, "IO error: %s port=%x,
>>> size=%d, count=%u\n",
>>> + to_direction(direction), port,
>>> size, count); +
>>> + return false;
>>> + }
>>> + return true;
>>> + }
>>> +
>>> + while (count--) {
>>> + mmio->mmio_fn(vcpu, port, data, size, is_write,
>>> mmio->ptr); +
>>> + data += size;
>>> + }
>>> +
>>> + mmio_put(vcpu->kvm, &pio_tree, mmio);
>>> +
>>> + return true;
>>> +}
More information about the linux-arm-kernel
mailing list