[PATCH] arm64/kvm: correct the error report in kvm_handle_guest_abort

Marc Zyngier maz at kernel.org
Mon Jan 18 08:26:06 EST 2021


On 2021-01-18 13:04, Jianyong Wu wrote:
> Hi Marc,
> 
>> -----Original Message-----
>> From: kvmarm-bounces at lists.cs.columbia.edu <kvmarm-
>> bounces at lists.cs.columbia.edu> On Behalf Of Jianyong Wu
>> Sent: Saturday, January 16, 2021 4:47 PM
>> To: Marc Zyngier <maz at kernel.org>
>> Cc: Justin He <Justin.He at arm.com>; nd <nd at arm.com>; will at kernel.org;
>> kvmarm at lists.cs.columbia.edu; linux-arm-kernel at lists.infradead.org
>> Subject: RE: [PATCH] arm64/kvm: correct the error report in
>> kvm_handle_guest_abort
>> 
>> Hi Marc,
>> 
>> > -----Original Message-----
>> > From: Marc Zyngier <maz at kernel.org>
>> > Sent: Friday, January 15, 2021 7:21 PM
>> > To: Jianyong Wu <Jianyong.Wu at arm.com>
>> > Cc: James Morse <James.Morse at arm.com>; will at kernel.org; Suzuki
>> Poulose
>> > <Suzuki.Poulose at arm.com>; linux-arm-kernel at lists.infradead.org;
>> > kvmarm at lists.cs.columbia.edu; Steve Capper <Steve.Capper at arm.com>;
>> > Justin He <Justin.He at arm.com>; nd <nd at arm.com>
>> > Subject: Re: [PATCH] arm64/kvm: correct the error report in
>> > kvm_handle_guest_abort
>> >
>> > On 2021-01-15 09:30, Jianyong Wu wrote:
>> > > Currently, error report when cache maintenance at read-only memory
>> > > range, like rom, is not clear enough and even not correct. As the
>> > > specific error is definitely known by kvm, it is obliged to give it
>> > > out.
>> > >
>> > > Fox example, in a qemu/kvm VM, if the guest do dc at the pflash
>> > > range from
>> > > 0 to 128M, error is reported by kvm as "Data abort outside memslots
>> > > with no valid syndrome info" which is not quite correct.
>> > >
>> > > Signed-off-by: Jianyong Wu <jianyong.wu at arm.com>
>> > > ---
>> > >  arch/arm64/kvm/mmu.c | 12 +++++++++---
>> > >  1 file changed, 9 insertions(+), 3 deletions(-)
>> > >
>> > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index
>> > > 7d2257cc5438..de66b7e38a5b 100644
>> > > --- a/arch/arm64/kvm/mmu.c
>> > > +++ b/arch/arm64/kvm/mmu.c
>> > > @@ -1022,9 +1022,15 @@ int kvm_handle_guest_abort(struct kvm_vcpu
>> > > *vcpu)
>> > >  		 * So let's assume that the guest is just being
>> > >  		 * cautious, and skip the instruction.
>> > >  		 */
>> > > -		if (kvm_is_error_hva(hva) && kvm_vcpu_dabt_is_cm(vcpu))
>> > {
>> > > -			kvm_incr_pc(vcpu);
>> > > -			ret = 1;
>> > > +		if (kvm_vcpu_dabt_is_cm(vcpu)) {
>> > > +			if (kvm_is_error_hva(hva)) {
>> > > +				kvm_incr_pc(vcpu);
>> > > +				ret = 1;
>> > > +				goto out_unlock;
>> > > +			}
>> > > +
>> > > +			kvm_err("Do cache maintenance in the read-only
>> > memory range\n");
>> >
>> > We don't scream on the console for guests bugs.
>> Ok
>> 
>> >
>> > > +			ret = -EFAULT;
>> > >  			goto out_unlock;
>> > >  		}
>> >
>> > And what is userspace going to do with that? To be honest, I'd rather
>> > not report it in any case:
>> >
>> > - either it isn't mapped, and there is no cache to clean/invalidate
>> > - or it is mapped read-only:
>> >    - if it is a "DC IVAC", the guest should get the fault as per
>> >      the ARM ARM. But I don't think we can identify the particular CMO
>> >      at this stage, so actually performing an invalidation is the least
>> >      bad thing to do.
>> >
>> > How about this (untested)?
> 
> I have tested for this. It works that DC ops can pass on memory range
> for rom. But there is performance issue. It takes too long a time that
> do DC on rom range compared with on  normal memory range. Here is some
> data:
> Ops                  memory type                                size
>                   time
> dc civac         rom memory                                128M
>        6700ms;
> dc civac       writable normal memory             128M                
> 300ms;
> 
> It's a single thread test and may be worse on multi thread. I'm not
> sure we can bear it. WDYT?

The problem is that the guest is invalidating one cache-line at
a time, but we invalidate 128M at a time in your example.

I would say that I really don't care how slow it is. We cannot know
which address the guest is trying to invalidate (as your commit
message shows, there is no syndrome information available).

So it seems our only choices are:
- don't do any invalidation, which is likely to break the guest
- invalidate everything, always

Given that, I'd rather have a slow guest. Also, it very much looks
like no existing SW does this, so I cannot say I care much.

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...



More information about the linux-arm-kernel mailing list