[PATCH 32/37] KVM: arm/arm64: Handle VGICv2 save/restore from the main VGIC code
Yury Norov
ynorov at caviumnetworks.com
Thu Nov 30 04:09:41 PST 2017
On Sun, Nov 26, 2017 at 08:46:41PM +0100, Christoffer Dall wrote:
> On Sun, Nov 26, 2017 at 01:29:30PM +0300, Yury Norov wrote:
> > On Wed, Nov 15, 2017 at 05:50:07PM +0000, Andre Przywara wrote:
> > > Hi,
> > >
> > > those last few patches are actually helpful for the Xen port ...
> >
> > [...]
> >
> > > > +static void save_elrsr(struct kvm_vcpu *vcpu, void __iomem *base)
> > > > +{
> > > > + struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
> > > > + int nr_lr = kvm_vgic_global_state.nr_lr;
> > > > + u32 elrsr0, elrsr1;
> > > > +
> > > > + elrsr0 = readl_relaxed(base + GICH_ELRSR0);
> > > > + if (unlikely(nr_lr > 32))
> > > > + elrsr1 = readl_relaxed(base + GICH_ELRSR1);
> > > > + else
> > > > + elrsr1 = 0;
> > > > +
> > > > +#ifdef CONFIG_CPU_BIG_ENDIAN
> > > > + cpu_if->vgic_elrsr = ((u64)elrsr0 << 32) | elrsr1;
> > > > +#else
> > > > + cpu_if->vgic_elrsr = ((u64)elrsr1 << 32) | elrsr0;
> > > > +#endif
> > >
> > > I have some gut feeling that this is really broken, since we mix up
> > > endian *byte* ordering with *bit* ordering here, don't we?
> >
> > Good feeling indeed. :)
> >
> > We have bitmap_{from,to}_u32array for things like this. But it was
> > considered badly designed, and I proposed the new bitmap_{from,to}_arr32().
> >
> > https://lkml.org/lkml/2017/11/15/592
> >
> > What else I have in mind is to introduce something like bitmap_{from,to}_pair_32(),
> > as most current users of bitmap_{from,to}_u32array() (and those who should
> > use it but don't, like this one) have only two 32-bit halfwords to be copied
> > from/to a bitmap.
> >
> > Also, it will be complementary to bitmap_from_u64().
> >
> > More reading about bitmap/array conversion is in the comment to the
> > BITMAP_FROM_U64 macro.
> >
>
> I have no idea what you want to introduce here. If you have an idea on
> how to improve the code, patches are welcome.
That's about Andre's gut feeling, not about your patch. I have some
ideas related to it, and just wanted to share them with him - that's all.
> Please keep in mind, that the purpose of this patch is to move code
> around to improve the GIC handling performance, not changing the
> lower-level details of the code.
>
> > > I understand it's just copied and gets removed later on, so I was
> > > wondering if you could actually move patch 35/37 ("Get rid of
> > > vgic_elrsr") before this patch here, to avoid copying bogus code around?
> > > Or does 35/37 depend on 34/37 to be correct?
> > >
> > > > +}
> > > > +
> > > > +static void save_lrs(struct kvm_vcpu *vcpu, void __iomem *base)
> > > > +{
> > > > + struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
> > > > + int i;
> > > > + u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
> > > > +
> > > > + for (i = 0; i < used_lrs; i++) {
> > > > + if (cpu_if->vgic_elrsr & (1UL << i))
> >
> > So vgic_elrsr is naturally a bitmap, and the bitmap API is preferred
> > absent other considerations:
> > if (test_bit(i, (unsigned long *)&cpu_if->vgic_elrsr))
> >
> > > > + cpu_if->vgic_lr[i] &= ~GICH_LR_STATE;
> > > > + else
> > > > + cpu_if->vgic_lr[i] = readl_relaxed(base + GICH_LR0 + (i * 4));
> > > > +
> > > > + writel_relaxed(0, base + GICH_LR0 + (i * 4));
> > > > + }
> > > > +}
> >
> > I'd also consider using for_each_clear_bit() here:
> >
> > /*
> > * Setup default vgic_lr values somewhere earlier.
>
> Not sure what the 'default' values are.
>
> > * Not needed at all if you take my suggestion for
> > * vgic_v2_restore_state() below
> > */
> > for (i = 0; i < used_lrs; i++)
> > cpu_if->vgic_lr[i] &= ~GICH_LR_STATE;
> >
> > static void save_lrs(struct kvm_vcpu *vcpu, void __iomem *base)
> > {
> > [...]
> >
> > for_each_clear_bit (i, cpu_if->vgic_elrsr, used_lrs)
> > cpu_if->vgic_lr[i] = readl_relaxed(base + GICH_LR0 + (i * 4));
> >
> > for (i = 0; i < used_lrs; i++)
> > writel_relaxed(0, base + GICH_LR0 + (i * 4));
> > }
> >
> > Not sure how performance-critical this path is, but sometimes things
> > get much faster with bitmaps.
> >
>
> Your suggestion below would require us to maintain elrsr when we setup
> list registers, and I don't really see the benefit.
That's what I was asking - whether it is maintained or not. If not, then it
will not work.
> > [...]
> >
> > > > +void vgic_v2_restore_state(struct kvm_vcpu *vcpu)
> > > > +{
> > > > + struct kvm *kvm = vcpu->kvm;
> > > > + struct vgic_dist *vgic = &kvm->arch.vgic;
> > > > + struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
> > > > + void __iomem *base = vgic->vctrl_base;
> > > > + u64 used_lrs = vcpu->arch.vgic_cpu.used_lrs;
> > > > + int i;
> > > > +
> > > > + if (!base)
> > > > + return;
> > > > +
> > > > + if (used_lrs) {
> > > > + writel_relaxed(cpu_if->vgic_hcr, base + GICH_HCR);
> > > > + writel_relaxed(cpu_if->vgic_apr, base + GICH_APR);
> > > > + for (i = 0; i < used_lrs; i++) {
> > > > + writel_relaxed(cpu_if->vgic_lr[i],
> > > > + base + GICH_LR0 + (i * 4));
> > > > + }
> > > > + }
> > > > +}
> >
> > The alternative approach would be:
> > for (i = 0; i < used_lrs; i++) {
> > if (test_bit(i, cpu_if->vgic_elrsr))
> > writel_relaxed(~GICH_LR_STATE, base + GICH_LR0 + (i * 4));
> > else
> > writel_relaxed(cpu_if->vgic_lr[i], base + GICH_LR0 + (i * 4));
> > }
> >
> > If cpu_if->vgic_elrsr is untouched in between, of course. It would make
> > save_lrs() simpler and this function more verbose.
> >
> I don't understand your suggestion. As you will see later, we will get
> rid of storing the elrsr completely, with a measurable performance
> improvement.
OK, now I see. Sorry for the stupid questions - I'm just starting to learn
the codebase. By the way, can you share the technique you use to
measure performance? It would be great if I could reproduce your
results.
> If you think you can improve the code beyond that, a follow-up patch
> would be most welcome.
>
> Note that on all the implementations I'm familiar with, the maximum
> number of LRs is four, so we're not wading through massive bitmaps in
> practice here.
>
> Thanks,
> -Christoffer
More information about the linux-arm-kernel mailing list