[PATCH v5 2/4] live migration support for initial write protect of VM
Christoffer Dall
christoffer.dall at linaro.org
Mon May 19 10:56:48 PDT 2014
On Fri, May 16, 2014 at 02:39:16PM -0700, Mario Smarduch wrote:
> Hi Christoffer,
> a few more comments.
> >>> struct vgic_dist vgic;
> >>> + /* Marks start of migration, used to handle 2nd stage page faults
> >>> + * during migration, prevent installing huge pages and split huge pages
> >>> + * to small pages.
> >>> + */
> >>
> >> commenting style
> >>
> >> this is a bit verbose for a field in a struct, perhaps moving the longer
> >> version to where you set this?
> > Will do.
> >>
> >>> + int migration_in_progress;
> >>> };
>
> I think this flag could be removed altogether. Migration can be
> stopped at any time (started too), through a user request or other events.
> When that happens (like migrate_cancel), the migrate cleanup bh runs and eventually calls
> the KVM memory listener kvm_log_global_start() (cancel handler),
> which stops logging, clears KVM_MEM_LOG_DIRTY_PAGES via the region ops ioctl, and
> clears the dirty_bitmap. In either case the dirty_bitmap for the memslot is set or
> cleared during migration to track dirty pages, so following that field seems to be
> a better way to keep track of migration. This again is the QEMU view, but it appears
> all these policies are driven from user space.
>
ok, I need to look more closely at the whole thing to properly comment
on this.
>
>
> >>>
> >>> +/* kvm_split_pmd - splits huge pages to small pages, required to keep a dirty
> >>> + * log of smaller memory granules, otherwise huge pages would need to be
> >>> + * migrated. Practically an idle system has problems migrating with
> >>
> >> This seems abrupt. Why can't we just represent a 2M huge page as 512 4K
> >> bits and write protect the huge pages? If you take a write fault on a 2M
> >> page, split it at that point.
> >
> > That's one alternative; the one I put into v6 is to clear the PMD
> > and force user_mem_abort() to fault in 4k pages, mark the
> > dirty_bitmap[] for that page, and reuse the current code. I have not
> > checked the impact on performance; it takes a few seconds longer
> > to converge for the tests I'm running.
>
> I was thinking about this, and if PMD attributes need to be passed
> on to the PTEs then it appears what you recommend is required.
> But at run time I don't see how 2nd stage attributes can
> change; could the guest do anything to change them (SH, MemAttr)?
You should be able to just grab the kvm->mmu_lock, update the stage-2
page tables to remove all writable bits, flush all stage-2 TLBs for that
VMID, and you should be all set.
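
Something along these lines is what I have in mind -- only a rough
sketch, not actual patch code: the walk and the stage2_get_pmd() /
kvm_pmd_huge() helpers are borrowed loosely from arch/arm/kvm/mmu.c,
and the *_readonly() setters and the final flush are placeholders for
whatever your series ends up providing.

static void stage2_wp_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)
{
        phys_addr_t addr = memslot->base_gfn << PAGE_SHIFT;
        phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;

        spin_lock(&kvm->mmu_lock);
        for (; addr < end; addr += PAGE_SIZE) {
                pmd_t *pmd = stage2_get_pmd(kvm, NULL, addr);

                if (!pmd || pmd_none(*pmd))
                        continue;

                if (kvm_pmd_huge(*pmd)) {
                        /* Write protect the whole 2M block, split it later on fault. */
                        kvm_set_s2pmd_readonly(pmd);
                        addr += PMD_SIZE - PAGE_SIZE;
                } else {
                        pte_t *pte = pte_offset_kernel(pmd, addr);

                        if (!pte_none(*pte))
                                kvm_set_s2pte_readonly(pte);
                }
        }
        /* must end up invalidating this VMID's stage-2 TLB entries */
        kvm_flush_remote_tlbs(kvm);
        spin_unlock(&kvm->mmu_lock);
}

That way the source VM keeps its huge page mappings until it actually
writes to them, and the only faults you take afterwards are the first
write to each page.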
>
>
> Performance may also be another reason, but that always depends
> on the load; clearing a PMD seems easier and reuses the current code.
> Several load tests/benchmarks could probably help here.
> I also noticed the hw PMD/PTE attributes differ a little, which
> is not significant now, but moving forward different page sizes
> and any new revisions to the fields may require additional maintenance.
I think clearing out all PMD mappings will carry a significant
performance degradation on the source VM, and in the case where you keep it
running, that will be quite unfortunate. Hint: page faults are expensive, and
huge pages have been shown to give about a 10-15% performance increase on ARMv7
for CPU/memory intensive benchmarks.
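
For completeness, the split-on-write-fault I keep referring to could
look roughly like this -- again just a sketch, assumed to run under
kvm->mmu_lock with a pre-allocated pte page passed in, and it
sidesteps the SH/MemAttr inheritance question you raise above by
simply using PAGE_S2 for the new entries:

static void stage2_split_huge_pmd(struct kvm *kvm, pmd_t *pmd,
                                  phys_addr_t addr, pte_t *pte_table)
{
        unsigned long pfn = pmd_pfn(*pmd);
        int i;

        /* Build 512 read-only 4K entries covering the old 2M block. */
        for (i = 0; i < PTRS_PER_PTE; i++, pfn++)
                pte_table[i] = pfn_pte(pfn, PAGE_S2);

        /* Swap the block entry for a table entry and drop the stale TLB entry. */
        pmd_clear(pmd);
        kvm_tlb_flush_vmid_ipa(kvm, addr & PMD_MASK);
        pmd_populate_kernel(NULL, pmd, pte_table);
}

Only the 2M regions the guest actually writes to would pay the
splitting cost; everything else stays mapped with huge pages.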
>
> I'll be out next week and back on the 26th; I'll create a link with
> details on the test environment and tests. The cover letter will
> go through a general overview only.
>
ok, I have some time then.
-Christoffer
>
> >
> >>
> >> If your use case is HA, then you will be doing this a lot, and you don't
> >> want to hurt the performance of your main live system more than necessary.
> >
> >>
> >>> + * huge pages. Called during WP of entire VM address space, done
> >>
>
>