[PATCH v7 0/4] arm: dirty page logging support for ARMv7

Christoffer Dall christoffer.dall at linaro.org
Sun Jun 8 03:45:26 PDT 2014

On Tue, Jun 03, 2014 at 04:19:23PM -0700, Mario Smarduch wrote:
> This patch series adds support for dirty page logging, so far tested only on
> ARMv7. With dirty page logging and GICv2 vGIC and arch timer save/restore
> support, live migration is supported.
> Dirty page logging support:
> - initially write-protects VM RAM memory regions (2nd stage page tables)
> - adds support to read the dirty page log and write-protect the dirty pages
>   (second stage page tables) again for the next pass
> - second stage huge pages are dissolved into page tables to keep track of
>   dirty pages at page granularity. Tracking at huge page granularity limits
>   migration to an almost idle system. There are a couple of approaches to
>   handling huge pages:
>   1 - break up the huge page into a page table and write-protect all PTEs
>   2 - clear the PMD entry, create a page table, install the faulted page
>       entry and write-protect it.

not sure I fully understand. Is option 2 simply write-protecting all
PMDs and splitting them at fault time?

>   This patch implements #2; in the future #1 may be implemented depending on
>   more benchmark results.
>   Option 1: may overcommit and do unnecessary work, but under heavy loads
>             appears to converge faster during live migration.
>   Option 2: only write-protects pages that are accessed; migration time
>             varies and is longer than with Option 1, but eventually catches up.
> - If migration is canceled, normal behavior resumes and huge pages are
>   rebuilt over time.
> - Another alternative is the use of reverse mappings, where for each level of
>   the 2nd stage tables (PTE, PMD, PUD) pointers to sptes are maintained (as
>   in the x86 implementation). The primary benefit of reverse mappings is for
>   MMU notifiers on large memory range invalidations. Reverse mappings also
>   improve dirty page logging: instead of walking page tables, spte pointers
>   are accessed directly via the reverse map array.
> - Reverse mappings will be considered for future support once the current
>   implementation is hardened.

Is the following a list of your future work?

>   o validate current dirty page logging support
>   o VMID TLB flushing, migrating multiple guests
>   o GIC/arch-timer migration
>   o migration under various loads, primarily page reclaim, and validation of
>     the current MMU notifiers
>   o Run benchmarks (lmbench for now), measure the performance impact, and
>     optimize
>   o Test virtio, since it writes into guest memory; wait until PCI is
>     supported on ARM.

So you're not testing with virtio now?  Your command line below seems to
suggest that in fact you are.  /me confused.

>   o Currently on ARM, KVM doesn't appear to write into the guest address
>     space; need to mark those pages dirty too (???).

not sure what you mean here, can you expand?

> - Move on to ARMv8, since the 2nd stage MMU code is shared between both
>   architectures. But in addition to dirty page logging, additional support
>   for the GIC, arch timers, and emulated devices is required. Also, working on
>   an emulated platform masks a lot of potential bugs, but does help to get
>   the majority of the code working.
> Test Environment:
> ---------------------------------------------------------------------------
> NOTE: Running on Fast Models will hardly ever fail and masks bugs; in fact,
>       initially light loads were succeeding without dirty page logging support.
> ---------------------------------------------------------------------------
> - Will put all components on GitHub, including the test setup diagram
> - Summary:
>   o Two ARM Exynos 5440 development platforms: 4-way 1.7 GHz, 8 GB, 256 GB
>     storage, 1 Gb/s Ethernet, swap enabled
>   o NFS server running Ubuntu 13.04
>     - both ARM boards mount the shared file system
>     - the shared file system includes QEMU, the guest kernel, the DTB, and
>       multiple ext3 root file systems
>   o Component versions: qemu-1.7.5, vexpress-a15, host/guest kernel 3.15-rc1
>   o Use the QEMU monitor (Ctrl+A C) and the 'migrate -d tcp:IP:port' command
>     - Destination command syntax (smp can be changed to 4; the machine model
>       is outdated but has been tested on virt by others, need to upgrade):
> 	/mnt/migration/qemu-system-arm -enable-kvm -smp 2 -kernel \
> 	/mnt/migration/zImage -dtb /mnt/migration/guest-a15.dtb -m 1792 \
> 	-M vexpress-a15 -cpu cortex-a15 -nographic \
> 	-append "root=/dev/vda rw console=ttyAMA0 rootwait" \
> 	-drive if=none,file=/mnt/migration/guest1.root,id=vm1 \
> 	-device virtio-blk-device,drive=vm1 \
> 	-netdev type=tap,id=net0,ifname=tap0 \
> 	-device virtio-net-device,netdev=net0,mac="52:54:00:12:34:58" \
> 	-incoming tcp:0:4321
>     - Source command syntax is the same, except without '-incoming'
>   o Migration of multiple VMs, using tap0, tap1, ..., and guest0.root, .....,
>     has been tested as well.
>   o On the source, run multiple copies of 'dirtyram.arm', a simple program
>     that dirties pages periodically:
>     ./dirtyram.arm <total mmap size> <dirty page size> <sleep time>
>     Example:
>     ./dirtyram.arm 102580 812 30
>     - dirty 102580 pages
>     - 812 pages every 30ms with an incrementing counter 
>     - run anywhere from one to as many copies as VM resources can support; if
>       the dirty rate is too high, migration will run indefinitely
>     - run a date output loop and check that the date is picked up smoothly
>     - place the guest/host into page reclaim/swap mode by whatever means; in
>       this case, run multiple copies of 'dirtyram.arm' on the host
>     - issue migrate command(s) on source
>     - Top result is 409600, 8192, 5
>   o QEMU is instrumented to save RAM memory regions on the source and
>     destination after memory is migrated, but before the guest is started.
>     The files are later checksummed on both ends for correctness; given the
>     VMs are small, this works.
>   o The guest kernel is instrumented to capture the current cycle counter
>     minus the last cycle count, which is compared to the QEMU downtime to
>     test arch timer accuracy.
>   o Network failover is at L3 due to interface limitations; ping continues
>     working transparently.
>   o Also tested 'migrate_cancel' to test reassembly of huge pages (low-level
>     instrumentation code was inserted).

Thanks for the info, this makes it much clearer to me how you're testing
this and I will try to reproduce.


More information about the linux-arm-kernel mailing list