dax alignment problem on arm64 (and other architectures)

Pavel Tatashin pasha.tatashin at soleen.com
Fri Jan 29 11:24:21 EST 2021


On Fri, Jan 29, 2021 at 8:19 AM David Hildenbrand <david at redhat.com> wrote:
>
> On 29.01.21 03:06, Pavel Tatashin wrote:
> >>> Might be related to the broken custom pfn_valid() implementation for
> >>> ZONE_DEVICE.
> >>>
> >>> https://lkml.kernel.org/r/1608621144-4001-1-git-send-email-anshuman.khandual@arm.com
> >>>
> >>> And essentially ignoring sub-section data in there for now as well (but
> >>> might not be that relevant yet). In addition, this might also be related to
> >>>
> >>> https://lkml.kernel.org/r/161058499000.1840162.702316708443239771.stgit@dwillia2-desk3.amr.corp.intel.com
> >>
> >> I will check it, and see what I find. I saw that panic almost a year
> >> ago, things might have changed since then.
> >
> > Hi David,
> >
> > There is no panic anymore, but I also can't offset by 2M anymore. The
> > minimum that works now is 16M; if the alignment is less than 16M,
> > creating the devdax device fails.
>
> I wonder why we get such different namespace sizes? Where do the
> differences come from? This looks very weird.
>
> >
> > So, I tried the new ARM64 patch that reduces section sizes, and two
> > alignments for pmem: regular 2G alignment, and 2G+16M alignment.
> > (subtracted 16M from the bottom)
> >
> > ***** 4K page, 6G RAM, 2G PRAM  *****
> > BOOT:
> > 40000000-1bfffffff : System RAM
> > 1c0000000-23fffffff : namespace0.0
> > DEVDAX:
> > 40000000-1bfffffff : System RAM
> > 1c0000000-1c21fffff : namespace0.0
> > 1c2200000-23fffffff : dax0.0
> > HOTPLUG:
> > 40000000-1bfffffff : System RAM
> > 1c0000000-1c21fffff : namespace0.0
> > 1c8000000-23fffffff : dax0.0
> >    1c8000000-23fffffff : System RAM (kmem)               128M Wasted (Expected)
>
> The namespace spans 34MB??
>
> >
> > ***** 4K page, 6G-16M RAM, 2G+16M PRAM  *****
> > BOOT:
> > 40000000-1beffffff : System RAM
> > 1bf000000-23fffffff : namespace0.0
> > DEVDAX:
> > 40000000-1beffffff : System RAM
> > 1bf000000-1c11fffff : namespace0.0
> > 1c1200000-23fffffff : dax0.0
> > HOTPLUG:
> > 40000000-1beffffff : System RAM
> > 1bf000000-1c11fffff : namespace0.0
> > 1c8000000-23fffffff : dax0.0
> >    1c8000000-23fffffff : System RAM (kmem)               144M Wasted (????)
>
> The namespace spans 34MB??

Right, this seems like a bug

>
> >
> > ***** 64K page, 6G RAM, 2G PRAM  *****
> > BOOT:
> > 40000000-1bfffffff : System RAM
> > 1c0000000-23fffffff : namespace0.0
> > DEVDAX:
> > 40000000-1bfffffff : System RAM
> > 1c0000000-1dfffffff : namespace0.0
> > 1e0000000-23fffffff : dax0.0
> > HOTPLUG:
> > 40000000-1bfffffff : System RAM
> > 1c0000000-1dfffffff : namespace0.0
>
> The namespace spans 512MB ?!? What?

This is because section size is 512M with 64K pages.

>
> > 1e0000000-23fffffff : dax0.0
> >    1e0000000-23fffffff : System RAM (kmem)               512M Wasted (Expected)
> >
> > ***** 64K page, 6G-16M RAM, 2G+16M PRAM  *****
> > BOOT:
> > 40000000-1beffffff : System RAM
> > 1bf000000-23fffffff : namespace0.0
> > DEVDAX:
> > 40000000-1beffffff : System RAM
> > 1bf000000-1bf3fffff : namespace0.0
> > 1bf400000-23fffffff : dax0.0
> > HOTPLUG:
> > 40000000-1beffffff : System RAM
> > 1bf000000-1bf3fffff : namespace0.0
>
> The namespace now consumes 4MB ?!?
>
> > 1c0000000-23fffffff : dax0.0
> >    1c0000000-23fffffff : System RAM (kmem)               16M Wasted (Optimal)
>
> Good :) I guess more optimal would be 2MB/0MB :)

Agreed, but for a 16M offset this is optimal, because 16M is smaller
than the section size.
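
To make the arithmetic behind that statement explicit, here is a minimal
sketch. It assumes the hotpluggable range begins at the dax0.0 start rounded
up to the section size (512M with 64K pages, as stated above). That model is
inferred from the /proc/iomem listings in this thread, not taken from the
kernel source, and the helper names are made up for illustration.

```python
MIB = 1 << 20
SECTION_64K = 512 * MIB  # section size with 64K pages, per this thread


def align_up(addr: int, alignment: int) -> int:
    """Round addr up to the next multiple of alignment (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)


def wasted_mib(pmem_start: int, dax_start: int, alignment: int) -> int:
    """MiB between the start of the pmem region and the first address
    usable as hotplugged System RAM, under the round-up assumption."""
    usable_start = align_up(dax_start, alignment)
    return (usable_start - pmem_start) // MIB


# 64K pages, 2G PRAM: dax0.0 starts at 0x1e0000000
print(wasted_mib(0x1c0000000, 0x1e0000000, SECTION_64K))  # 512

# 64K pages, 2G+16M PRAM: dax0.0 starts at 0x1bf400000
print(wasted_mib(0x1bf000000, 0x1bf400000, SECTION_64K))  # 16
```

With a 16M offset the rounded-up start lands exactly on the original 2G
boundary, so nothing beyond the offset itself is lost.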

>
> >
> > In all three cases only System RAM, namespace0.0, and dax0.0 were
> > printed from /proc/iomem.
> > BOOT    content of iomem right after boot
> > DEVDAX  content of iomem after devdax is created
> >             ndctl create-namespace --mode devdax -e namespace0.0
> > HOTPLUG content of iomem after dax0.0 is hotplugged:
> >             echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
> >             echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
> >
> >
> > The most surprising part is why, with 4K pages and a 16M offset, 144M is
> > wasted. For whatever reason, when devdax is created, 34M is consumed by
> > the label. Something is wrong here. However, I am happy with the 64K
> > page result, where only 16M is wasted. Optimally, of course, we
> > should not be wasting any memory here, but it is still much better than
> > what we have now.
>
> Definitely, but we should try figuring out what's going on here. I
> assume on x86-64 it behaves differently?

Yes, we should root cause it. I strongly suspect that an alignment
miscalculation somewhere causes this memory waste with the 16M
offset. I am also not sure why the 2M label size was increased,
or why 16M is now an alignment requirement.

I tested on x86 and got pretty much the same results as on ARM64: a 2M
offset is no longer allowed (16M is the minimum), and even with a 16M
offset, 144M is wasted. Here is the full QEMU command if anyone wants to
reproduce it:


KERNEL_PARAM='console=ttyS0 ip=dhcp'
KERNEL_PARAM+=' memmap=2G!8G'
#KERNEL_PARAM+=' memmap=2064M!8176M'

qemu-system-x86_64 \
        -m 8G -smp 1 \
        -machine q35 \
        -nographic \
        -enable-kvm \
        -kernel pmem/native/arch/x86/boot/bzImage \
        -initrd ../poky/build/tmp/deploy/images/qemux86-64/core-image-minimal-qemux86-64.cpio.gz \
        -chardev stdio,id=console,signal=off,mux=on \
        -mon chardev=console \
        -serial chardev:console \
        -netdev user,hostfwd=tcp::5000-:22,id=netdev0 \
        -device virtio-net-pci,netdev=netdev0 \
        -append "$KERNEL_PARAM"
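
For anyone varying the offset, the memmap=<size>!<start> values used in this
mail follow a simple pattern: the pmem region grows downward from the 8G mark
by the offset, while its end stays fixed. A small sketch of that arithmetic
(the helper name and its defaults are mine, chosen to match this 8G guest):

```python
def memmap_param(offset_mib: int,
                 pram_mib: int = 2048,
                 pram_base_mib: int = 8192) -> str:
    """Build a memmap=<size>!<start> kernel parameter in which the pmem
    region is extended downward by offset_mib from pram_base_mib."""
    size = pram_mib + offset_mib
    start = pram_base_mib - offset_mib
    return f"memmap={size}M!{start}M"


for offset in (0, 16, 48):
    print(offset, memmap_param(offset))
# 0  memmap=2048M!8192M   (same region as memmap=2G!8G)
# 16 memmap=2064M!8176M
# 48 memmap=2096M!8144M
```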

Also, I am using the current tip of the ndctl master branch:
root@qemux86-64:~# ndctl --version
71.2.gea014c0

***** 4K page, 6G RAM, 2G PRAM:  kernel parameter memmap=2G!8G *****
BOOT:
100000000-1ffffffff : System RAM
200000000-27fffffff : Persistent Memory (legacy)
  200000000-27fffffff : namespace0.0

DEVDAX:
100000000-1ffffffff : System RAM
200000000-27fffffff : Persistent Memory (legacy)
  200000000-2021fffff : namespace0.0
  202200000-27fffffff : dax0.0

HOTPLUG:
100000000-1ffffffff : System RAM
200000000-27fffffff : Persistent Memory (legacy)
  200000000-2021fffff : namespace0.0
  208000000-27fffffff : dax0.0
    208000000-27fffffff : System RAM (kmem)        (128M Wasted)

***** 4K page, 6G-16M RAM, 2G+16M PRAM: kernel parameter memmap=2064M!8176M *****
BOOT:
100000000-1feffffff : System RAM
1ff000000-27fffffff : Persistent Memory (legacy)
  1ff000000-27fffffff : namespace0.0

DEVDAX:
100000000-1feffffff : System RAM
1ff000000-27fffffff : Persistent Memory (legacy)
  1ff000000-2011fffff : namespace0.0
  201200000-27fffffff : dax0.0

HOTPLUG:
100000000-1feffffff : System RAM
1ff000000-27fffffff : Persistent Memory (legacy)
  1ff000000-2011fffff : namespace0.0
  208000000-27fffffff : dax0.0
    208000000-27fffffff : System RAM (kmem)  (144M Wasted)

The least wasted memory I can get on x86 in this experiment is with an
offset that is larger than 34M and 16M-aligned, i.e. 48M:
memmap=2096M!8144M

root@qemux86-64:~# cat /proc/iomem | grep 'dax\|namespace\|System\|Pers'
100000000-1fcffffff : System RAM
1fd000000-27fffffff : Persistent Memory (legacy)
  1fd000000-1ff1fffff : namespace0.0
  200000000-27fffffff : dax0.0
    200000000-27fffffff : System RAM (kmem) (48M Wasted)
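
The "Wasted" annotations in these listings were worked out by hand. As a
rough sketch of the same calculation, the snippet below parses /proc/iomem
style text and reports the gap between the start of the persistent memory
region and the start of the hotplugged "System RAM (kmem)" range, plus the
namespace size. The sample text is the 48M listing above with the annotation
removed; the function names are illustrative only.

```python
import re

IOMEM = """\
100000000-1fcffffff : System RAM
1fd000000-27fffffff : Persistent Memory (legacy)
  1fd000000-1ff1fffff : namespace0.0
  200000000-27fffffff : dax0.0
    200000000-27fffffff : System RAM (kmem)
"""

# "<start>-<end> : <name>", with optional leading indentation
LINE = re.compile(r"^\s*([0-9a-f]+)-([0-9a-f]+) : (.+?)\s*$")


def parse_iomem(text):
    """Return a list of (start, end, name) tuples; end is inclusive."""
    ranges = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            ranges.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3)))
    return ranges


def wasted_mib(text):
    """MiB between the pmem region start and the kmem range start."""
    ranges = parse_iomem(text)
    pmem = next(s for s, _, n in ranges if n.startswith("Persistent Memory"))
    kmem = next(s for s, _, n in ranges if n == "System RAM (kmem)")
    return (kmem - pmem) >> 20


def namespace_mib(text):
    """Size in MiB of the first namespace range."""
    s, e, _ = next(r for r in parse_iomem(text) if r[2].startswith("namespace"))
    return (e - s + 1) >> 20


print(wasted_mib(IOMEM), namespace_mib(IOMEM))  # 48 34
```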

Pasha


>
> Thanks
>
>
> --
> Thanks,
>
> David / dhildenb
>


