RFC Block Layer Extensions to Support NV-DIMMs

Vladislav Bolkhovitin vst at vlnb.net
Thu Sep 26 02:58:50 EDT 2013


Hi Rob,

Rob Gittins, on 09/23/2013 03:51 PM wrote:
> On Fri, 2013-09-06 at 22:12 -0700, Vladislav Bolkhovitin wrote:
>> Rob Gittins, on 09/04/2013 02:54 PM wrote:
>>> Non-volatile DIMMs have started to become available.  An NVDIMM is a
>>> DIMM that does not lose data across power interruptions.  Some
>>> NVDIMMs act like memory, while others are more like a block device
>>> on the memory bus.  Applications range from caching critical data
>>> to serving as a boot device.
>>>
>>> There are two access classes of NVDIMMs: block mode and
>>> "load/store" mode DIMMs, which are referred to as Direct Memory
>>> Mappable.
>>>
>>> Block mode is where the DIMM provides IO ports for reads or writes
>>> of data.  These DIMMs reside on the memory bus but do not appear in
>>> the application address space.  Block mode DIMMs do not require any
>>> changes to the current infrastructure, since they provide an IO-style
>>> interface.
>>>
>>> Direct Memory Mappable DIMMs (DMMDs) appear in the system address
>>> space and are accessed via load and store instructions.  These
>>> NVDIMMs are part of the system physical address space (SPA) as
>>> memory, with the attribute that data survives a power interruption.
>>> As such, this memory is managed by the kernel, which can assign it
>>> virtual addresses and map it into an application's address space, as
>>> well as making it accessible to the kernel itself.  The area mapped
>>> into the system address space is referred to as persistent memory
>>> (PMEM).
>>>
>>> PMEM introduces the need for new operations in the
>>> block_device_operations to support the specific characteristics of
>>> the media.
>>>
>>> First, data may not propagate all the way through the memory pipeline
>>> when store instructions are executed.  Data may stay in the CPU cache
>>> or in other buffers in the processor and memory complex.  In order to
>>> ensure the durability of data, there needs to be a driver entry point
>>> to force a byte range out to media.  The methods of doing this are
>>> specific to the PMEM technology and need to be handled by the driver
>>> that supports the DMMDs.  To provide a way to ensure that data is
>>> durable, we add a commit function to the block_device_operations
>>> vector.
>>>
>>>    void (*commitpmem)(struct block_device *bdev, void *addr);
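
(For concreteness, here is one plausible driver-side implementation of such a
hook on x86 for a clflush-based DIMM; the example_* names are hypothetical,
and since the proposed prototype carries only a start address, the driver is
assumed to track the length of the range itself:)

    #include <asm/cacheflush.h>

    static void example_commitpmem(struct block_device *bdev, void *addr)
    {
            /* hypothetical driver-private lookup of the range length */
            size_t len = example_pmem_range_len(bdev, addr);

            /* write the CPU caches back for the range so the stored
             * data reaches the media */
            clflush_cache_range(addr, len);
            wmb();  /* order the flushes against subsequent stores */
    }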
>>
>> Why glue an apparently non-block class of devices to the block concept?
>> By pushing NVDIMMs into the block model, you are both limiting them to
>> the capabilities of block devices and having to extend block devices
>> with properties that are alien to them.
> Hi Vlad,
> 
> We chose to extend the block operations for a couple of reasons.  The
> majority of NVDIMM usage is via emulated block mode.  We figure that
> over time usages will appear that access NVDIMMs directly, and then we
> can design interfaces to enable direct use.
> 
> Since a range of NVDIMM needs a name, security, and other attributes,
> mmap is a really good model to build on.  This quickly takes us into
> the realm of file systems, which are easiest to build on the existing
> block infrastructure.
> 
> Another reason to extend block is that all of the existing
> administrative interfaces and tools such as mkfs still work, and we
> have not added new management tools and requirements that might
> inhibit the adoption of the technology.  Basically, if it works today
> for block, the same CLI commands will work for NVDIMMs.
> 
> The extensions are so minimal that they don't negatively impact the
> existing interfaces.

Well, they will negatively impact them, because those NVDIMM additions are
conceptually alien to the block device concept.

You didn't answer: why not create a new class of devices for NVDIMMs, and
implement a one-fits-all block driver for them? That would be a simple, clean and
elegant solution, which would fit your need to expose an NVDIMM device as a block
device pretty well, with minimal effort. A sketch of what I mean is below.
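
For instance, something along these lines (all names are hypothetical; this is
a sketch of the idea, not existing kernel code):

    #include <linux/device.h>
    #include <linux/types.h>

    struct nvdimm_device;

    /* ops implemented by each technology-specific NVDIMM driver */
    struct nvdimm_ops {
            /* map a byte range of the DIMM into the kernel address space */
            void *(*map)(struct nvdimm_device *nvd, loff_t off, size_t len);
            /* technology-specific: force a byte range out to media */
            void (*commit)(struct nvdimm_device *nvd, void *addr, size_t len);
    };

    struct nvdimm_device {
            const struct nvdimm_ops *ops;  /* set by the technology driver */
            size_t size;                   /* usable capacity in bytes */
            struct device dev;             /* member of the new device class */
    };

    /* Technology drivers register here; one generic block driver then
     * exposes every registered nvdimm_device as a block device by calling
     * ops->map and ops->commit from its request handler, so
     * block_device_operations itself stays untouched. */
    int nvdimm_device_register(struct nvdimm_device *nvd);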

Vlad


