[PATCH 10/21] block: Add fops atomic write support

Dave Chinner david at fromorbit.com
Thu Oct 5 21:31:05 PDT 2023


On Thu, Oct 05, 2023 at 03:58:38PM -0700, Bart Van Assche wrote:
> On 10/5/23 15:36, Dave Chinner wrote:
> > $ lspci |grep -i nvme
> > 03:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
> > 06:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
> > $ cat /sys/block/nvme*n1/queue/write_cache
> > write back
> > write back
> > $
> > 
> > That they have volatile writeback caches....
> 
> It seems like what I wrote has been misunderstood completely. With
> "handling a power failure cleanly" I meant that power cycling a block device
> does not result in read errors nor in reading data that has never been written.

Then I don't see what your concern is. 

Single sector writes are guaranteed atomic and have been for as long
as I've worked in this game. OTOH, multi-sector writes are not
guaranteed to be atomic - they can get torn on sector boundaries,
but the individual sectors within that write are guaranteed to be
all-or-nothing. 

Any hardware device that does not guarantee single sector write
atomicity (i.e. tears in the middle of a sector) is, by definition,
broken. And we all know that broken hardware means nothing in the
storage stack works as it should, so I just don't see what point you
are trying to make...

> Although it is hard to find information about this topic, here is what I found
> online:
> * About certain SSDs with power loss protection:
>   https://us.transcend-info.com/embedded/technology/power-loss-protection-plp
> * About another class of SSDs with power loss protection:
>   https://www.kingston.com/en/blog/servers-and-data-centers/ssd-power-loss-protection
> * About yet another class of SSDs with power loss protection:
>   https://phisonblog.com/avoiding-ssd-data-loss-with-phisons-power-loss-protection-2/

Yup, devices that behave as if they have non-volatile write caches.
Such devices have been around for more than 30 years, they operate
the same as devices without caches at all.

> So far I haven't found any information about hard disks and power failure
> handling. What I found is that most current hard disks protect data with ECC.
> The ECC mechanism should provide good protection against reading data that
> has never been written.  If a power failure occurs while a hard disk is writing
> a physical block, can this result in a read error after power is restored? If
> so, is this behavior allowed by storage standards?

If a power fail results in read errors from the storage media being
reported to the OS instead of the data that was present in the
sector before the power failure, then the device is broken. If there
is no data in the region being read because it has never been
written, then it should return zeros (no data) rather than stale
data or a media read error.

-Dave.
-- 
Dave Chinner
david at fromorbit.com



More information about the Linux-nvme mailing list