[Patch v9 03/10] fs: Initial atomic write support

Hannes Reinecke hare at suse.de
Thu Jun 20 22:56:06 PDT 2024


On 6/20/24 14:53, John Garry wrote:
> From: Prasad Singamsetty <prasad.singamsetty at oracle.com>
> 
> An atomic write is a write issued with torn-write protection, meaning
> that for a power failure or any other hardware failure, all or none of the
> data from the write will be stored, but never a mix of old and new data.
> 
> Userspace may add flag RWF_ATOMIC to pwritev2() to indicate that the
> write is to be issued with torn-write prevention, according to special
> alignment and length rules.
> 
> For any syscall interface utilizing struct iocb, add IOCB_ATOMIC for
> iocb->ki_flags field to indicate the same.
> 
> A call to statx will give the relevant atomic write info for a file:
> - atomic_write_unit_min
> - atomic_write_unit_max
> - atomic_write_segments_max
> 
> Both min and max values must be a power-of-2.
> 
> Applications can avail of the atomic write feature by ensuring that the total
> length of a write is a power-of-2 in size and also sized between
> atomic_write_unit_min and atomic_write_unit_max, inclusive. Applications
> must ensure that the write is at a naturally-aligned offset in the file
> wrt the total write length. The value in atomic_write_segments_max
> indicates the upper limit for IOV_ITER iovcnt.
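
For illustration only, the length and alignment rules above can be
sketched as a small userspace check. This is not code from the patch:
atomic_write_args_ok() and is_pow2() are hypothetical names, unit_min
and unit_max stand for the atomic_write_unit_min/max values reported by
statx, and treating a zero limit as "not supported" is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if v is a non-zero power of two. */
static bool is_pow2(uint64_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Hypothetical pre-flight check an application might run before issuing
 * a write with RWF_ATOMIC. unit_min/unit_max are the limits read from
 * statx for the target file.
 */
static bool atomic_write_args_ok(uint64_t offset, uint64_t len,
				 uint32_t unit_min, uint32_t unit_max)
{
	/* Assumption: a zero limit means atomic writes are unavailable. */
	if (unit_min == 0 || unit_max == 0)
		return false;

	/* Total length must be a power-of-2 ... */
	if (!is_pow2(len))
		return false;

	/* ... and lie between unit_min and unit_max, inclusive. */
	if (len < unit_min || len > unit_max)
		return false;

	/* Offset must be naturally aligned to the total write length. */
	return (offset & (len - 1)) == 0;
}
```

With limits of 512 and 65536, a 4096-byte write at offset 8192 passes,
while the same write at offset 2048 fails the natural-alignment rule.
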
> 
> Add file mode flag FMODE_CAN_ATOMIC_WRITE, so files which do not have the
> flag set will have RWF_ATOMIC rejected and not just ignored.
> 
> Add a type argument to kiocb_set_rw_flags() to allow reads which have
> RWF_ATOMIC set to be rejected.
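
As a rough userspace model of the rejection behaviour described above
(not the kernel code itself): every name prefixed with model_ or MODEL_
is hypothetical, the bool argument stands in for the file having
FMODE_CAN_ATOMIC_WRITE set, and the choice of -EOPNOTSUPP as the error
is an assumption of this sketch.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in flag bit for this model only; not taken from the uapi header. */
#define MODEL_RWF_ATOMIC 0x40u

/* Models the new "type" argument: is the request a read or a write? */
enum model_rw_type { MODEL_READ, MODEL_WRITE };

/*
 * Returns 0 if the flag combination is acceptable, or a negative errno.
 * An atomic request is rejected outright, never silently downgraded to
 * an ordinary write.
 */
static int model_check_atomic_flag(unsigned int rwf_flags,
				   enum model_rw_type type,
				   bool file_can_atomic_write)
{
	if (!(rwf_flags & MODEL_RWF_ATOMIC))
		return 0;		/* not an atomic request */

	if (type != MODEL_WRITE)
		return -EOPNOTSUPP;	/* atomic flag on a read */

	if (!file_can_atomic_write)
		return -EOPNOTSUPP;	/* file lacks the mode flag */

	return 0;
}
```

The point of the model is the ordering: the request type and the file
mode are both checked before any data is written, so a caller either
gets torn-write protection or gets an error.
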
> 
> Helper function generic_atomic_write_valid() can be used by FSes to verify
> compliant writes. There we check that the iov_iter type is ubuf, which
> implies iovcnt==1 for pwritev2(), which is an initial restriction for
> atomic_write_segments_max. Initially the only user will be bdev file
> operations write handler. We will rely on the block BIO submission path to
> ensure write sizes are compliant for the bdev, so we don't need to check
> atomic write sizes yet.
> 
> Signed-off-by: Prasad Singamsetty <prasad.singamsetty at oracle.com>
> jpg: merge into single patch and much rewrite
> Acked-by: "Darrick J. Wong" <djwong at kernel.org>
> Reviewed-by: Martin K. Petersen <martin.petersen at oracle.com>
> Signed-off-by: John Garry <john.g.garry at oracle.com>
> ---
>   fs/aio.c                |  8 ++++----
>   fs/btrfs/ioctl.c        |  2 +-
>   fs/read_write.c         | 18 +++++++++++++++++-
>   include/linux/fs.h      | 17 +++++++++++++++--
>   include/uapi/linux/fs.h |  5 ++++-
>   io_uring/rw.c           |  9 ++++-----
>   6 files changed, 45 insertions(+), 14 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare at suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare at suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



