[Patch v9 03/10] fs: Initial atomic write support
Hannes Reinecke
hare at suse.de
Thu Jun 20 22:56:06 PDT 2024
On 6/20/24 14:53, John Garry wrote:
> From: Prasad Singamsetty <prasad.singamsetty at oracle.com>
>
> An atomic write is a write issued with torn-write protection, meaning
> that for a power failure or any other hardware failure, all or none of the
> data from the write will be stored, but never a mix of old and new data.
>
> Userspace may add flag RWF_ATOMIC to pwritev2() to indicate that the
> write is to be issued with torn-write prevention, according to special
> alignment and length rules.
>
> For any syscall interface utilizing struct iocb, add IOCB_ATOMIC for
> iocb->ki_flags field to indicate the same.
>
> A call to statx will give the relevant atomic write info for a file:
> - atomic_write_unit_min
> - atomic_write_unit_max
> - atomic_write_segments_max
>
> Both min and max values must be a power-of-2.
>
> Applications can avail of the atomic write feature by ensuring that the
> total length of a write is a power-of-2 in size and also sized between
> atomic_write_unit_min and atomic_write_unit_max, inclusive. Applications
> must ensure that the write is at a naturally-aligned offset in the file
> wrt the total write length. The value in atomic_write_segments_max
> indicates the upper limit for IOV_ITER iovcnt.
>
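The size and alignment rules in the paragraph above can be sketched as a small userspace pre-check. This is only an illustration: atomic_write_ok() and is_pow2() are hypothetical helper names, not part of the patch, and the unit_min/unit_max arguments are assumed to be the atomic_write_unit_min and atomic_write_unit_max values reported by statx for the file.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when x is a non-zero power of two. */
static bool is_pow2(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Hypothetical userspace pre-check (not a kernel API): does a write of
 * len bytes at file offset pos satisfy the rules quoted above, given the
 * unit_min/unit_max values reported for the file?
 */
static bool atomic_write_ok(uint64_t pos, uint64_t len,
			    uint64_t unit_min, uint64_t unit_max)
{
	/* Total length must be a power of two. */
	if (!is_pow2(len))
		return false;
	/* Length must lie within [unit_min, unit_max], inclusive. */
	if (len < unit_min || len > unit_max)
		return false;
	/* len is a power of two, so this mask test checks natural alignment. */
	return (pos & (len - 1)) == 0;
}
```

With unit_min = 4096 and unit_max = 65536, a 4096-byte write at offset 8192 would pass this check, while the same write at offset 2048 would not, because the offset is not a multiple of the write length.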
> Add file mode flag FMODE_CAN_ATOMIC_WRITE, so files which do not have the
> flag set will have RWF_ATOMIC rejected and not just ignored.
>
> Add a type argument to kiocb_set_rw_flags() to allow reads which have
> RWF_ATOMIC set to be rejected.
>
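The two rejection rules described above (no RWF_ATOMIC on reads, no RWF_ATOMIC on files lacking FMODE_CAN_ATOMIC_WRITE) can be pictured with a toy model. Everything here is an assumption made for illustration: the MODEL_* constants, the enum, and model_flags_accepted() are made-up names with made-up values, and the sketch says nothing about which error code the real kiocb_set_rw_flags() returns.

```c
#include <stdbool.h>

/* Illustrative values only; the real constants live in the kernel headers. */
#define MODEL_RWF_ATOMIC        0x1u
#define MODEL_CAN_ATOMIC_WRITE  0x1u

enum model_rw { MODEL_READ, MODEL_WRITE };

/*
 * Toy model of the acceptance rule: a request carrying the atomic flag is
 * accepted only if it is a write and the file advertises atomic-write
 * support. Requests without the flag are unaffected.
 */
static bool model_flags_accepted(unsigned int rw_flags, enum model_rw type,
				 unsigned int file_mode)
{
	if (!(rw_flags & MODEL_RWF_ATOMIC))
		return true;	/* non-atomic I/O passes through unchanged */
	if (type != MODEL_WRITE)
		return false;	/* atomic flag on a read is rejected */
	/* Atomic write on a file without the capability flag is rejected. */
	return (file_mode & MODEL_CAN_ATOMIC_WRITE) != 0;
}
```

The point of the model is that the flag is rejected rather than silently ignored, so an application never believes it got torn-write protection when it did not.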
> Helper function generic_atomic_write_valid() can be used by FSes to verify
> compliant writes. There we check that the iov_iter type is ubuf, which
> implies iovcnt==1 for pwritev2(), which is an initial restriction for
> atomic_write_segments_max. Initially the only user will be the bdev file
> operations write handler. We will rely on the block BIO submission path to
> ensure write sizes are compliant for the bdev, so we don't need to check
> atomic write sizes yet.
>
> Signed-off-by: Prasad Singamsetty <prasad.singamsetty at oracle.com>
> jpg: merge into single patch and much rewrite
> Acked-by: "Darrick J. Wong" <djwong at kernel.org>
> Reviewed-by: Martin K. Petersen <martin.petersen at oracle.com>
> Signed-off-by: John Garry <john.g.garry at oracle.com>
> ---
> fs/aio.c | 8 ++++----
> fs/btrfs/ioctl.c | 2 +-
> fs/read_write.c | 18 +++++++++++++++++-
> include/linux/fs.h | 17 +++++++++++++++--
> include/uapi/linux/fs.h | 5 ++++-
> io_uring/rw.c | 9 ++++-----
> 6 files changed, 45 insertions(+), 14 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare at suse.de>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare at suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich