[PATCH v2] Do not require atomic writes to be power of 2 sized and aligned on length boundary
Ojaswin Mujoo
ojaswin at linux.ibm.com
Tue Jan 27 22:08:19 PST 2026
On Sun, Dec 21, 2025 at 04:24:02PM +0300, Vitaliy Filippov wrote:
> It contradicts NVMe specification where alignment is only required when atomic
> write boundary (NABSPF/NABO) is set and highly limits usage of NVMe atomic writes
>
> Signed-off-by: Vitaliy Filippov <vitalifster at gmail.com>
Hi Vitaliy,
There's some context behind why this feature is designed this way. One of
the reasons to require powers of 2 is to abstract device-level (SCSI, NVMe)
spec details away from the higher-level implementation of atomic writes. My
memory of what the specs say is a bit fuzzy, but iirc SCSI defines an
optional alignment for the WRITE_ATOMIC command, whereas NVMe can have a
boundary which shall not be crossed.
Which means, for a user to perform atomic writes, the physical blocks
allocated by the filesystem would need to adhere to these limitations,
which would need knowledge, at the FS level, of what the underlying device
is and what its limitations are. We wanted to avoid exposing these
details to the FS. The power of 2 length and alignment becomes a good
middle ground where if the FS can ensure that the allocated blocks
follow these limits, then it would satisfy both SCSI and NVMe, without
having to worry about the individual spec's details.
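
To illustrate the point, here is a minimal userspace sketch (not kernel
code). It assumes the device boundary is itself a power of 2 and at least
as large as the write; the 64 KiB value and the helper names
naturally_aligned() and crosses_boundary() are made up for the example.
Under those assumptions, a power-of-2 sized write placed at a multiple of
its own length can never straddle the boundary:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if x is a non-zero power of two. */
static bool is_pow2(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * The generic rule under discussion: length is a power of two and the
 * offset is a multiple of that length.
 */
static bool naturally_aligned(uint64_t pos, uint64_t len)
{
	return is_pow2(len) && (pos & (len - 1)) == 0;
}

/*
 * Hypothetical helper: true if [pos, pos + len) straddles a multiple of
 * boundary, i.e. the first and last byte fall into different windows.
 */
static bool crosses_boundary(uint64_t pos, uint64_t len, uint64_t boundary)
{
	return pos / boundary != (pos + len - 1) / boundary;
}

static void demo(void)
{
	const uint64_t boundary = 65536;	/* assumed value, for illustration */
	uint64_t len, pos;

	/* Every naturally aligned power-of-2 write stays inside one window. */
	for (len = 512; len <= boundary; len <<= 1) {
		for (pos = 0; pos < 4 * boundary; pos += len) {
			assert(naturally_aligned(pos, len));
			assert(!crosses_boundary(pos, len, boundary));
		}
	}

	/* A 12 KiB write at offset 56 KiB fails the rule and does cross. */
	assert(!naturally_aligned(57344, 12288));
	assert(crosses_boundary(57344, 12288, boundary));
}
```

Without the generic rule, a check like crosses_boundary() would need the
device's actual boundary, which is exactly the detail we did not want to
push up to the FS.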
It also helps that power of 2 simplifies the calculations in a lot of
places, and the first users of the feature, i.e. DBs, are okay with this
limitation.
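
As a rough example of the simpler calculations: with a power-of-2 size,
alignment checks and rounding reduce to bit masks. The helpers below are
userspace stand-ins written for this example (the names are made up), not
the kernel's own macros, and the mask trick is only valid for powers of 2:

```c
#include <assert.h>
#include <stdint.h>

/* Round x down to a multiple of a; a must be a power of two. */
static uint64_t align_down_pow2(uint64_t x, uint64_t a)
{
	return x & ~(a - 1);
}

/* Round x up to a multiple of a; a must be a power of two. */
static uint64_t align_up_pow2(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Mask-based alignment test; only meaningful when a is a power of two. */
static int is_aligned_pow2(uint64_t x, uint64_t a)
{
	return (x & (a - 1)) == 0;
}

static void mask_demo(void)
{
	assert(align_down_pow2(70000, 4096) == 69632);
	assert(align_up_pow2(70000, 4096) == 73728);
	assert(is_aligned_pow2(69632, 4096));

	/*
	 * With a non power-of-2 size the mask gives the wrong answer:
	 * 16384 is not a multiple of 12288, yet the mask test passes,
	 * so such sizes would need a real modulo or division instead.
	 */
	assert(16384 % 12288 != 0);
	assert(is_aligned_pow2(16384, 12288));
}
```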
Yes it might be a bit restrictive and we might have use cases in the
future that need non power-of-2, but just removing it from the
generic helpers, like you did, is not the right way. It will be a more
involved change that might need modifications throughout the stack.
Regards,
ojaswin
> ---
> fs/read_write.c | 8 --------
> 1 file changed, 8 deletions(-)
>
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 833bae068770..5467d710108d 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1802,17 +1802,9 @@ int generic_file_rw_checks(struct file *file_in, struct file *file_out)
>
>  int generic_atomic_write_valid(struct kiocb *iocb, struct iov_iter *iter)
>  {
> -	size_t len = iov_iter_count(iter);
> -
>  	if (!iter_is_ubuf(iter))
>  		return -EINVAL;
>
> -	if (!is_power_of_2(len))
> -		return -EINVAL;
> -
> -	if (!IS_ALIGNED(iocb->ki_pos, len))
> -		return -EINVAL;
> -
>  	if (!(iocb->ki_flags & IOCB_DIRECT))
>  		return -EOPNOTSUPP;
>
> --
> 2.51.0
>