[PATCH v2] Do not require atomic writes to be power of 2 sized and aligned on length boundary
Vitaliy Filippov
vitalifster at gmail.com
Mon Dec 22 05:28:42 PST 2025
Hi linux-fsdevel,
I recently discovered that Linux requires all atomic writes to have
2^N length and to be aligned on the length boundary.
This requirement contradicts the NVMe specification, which doesn't
require such alignment and length, and thus severely restricts the use
of atomic writes with NVMe disks which support them (Micron and Kioxia).
The NVMe specification has its own atomic write restrictions - AWUPF
and NABSPF/NABO, but both are already checked by the nvme subsystem.
The 2^N restriction comes from generic_atomic_write_valid().
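
To illustrate the difference between the two kinds of restriction, here
is a minimal userspace sketch. It is not the kernel source: the helper
names pow2_rule_ok() and boundary_rule_ok() are hypothetical, the
boundary model ignores NABO (it assumes the boundary offset is zero),
and the size limit (AWUPF) is left out entirely.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if x is a nonzero power of two. */
static bool is_pow2(uint64_t x)
{
	return x != 0 && (x & (x - 1)) == 0;
}

/*
 * Simplified model of the generic restriction discussed above
 * (hypothetical helper, not the kernel function): the length must be
 * 2^N and the offset must be a multiple of the length.
 */
static bool pow2_rule_ok(uint64_t pos, uint64_t len)
{
	return is_pow2(len) && (pos % len) == 0;
}

/*
 * Simplified model of a device atomic boundary (hypothetical helper):
 * a write of len bytes at pos is acceptable as long as its first and
 * last byte fall into the same boundary-sized window.
 * boundary == 0 means the device reports no boundary.
 * Assumption: the boundary offset (NABO) is zero.
 */
static bool boundary_rule_ok(uint64_t pos, uint64_t len, uint64_t boundary)
{
	if (len == 0)
		return false;
	if (boundary == 0)
		return true;
	return pos / boundary == (pos + len - 1) / boundary;
}
```

For example, a 12 KiB write at offset 4 KiB stays inside a 64 KiB
boundary window, so boundary_rule_ok(4096, 12288, 65536) holds, but
pow2_rule_ok(4096, 12288) rejects it because 12288 is not a power of
two.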
I submitted a patch which removes this restriction to linux-block and
linux-nvme. Sorry if these mailing lists weren't the right place to
send it to, it's my first patch :).
But the function is currently used in 3 places: block/fops.c,
fs/ext4/file.c and fs/xfs/xfs_file.c.
Can you tell me if ext4 and xfs really want atomic writes to be 2^N
sized and length-aligned?
From looking at the code I'd say they don't really require it?
Can you approve my patch if I'm right? Please :-)
On Mon, Dec 22, 2025 at 12:54 PM Vitaliy Filippov <vitalifster at gmail.com> wrote:
>
> Hi! Thanks a lot for your reply! This is actually my first patch ever
> so please don't blame me for not following some standards, I'll try to
> resubmit it correctly.
>
> Regarding the rest:
>
> 1) NVMe atomic boundaries seem to already be checked in
> nvme_valid_atomic_write().
>
> 2) What's atomic_write_hw_unit_max? As I understand, Linux also
> already checks it, at least
> /sys/block/nvme**/queue/atomic_write_max_bytes is already limited by
> max_hw_sectors_kb.
>
> 3) Yes, I've of course seen that this function is also used by ext4
> and xfs, but I don't understand the motivation behind the 2^n
> requirement. I suppose file systems may fragment the write according
> to currently allocated extents for example, but I don't see how issues
> coming from this can be fixed by requiring writes to be 2^n.
>
> But I understand that just removing the check may break something if
> somebody relies on them. What do you think about removing the
> requirement only for NVMe or only for block devices then? I see 3 ways
> to do it:
> a) split generic_atomic_write_valid() into two functions - first for
> all types of inodes and second only for file systems.
> b) remove generic_atomic_write_valid() from block device checks at all.
> c) change generic_atomic_write_valid() just like in my original patch
> but copy original checks into other places where it's used (ext4 and
> xfs).
>
> Which way do you think would be the best?
>
> On Mon, Dec 22, 2025 at 2:17 AM Keith Busch <kbusch at kernel.org> wrote:
> >
> > On Sun, Dec 21, 2025 at 04:24:02PM +0300, Vitaliy Filippov wrote:
> > > It contradicts NVMe specification where alignment is only required when atomic
> > > write boundary (NABSPF/NABO) is set and highly limits usage of NVMe atomic writes
> >
> > Commit header is missing the "fs:" prefix, and the commit log should
> > wrap at 72 characters.
> >
> > On the technical side, this is a generic function used by multiple
> > protocols, so you can't just appeal to NVMe to justify removing the
> > checks.
> >
> > NVMe still has atomic boundaries where straddling it fails to be an
> > atomic operation. Instead of removing the checks, you'd have to replace
> > it with a more costly operation if you really want to support more
> > arbitrary write lengths and offsets. And if you do manage to remove the
> > power of two requirement, then the queue limit for nvme's
> > atomic_write_hw_unit_max isn't correct anymore.