[PATCH v2] Do not require atomic writes to be power of 2 sized and aligned on length boundary
Vitaliy Filippov
vitalifster at gmail.com
Tue Dec 23 03:19:02 PST 2025
What does "just the kernel atomic write rules" mean?
What's the idea of these restrictions?
I want to use atomic writes, but without this restriction.
And generally I don't think this restriction is needed for anyone at all.
That's why I ask - can it be removed? Can I remove it in my patch?
On Tue, Dec 23, 2025 at 12:26 PM John Garry <john.g.garry at oracle.com> wrote:
>
> On 22/12/2025 13:28, Vitaliy Filippov wrote:
> > Hi linux-fsdevel,
> > I recently discovered that Linux incorrectly requires all atomic
> > writes to have 2^N length and to be aligned on the length boundary.
> > This requirement contradicts the NVMe specification, which doesn't require
> > such alignment and length, and thus severely restricts the use of atomic
> > writes with NVMe disks that support them (Micron and Kioxia).
>
> All these alignment and size rules are specific to using RWF_ATOMIC. You
> don't have to use RWF_ATOMIC if you don't want to - as you prob know,
> atomic writes are implicit on NVMe.
>
> > NVMe specification has its own atomic write restrictions - AWUPF and
> > NABSPF/NABO, but both are already checked by the nvme subsystem.
> > The 2^N restriction comes from generic_atomic_write_valid().
> > I submitted a patch which removes this restriction to linux-block and
> > linux-nvme. Sorry if these mailing lists weren't the right place to send
> > it to, it's my first patch :).
> > But the function is currently used in 3 places: block/fops.c,
> > fs/ext4/file.c and fs/xfs/xfs_file.c.
> > Can you tell me if ext4 and xfs really want atomic writes to be 2^N
> > sized and length-aligned?
>
> As above, this is just the kernel atomic write rules to support using
> different storage technologies.
>
> > From looking at the code I'd say they don't really require it?
> > Can you approve my patch if I'm right? Please :-)
> >
> > On Mon, Dec 22, 2025 at 12:54 PM Vitaliy Filippov <vitalifster at gmail.com> wrote:
> >>
> >> Hi! Thanks a lot for your reply! This is actually my first patch ever
> >> so please don't blame me for not following some standards, I'll try to
> >> resubmit it correctly.
> >>
> >> Regarding the rest:
> >>
> >> 1) NVMe atomic boundaries seem to already be checked in
> >> nvme_valid_atomic_write().
> >>
> >> 2) What's atomic_write_hw_unit_max? As I understand, Linux also
> >> already checks it, at least
> >> /sys/block/nvme**/queue/atomic_write_max_bytes is already limited by
> >> max_hw_sectors_kb.
> >>
> >> 3) Yes, I've of course seen that this function is also used by ext4
> >> and xfs, but I don't understand the motivation behind the 2^n
> >> requirement. I suppose file systems may fragment the write according
> >> to currently allocated extents for example, but I don't see how issues
> >> coming from this can be fixed by requiring writes to be 2^n.
> >>
> >> But I understand that just removing the check may break something if
> >> somebody relies on it. What do you think about removing the
> >> requirement only for NVMe or only for block devices then? I see 3 ways
> >> to do it:
> >> a) split generic_atomic_write_valid() into two functions - first for
> >> all types of inodes and second only for file systems.
> >> b) remove generic_atomic_write_valid() from block device checks at all.
> >> c) change generic_atomic_write_valid() just like in my original patch
> >> but copy original checks into other places where it's used (ext4 and
> >> xfs).
> >>
> >> Which way do you think would be the best?
> >>
> >> On Mon, Dec 22, 2025 at 2:17 AM Keith Busch <kbusch at kernel.org> wrote:
> >>>
> >>> On Sun, Dec 21, 2025 at 04:24:02PM +0300, Vitaliy Filippov wrote:
> >>>> It contradicts NVMe specification where alignment is only required when atomic
> >>>> write boundary (NABSPF/NABO) is set and highly limits usage of NVMe atomic writes
> >>>
> >>> Commit header is missing the "fs:" prefix, and the commit log should
> >>> wrap at 72 characters.
> >>>
> >>> On the technical side, this is a generic function used by multiple
> >>> protocols, so you can't just appeal to NVMe to justify removing the
> >>> checks.
> >>>
> >>> NVMe still has atomic boundaries where straddling it fails to be an
> >>> atomic operation. Instead of removing the checks, you'd have to replace
> >>> them with a more costly operation if you really want to support more
> >>> arbitrary write lengths and offsets. And if you do manage to remove the
> >>> power of two requirement, then the queue limit for nvme's
> >>> atomic_write_hw_unit_max isn't correct anymore.
> >
>