[PATCH v2] Do not require atomic writes to be power of 2 sized and aligned on length boundary
Vitaliy Filippov
vitalifster at gmail.com
Tue Dec 23 03:34:14 PST 2025
For example, in theory, there are also SAS disks which require a
separate WRITE ATOMIC command for writes to be atomic.
I'm not sure which actual disk models support it, though... :)
But as I understand it, Linux won't be able to send this command without
the RWF_ATOMIC flag.
And RWF_ATOMIC is limited to 2^N-sized, length-aligned writes, so it would
block SAS/SCSI atomic write usage for at least some use cases.
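
To make the rule under discussion concrete, here is a minimal user-space
sketch of the two checks as they are described in this thread: the length
must be a power of two, and the file offset must be aligned to that length.
The name sketch_atomic_write_ok() is hypothetical and is not a kernel
function; the real generic_atomic_write_valid() takes kernel-internal
arguments and may perform additional checks that are not modelled here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, NOT the kernel implementation: it models only the
 * two rules discussed in this thread (2^N length, offset aligned to the
 * length). Any other checks the kernel performs are deliberately left out.
 */
static bool sketch_atomic_write_ok(uint64_t pos, size_t len)
{
    /* A power of two has exactly one bit set; a zero length is rejected. */
    if (len == 0 || (len & (len - 1)) != 0)
        return false;

    /* Since len is a power of two, alignment reduces to a mask test. */
    if ((pos & ((uint64_t)len - 1)) != 0)
        return false;

    return true;
}
```

Under this rule, a 12 KiB write at offset 0 fails the length check, and an
8 KiB write at offset 4 KiB fails the alignment check. Those are the kinds
of requests the patch would like to allow when the device itself does not
forbid them.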
On Tue, Dec 23, 2025 at 2:19 PM Vitaliy Filippov <vitalifster at gmail.com> wrote:
>
> What does "just the kernel atomic write rules" mean?
> What's the idea of these restrictions?
> I want to use atomic writes, but without this restriction.
> And generally I don't think this restriction is needed for anyone at all.
> That's why I ask - can it be removed? Can I remove it in my patch?
>
> On Tue, Dec 23, 2025 at 12:26 PM John Garry <john.g.garry at oracle.com> wrote:
> >
> > On 22/12/2025 13:28, Vitaliy Filippov wrote:
> > > Hi linux-fsdevel,
> > > I recently discovered that Linux incorrectly requires all atomic
> > > writes to have 2^N length and to be aligned on the length boundary.
> > > This requirement contradicts NVMe specification which doesn't require
> > > such alignment and length and thus highly restricts usage of atomic
> > > writes with NVMe disks which support it (Micron and Kioxia).
> >
> > All these alignment and size rules are specific to using RWF_ATOMIC. You
> > don't have to use RWF_ATOMIC if you don't want to - as you prob know,
> > atomic writes are implicit on NVMe.
> >
> > > NVMe specification has its own atomic write restrictions - AWUPF and
> > > NABSPF/NABO, but both are already checked by the nvme subsystem.
> > > The 2^N restriction comes from generic_atomic_write_valid().
> > > I submitted a patch which removes this restriction to linux-block and
> > > linux-nvme. Sorry if these mailing lists weren't the right place to send
> > > it to, it's my first patch :).
> > > But the function is currently used in 3 places: block/fops.c,
> > > fs/ext4/file.c and fs/xfs/xfs_file.c.
> > > Can you tell me if ext4 and xfs really want atomic writes to be 2^N
> > > sized and length-aligned?
> >
> > As above, this is just the kernel atomic write rules to support using
> > different storage technologies.
> >
> > > From looking at the code I'd say they don't really require it?
> > > Can you approve my patch if I'm right? Please :-)
> > >
> > > On Mon, Dec 22, 2025 at 12:54 PM Vitaliy Filippov <vitalifster at gmail.com> wrote:
> > >>
> > >> Hi! Thanks a lot for your reply! This is actually my first patch ever
> > >> so please don't blame me for not following some standards, I'll try to
> > >> resubmit it correctly.
> > >>
> > >> Regarding the rest:
> > >>
> > >> 1) NVMe atomic boundaries seem to already be checked in
> > >> nvme_valid_atomic_write().
> > >>
> > >> 2) What's atomic_write_hw_unit_max? As I understand, Linux also
> > >> already checks it, at least
> > >> /sys/block/nvme**/queue/atomic_write_max_bytes is already limited by
> > >> max_hw_sectors_kb.
> > >>
> > >> 3) Yes, I've of course seen that this function is also used by ext4
> > >> and xfs, but I don't understand the motivation behind the 2^n
> > >> requirement. I suppose file systems may fragment the write according
> > >> to currently allocated extents for example, but I don't see how issues
> > >> coming from this can be fixed by requiring writes to be 2^n.
> > >>
> > >> But I understand that just removing the check may break something if
> > >> somebody relies on them. What do you think about removing the
> > >> requirement only for NVMe or only for block devices then? I see 3 ways
> > >> to do it:
> > >> a) split generic_atomic_write_valid() into two functions - first for
> > >> all types of inodes and second only for file systems.
> > >> b) remove generic_atomic_write_valid() from block device checks entirely.
> > >> c) change generic_atomic_write_valid() just like in my original patch
> > >> but copy original checks into other places where it's used (ext4 and
> > >> xfs).
> > >>
> > >> Which way do you think would be the best?
> > >>
> > >> On Mon, Dec 22, 2025 at 2:17 AM Keith Busch <kbusch at kernel.org> wrote:
> > >>>
> > >>> On Sun, Dec 21, 2025 at 04:24:02PM +0300, Vitaliy Filippov wrote:
> > >>>> It contradicts NVMe specification where alignment is only required when atomic
> > >>>> write boundary (NABSPF/NABO) is set and highly limits usage of NVMe atomic writes
> > >>>
> > >>> Commit header is missing the "fs:" prefix, and the commit log should
> > >>> wrap at 72 characters.
> > >>>
> > >>> On the technical side, this is a generic function used by multiple
> > >>> protocols, so you can't just appeal to NVMe to justify removing the
> > >>> checks.
> > >>>
> > >>> NVMe still has atomic boundaries where straddling it fails to be an
> > >>> atomic operation. Instead of removing the checks, you'd have to replace
> > >>> it with a more costly operation if you really want to support more
> > >>> arbitrary write lengths and offsets. And if you do manage to remove the
> > >>> power of two requirement, then the queue limit for nvme's
> > >>> atomic_write_hw_unit_max isn't correct anymore.
> > >
> >