[PATCH] fs: remove power of 2 and length boundary atomic write restrictions
John Garry
john.g.garry at oracle.com
Wed Jan 7 02:51:13 PST 2026
On 06/01/2026 13:08, Vitaliy Filippov wrote:
>> For ext4, the maximum atomic write size is limited to the bigalloc
>> cluster size. Disk blocks are allocated to this cluster size granularity
>> and alignment. As such, a properly aligned atomic write <= cluster size
>> can never span discontiguous disk blocks.
>
> Ok, thank you for the explanation.
>
> But it seems that it's an internal implementation detail of ext4,
> right?
I think it is fair to say that the alignment constraints of atomic
write HW imply a specific alignment and granularity for FS disk blocks.
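
The cluster argument quoted above can be illustrated with a small sketch. The helper below is hypothetical (it is not ext4 code) and assumes clusters are allocated at cluster-size granularity and alignment, as described; it only checks whether a byte range stays inside a single cluster.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only (not taken from ext4).
 * Returns true if the byte range [offset, offset + len) lies entirely
 * within one cluster of cluster_size bytes.
 */
static bool within_one_cluster(uint64_t offset, uint64_t len,
			       uint64_t cluster_size)
{
	if (len == 0 || cluster_size == 0)
		return false;
	/* Compare the cluster index of the first and the last byte. */
	return offset / cluster_size == (offset + len - 1) / cluster_size;
}
```

With a 64 KiB cluster, a naturally aligned 16 KiB write at offset 48 KiB stays within one cluster, while the same write at the unaligned offset 56 KiB crosses into the next one.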
> So this check should be done inside ext4 code. And in fact I
> suspect it's actually already done there because generic checks which
> I suggest to remove can't take ext4 cluster size into account, so at
> least some atomic write validation is already done inside ext4. The
> only thing that's left is to move the write alignment check there too.
>
> Another thing that suggests that it's an internal implementation
> detail is that a CoW filesystem like ZFS or btrfs can probably provide
> atomic write guarantees for unaligned writes too, and probably even
> without hardware atomic write support.
Yes, xfs already does this.
>
> Can my change be limited to raw block devices then?
The atomic write API is based on:
a. doing statx to find atomic write min and max limits.
b. issuing a write with RWF_ATOMIC, which means that the write must be
naturally aligned and fit within those size limits.
That is the same for both raw block devices and regular FS files. And
the atomic write boundary is not part of the API.
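
As a rough sketch of rule (b), the check below mirrors the restrictions this thread is about: the length must lie within the min and max limits, be a power of 2, and the offset must be naturally aligned to the length. The function names are hypothetical, not a kernel or libc interface. In practice the limits would come from statx() with STATX_WRITE_ATOMIC (the stx_atomic_write_unit_min and stx_atomic_write_unit_max fields), and the write itself would be issued with pwritev2() and RWF_ATOMIC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if n is a non-zero power of 2. */
static bool is_power_of_2(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Hypothetical userspace pre-check, assuming unit_min and unit_max were
 * obtained from statx(). Returns true if a write of len bytes at offset
 * satisfies the size, power-of-2 and natural-alignment rules.
 */
static bool atomic_write_fits(uint64_t offset, uint64_t len,
			      uint64_t unit_min, uint64_t unit_max)
{
	if (len < unit_min || len > unit_max)
		return false;
	if (!is_power_of_2(len))
		return false;
	/* Natural alignment: offset is a multiple of the write length. */
	return (offset & (len - 1)) == 0;
}
```

For example, with limits of 4 KiB and 64 KiB, a 16 KiB write at offset 16 KiB passes, while a 16 KiB write at offset 4 KiB or a 12 KiB write at offset 0 does not.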
> Thanks to your
> explanation now I understand the motivation for these checks with
> ext4, but they still make no sense for the raw NVMe disk.
>
> I mean, can you approve my change if I rework it to only lift 2^N and
> alignment checks for raw block devices and not for file systems? For
> example if I move these checks directly to the related ext4 and xfs
> code? I think it's the right place to do them.
What is the actual use case you are trying to solve? You mentioned "avoid
journaling", which does not explain what you want to achieve.
You could arrange your data so that it suits the rules.