[PATCH 10/21] block: Add fops atomic write support
Bart Van Assche
bvanassche at acm.org
Mon Oct 2 12:12:49 PDT 2023
On 10/2/23 03:10, John Garry wrote:
> On 29/09/2023 18:51, Bart Van Assche wrote:
>> On 9/29/23 03:27, John Garry wrote:
> > + if (pos % atomic_write_unit_min_bytes)
> > + return false;
>
> See later rules.

Is atomic_write_unit_min_bytes always equal to the logical block size?
If so, can the above test be left out?

> > + if (iov_iter_count(iter) % atomic_write_unit_min_bytes)
> > + return false;
>
> For SCSI, there is an atomic write granularity, which dictates
> atomic_write_unit_min_bytes. So here we need to ensure that the length
> is a multiple of this value.

Are there any SCSI devices that we care about that report an ATOMIC
TRANSFER LENGTH GRANULARITY that is larger than a single logical block?
I'm wondering whether we really have to support such devices.

> > + if (!is_power_of_2(iov_iter_count(iter)))
> > + return false;
>
> This rule comes from FS block alignment and NVMe atomic boundary.
>
> FSes (XFS) have discontiguous extents. We need to ensure that an atomic
> write does not cross discontiguous extents. To do this we ensure extent
> length and alignment and limit atomic_write_unit_max_bytes to that.
>
> For NVMe, an atomic write boundary is a boundary in LBA space which an
> atomic write should not cross. We limit atomic_write_unit_max_bytes such
> that it is evenly divisible into this atomic write boundary.
>
> To ensure that the write does not cross these alignment boundaries we
> say that it must be naturally aligned and a power-of-2 in length.
>
> We may be able to relax this rule but I am not sure it buys us anything
> - typically we want to be writing a 64KB block aligned to 64KB, for
> example.

It seems to me that the requirement is_power_of_2(iov_iter_count(iter))
is necessary for some filesystems but not for all filesystems.
Restrictions that are specific to a single filesystem (XFS) should not
occur in code that is intended to be used by all filesystems
(blkdev_atomic_write_valid()).
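
For reference, the rules discussed above can be modelled in userspace
roughly as follows. This is only a sketch, not the code from the patch:
atomic_write_valid_sketch() and crosses_boundary() are hypothetical names,
is_pow2() stands in for the kernel's is_power_of_2(), and the unit_max and
natural-alignment (pos % len) checks are my reading of the description
quoted above rather than lines taken from the quoted hunks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's is_power_of_2() helper. */
static bool is_pow2(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Hypothetical model of the checks discussed in this thread.
 * unit_min and unit_max stand for atomic_write_unit_min_bytes and
 * atomic_write_unit_max_bytes; all values are in bytes.
 */
static bool atomic_write_valid_sketch(uint64_t pos, uint64_t len,
				      uint64_t unit_min, uint64_t unit_max)
{
	/* Assumed range check: the length must lie in [unit_min, unit_max]. */
	if (unit_min == 0 || len < unit_min || len > unit_max)
		return false;
	/* Start offset must be a multiple of the minimum unit. */
	if (pos % unit_min)
		return false;
	/* Length must be a multiple of the minimum unit (granularity). */
	if (len % unit_min)
		return false;
	/* Length must be a power of two. */
	if (!is_pow2(len))
		return false;
	/* Assumed natural alignment: offset is a multiple of the length. */
	if (pos % len)
		return false;
	return true;
}

/* True if the byte range [pos, pos + len) spans a multiple of boundary. */
static bool crosses_boundary(uint64_t pos, uint64_t len, uint64_t boundary)
{
	return pos / boundary != (pos + len - 1) / boundary;
}
```

With unit_min = 4 KiB, a 12 KiB write at offset 8 KiB passes both modulo
checks yet crosses a 16 KiB boundary. The power-of-2 and natural-alignment
rules are what exclude that case, and they matter only where such a
boundary (filesystem extent or NVMe atomic boundary) exists.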
Thanks,
Bart.