[PATCH 17/21] fs: xfs: iomap atomic write support
John Garry
john.g.garry at oracle.com
Mon Dec 4 07:19:15 PST 2023
On 04/12/2023 13:45, Christoph Hellwig wrote:
> On Tue, Nov 28, 2023 at 05:42:10PM +0000, John Garry wrote:
>> ok, fine, it would not be required for XFS with CoW. Some concerns still:
>> a. device atomic write boundary, if any
>> b. other FSes which do not have CoW support. ext4 is already being used for
>> "atomic writes" in the field - see dubious amazon torn-write prevention.
>
> What is the 'dubious amazon torn-write prevention'?
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/storage-twp.html
AFAICS, this is without any kernel changes, so there is no guarantee
against unwanted splitting or merging of bios.
Anyway, there will still be !CoW FSes which people want to support.
>
>> About b., we could add the pow-of-2 and file offset alignment requirement
>> for other FSes, but then need to add some method to advertise that
>> restriction.
>
> We really need a better way to communicate I/O limitations anyway.
> Something like XFS_IOC_DIOINFO on steroids.
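For illustration, the pow-of-2 and file offset alignment requirement
mentioned above, together with a device atomic write boundary (concern a.),
could be checked roughly as below. This is only a sketch: struct
atomic_limits and atomic_write_ok() are hypothetical names, not an existing
kernel or XFS interface, and the limits would have to come from whatever
advertising method is eventually agreed on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-file limits; not taken from any real kernel structure. */
struct atomic_limits {
	uint64_t unit_min;	/* smallest atomic write size, in bytes */
	uint64_t unit_max;	/* largest atomic write size, in bytes */
	uint64_t boundary;	/* device atomic write boundary, 0 if none */
};

static bool is_pow2(uint64_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/* Return true if a write of len bytes at file offset off may be atomic. */
static bool atomic_write_ok(const struct atomic_limits *lim,
			    uint64_t off, uint64_t len)
{
	/* Length must be a power of 2 within the advertised range. */
	if (!is_pow2(len) || len < lim->unit_min || len > lim->unit_max)
		return false;

	/* File offset must be naturally aligned to the write length. */
	if (off & (len - 1))
		return false;

	/* First and last byte must fall in the same boundary window. */
	if (lim->boundary &&
	    off / lim->boundary != (off + len - 1) / lim->boundary)
		return false;

	return true;
}
```

With unit_min = 4096 and unit_max = 65536, an 8k write at offset 4k would
be rejected by the alignment check, while a 16k write at offset 16k would
pass.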
>
>> Sure, but to me it is a concern that we have 2x paths to make robust:
>> a. offload via HW, which may involve CoW; b. no HW support, i.e. CoW always
>
> Relying just on the hardware seems very limited, especially as there is
> plenty of hardware that won't guarantee anything larger than 4k, and
> plenty of NVMe hardware has some other small limit like 32k
> because it doesn't support multiple atomicity mode.
So what would you propose as the next step? Would it be to first achieve
atomic write support for XFS with HW support + CoW to ensure contiguous
extents (and without XFS forcealign)?
>
>> And for no HW support, if we don't follow the O_ATOMIC model of committing
>> nothing until a SYNC is issued, would we allocate, write, and later free a
>> new extent for each write, right?
>
> Yes. Then again if you do data journalling you do that anyway, and
> one little project I'm doing right now shows that data journalling is
> often the fastest thing we can do for very small writes.
Ignoring FSes, then how is this supposed to work for block devices? We
just always need HW support, right?
Thanks,
John