[PATCH v6 00/10] block atomic writes
John Garry
john.g.garry at oracle.com
Fri Apr 5 03:55:41 PDT 2024
On 05/04/2024 11:20, Kent Overstreet wrote:
>>> The thing is that there's no requirement for an interface as complex as
>>> the one you're proposing here. I've talked to a few database people
>>> and all they want is to increase the untorn write boundary from "one
>>> disc block" to one database block, typically 8kB or 16kB.
>>>
>>> So they would be quite happy with a much simpler interface where they
>>> set the inode block size at inode creation time, and then all writes to
>>> that inode were guaranteed to be untorn. This would also be simpler to
>>> implement for buffered writes.
>> You're conflating filesystem functionality that applications will use
>> with hardware and block-layer enablement that filesystems and
>> filesystem utilities need to configure the filesystem in ways that
>> allow users to make use of atomic write capability of the hardware.
>>
>> The block layer functionality needs to export everything that the
>> hardware can do and filesystems will make use of. The actual
>> application usage and setup of atomic writes at the filesystem/page
>> cache layer is a separate problem. i.e. The block layer interfaces
>> need only support direct IO and expose limits for issuing atomic
>> direct IO, and nothing more. All the more complex stuff to make it
>> "easy to use" is filesystem level functionality and completely
>> outside the scope of this patchset....
> A CoW filesystem can implement atomic writes without any block device
> support. It seems to me that might have been the easier place to start -
> start by getting the APIs right, then do all the plumbing for efficient
> untorn writes on non CoW filesystems...
03/10 and 04/10 in this series define the user API, i.e. RWF_ATOMIC and
statx updates.
Any filesystem-specific changes - like in
https://lore.kernel.org/linux-xfs/20240304130428.13026-1-john.g.garry@oracle.com/
- are just for enabling this API for that filesystem.
More information about the Linux-nvme
mailing list