[PATCH v2 00/16] block atomic writes

John Garry john.g.garry at oracle.com
Thu Jan 11 01:55:36 PST 2024


On 11/01/2024 05:02, Christoph Hellwig wrote:
> On Wed, Jan 10, 2024 at 05:40:56PM -0800, Darrick J. Wong wrote:
>> struct statx statx;
>> struct fsxattr fsxattr;
>> int fd = open("/foofile", O_RDWR | O_DIRECT);

I'm assuming O_CREAT also.

>>
>> ioctl(fd, FS_IOC_FSGETXATTR, &fsxattr);
>>
>> fsxattr.fsx_xflags |= FS_XFLAG_FORCEALIGN | FS_XFLAG_WRITE_ATOMIC;
>> fsxattr.fsx_extsize = 16384; /* only for hardware no-tears writes */
>>
>> ioctl(fd, FS_IOC_FSSETXATTR, &fsxattr);
>>
>> statx(fd, "", AT_EMPTY_PATH, STATX_ALL | STATX_WRITE_ATOMIC, &statx);
>>
>> if (statx.stx_atomic_write_unit_max >= 16384) {
>> 	pwritev2(fd, &iov, 1, 0, RWF_SYNC | RWF_ATOMIC);
>> 	printf("HAPPY DANCE\n");
>> }
> 
> I think this still needs a check if the fs needs alignment for
> atomic writes at all. i.e.
> 
> struct statx statx;
> struct fsxattr fsxattr;
> int fd = open("/foofile", O_RDWR | O_DIRECT);
> 
> ioctl(fd, FS_IOC_FSGETXATTR, &fsxattr);
> statx(fd, "", AT_EMPTY_PATH, STATX_ALL | STATX_WRITE_ATOMIC, &statx);
> if (statx.stx_atomic_write_unit_max < 16384) {
> 	bailout();
> }

How could this value be >= 16384 initially? Would it come from 
pre-configured FS alignment, like XFS RT extsize? Or from some special 
CoW-based atomic write support? Or from an FS block size of 16384?

Incidentally, for consistency, FMODE_CAN_ATOMIC_WRITE will be set only 
once FS_XFLAG_WRITE_ATOMIC is set. So, until FS_XFLAG_WRITE_ATOMIC is 
set, would it make sense to have statx return 0 for the 
STATX_WRITE_ATOMIC fields? Otherwise the user may be misled into 
thinking that it is ok to issue an atomic write (when it isn't).
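
To illustrate the behaviour I am suggesting, here is a rough userspace 
sketch. It is not kernel code and not from the series: the helper name 
and the flag bit are made up for illustration, and the real 
FS_XFLAG_WRITE_ATOMIC value is deliberately not used.

```c
#include <stdint.h>

/* Placeholder bit for illustration only; NOT the real FS_XFLAG_WRITE_ATOMIC. */
#define SKETCH_XFLAG_WRITE_ATOMIC (1u << 0)

/*
 * Hypothetical helper: the value statx would report in
 * stx_atomic_write_unit_max under the suggestion above.
 * Until the inode flag is set, report 0 so that the caller
 * cannot conclude that an atomic write is currently allowed.
 */
static uint32_t sketch_reported_unit_max(uint32_t xflags, uint32_t fs_unit_max)
{
	if (!(xflags & SKETCH_XFLAG_WRITE_ATOMIC))
		return 0;
	return fs_unit_max;
}
```

With this model, a caller that checks stx_atomic_write_unit_max before 
setting the flag would see 0 and bail out, and would only see the real 
limit once the flag (and so FMODE_CAN_ATOMIC_WRITE) is in place.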

Thanks,
John

> 
> fsxattr.fsx_xflags |= FS_XFLAG_WRITE_ATOMIC;
> if (statx.stx_atomic_write_alignment) {
> 	fsxattr.fsx_xflags |= FS_XFLAG_FORCEALIGN;
> 	fsxattr.fsx_extsize = 16384; /* only for hardware no-tears writes */
> }
> if (ioctl(fd, FS_IOC_FSSETXATTR, &fsxattr) < 0) {
> 	bailout();
> }
> 
> pwritev2(fd, &iov, 1, 0, RWF_SYNC | RWF_ATOMIC);
> printf("HAPPY DANCE\n");
> 




