[PATCH 15/21] fs: xfs: Support atomic write for statx

John Garry john.g.garry at oracle.com
Tue Oct 3 03:56:52 PDT 2023


On 03/10/2023 04:32, Dave Chinner wrote:
> On Fri, Sep 29, 2023 at 10:27:20AM +0000, John Garry wrote:
>> Support providing info on atomic write unit min and max for an inode.
>>
>> For simplicity, currently we limit the min at the FS block size, but a
>> lower limit could be supported in future.
>>
>> The atomic write unit min and max is limited by the guaranteed extent
>> alignment for the inode.
>>
>> Signed-off-by: John Garry <john.g.garry at oracle.com>
>> ---
>>   fs/xfs/xfs_iops.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++
>>   fs/xfs/xfs_iops.h |  4 ++++
>>   2 files changed, 55 insertions(+)
>>
>> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
>> index 1c1e6171209d..5bff80748223 100644
>> --- a/fs/xfs/xfs_iops.c
>> +++ b/fs/xfs/xfs_iops.c
>> @@ -546,6 +546,46 @@ xfs_stat_blksize(
>>   	return PAGE_SIZE;
>>   }
>>   
>> +void xfs_ip_atomic_write_attr(struct xfs_inode *ip,
>> +			xfs_filblks_t *unit_min_fsb,
>> +			xfs_filblks_t *unit_max_fsb)
> 
> Formatting.

Change args to 1x tab indent, right?

> 
> Also, we don't use variable name shorthand for function names -
> xfs_get_atomic_write_hint(ip) to match xfs_get_extsz_hint(ip)
> would be appropriate, right?

Changing the name format would be ok. However we are not returning a 
hint, but rather the inode atomic write unit min and max values in FS 
blocks. Anyway, I'll look to rework the name.

> 
> 
> 
>> +{
>> +	xfs_extlen_t		extsz_hint = xfs_get_extsz_hint(ip);
>> +	struct xfs_buftarg	*target = xfs_inode_buftarg(ip);
>> +	struct block_device	*bdev = target->bt_bdev;
>> +	struct xfs_mount	*mp = ip->i_mount;
>> +	xfs_filblks_t		atomic_write_unit_min,
>> +				atomic_write_unit_max,
>> +				align;
>> +
>> +	atomic_write_unit_min = XFS_B_TO_FSB(mp,
>> +		queue_atomic_write_unit_min_bytes(bdev->bd_queue));
>> +	atomic_write_unit_max = XFS_B_TO_FSB(mp,
>> +		queue_atomic_write_unit_max_bytes(bdev->bd_queue));
> 
> These should be set in the buftarg at mount time, like we do with
> sector size masks. Then we don't need to convert them to fsbs on
> every single lookup.

ok, fine. However I do still have a doubt on whether these values should 
be changeable - please see (small) comment about 
atomic_write_max_sectors in patch 7/21

> 
>> +	/* for RT, unset extsize gives hint of 1 */
>> +	/* for !RT, unset extsize gives hint of 0 */
>> +	if (extsz_hint && (XFS_IS_REALTIME_INODE(ip) ||
>> +	    (ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN)))
> 
> Logic is non-obvious. The compound is (rt || force), not
> (extsz && rt), so it took me a while to actually realise I read this
> incorrectly.
> 
> 	if (extsz_hint &&
> 	    (XFS_IS_REALTIME_INODE(ip) ||
> 	     (ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN))) {
> 
>> +		align = extsz_hint;
>> +	else
>> +		align = 1;
> 
> And now the logic looks wrong to me. We don't want to use extsz hint
> for RT inodes if force align is not set, this will always use it
> regardless of the fact it has nothing to do with force alignment.

extsz_hint comes from xfs_get_extsz_hint(), which gives us the SB 
extsize for the RT inode and this alignment is guaranteed, no?

> 
> Indeed, if XFS_DIFLAG2_FORCEALIGN is not set, then shouldn't this
> always return min/max = 0 because atomic alignments are not in use on
> this inode?

As above, for RT I thought that extsize alignment was guaranteed and we 
don't need to bother with XFS_DIFLAG2_FORCEALIGN there.

> 
> i.e. the first thing this code should do is:
> 
> 	*unit_min_fsb = 0;
> 	*unit_max_fsb = 0;
> 	if (!(ip->i_diflags2 & XFS_DIFLAG2_FORCEALIGN))
> 		return;
> 
> Then we can check device support:
> 
> 	if (!buftarg->bt_atomic_write_max)
> 		return;
> 
> Then we can check for extent size hints. If that's not set:
> 
> 	align = xfs_get_extsz_hint(ip);
> 	if (align <= 1) {
> 		*unit_min_fsb = 1;
> 		*unit_max_fsb = 1;
> 		return;
> 	}
> 
> And finally, if there is an extent size hint, we can return that.
> 
>> +	if (atomic_write_unit_max == 0) {
>> +		*unit_min_fsb = 0;
>> +		*unit_max_fsb = 0;
>> +	} else if (atomic_write_unit_min == 0) {
>> +		*unit_min_fsb = 1;
>> +		*unit_max_fsb = min_t(xfs_filblks_t, atomic_write_unit_max,
>> +					align);
> 
> Why is it valid for a device to have a zero minimum size?

It's not valid. The local variables atomic_write_unit_max and 
atomic_write_unit_min are in units of FS blocks here - maybe I should 
change the names.

The idea is that, for simplicity, XFS initially won't support atomic 
writes smaller than 1x FS block. So if the bdev has - for example - 
queue_atomic_write_unit_min_bytes() == 2K and 
queue_atomic_write_unit_max_bytes() == 64K, then (ignoring alignment) we 
say that unit_min_fsb = 1 and unit_max_fsb = 16 (for 4K FS blocks).
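
To make the arithmetic in that example concrete, here is a rough 
standalone sketch in userspace C. It is not the patch code: struct 
awu_fsb and awu_bytes_to_fsb() are names made up for illustration, and 
rounding the max down to whole blocks is an assumption of the sketch.

```c
#include <stdint.h>

/* Hypothetical result type for this sketch: limits in FS blocks. */
struct awu_fsb {
	uint64_t unit_min_fsb;
	uint64_t unit_max_fsb;
};

/*
 * Hypothetical helper: convert device atomic write limits from bytes to
 * FS blocks, never reporting a minimum below one FS block.
 */
static struct awu_fsb awu_bytes_to_fsb(uint32_t min_bytes, uint32_t max_bytes,
				       uint32_t blocksize)
{
	struct awu_fsb r = { 0, 0 };

	/* Device cannot atomically write even one FS block: unsupported. */
	if (max_bytes < blocksize)
		return r;

	/* Round the minimum up to whole FS blocks, with a floor of one. */
	r.unit_min_fsb = ((uint64_t)min_bytes + blocksize - 1) / blocksize;
	if (r.unit_min_fsb == 0)
		r.unit_min_fsb = 1;

	/* Round the maximum down to whole FS blocks (sketch assumption). */
	r.unit_max_fsb = max_bytes / blocksize;
	return r;
}
```

With min_bytes = 2048, max_bytes = 65536 and blocksize = 4096 this gives 
unit_min_fsb = 1 and unit_max_fsb = 16, matching the numbers above.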

> If it can
> set a maximum, it should -always- set a minimum size as logical
> sector size is a valid lower bound, yes?
> 
>> +	} else {
>> +		*unit_min_fsb = min_t(xfs_filblks_t, atomic_write_unit_min,
>> +					align);
>> +		*unit_max_fsb = min_t(xfs_filblks_t, atomic_write_unit_max,
>> +					align);
>> +	}
> 
> Nothing here guarantees the power-of-2 sizes that the RWF_ATOMIC
> user interface requires....

atomic_write_unit_min and atomic_write_unit_max will be powers-of-2 (or 0).

But, you are right, we don't check align is a power-of-2 - that can be 
added.

> 
> It also doesn't check that the extent size hint is aligned with
> atomic write units.

If we add a check for align being a power-of-2 and atomic_write_unit_min 
and atomic_write_unit_max are already powers-of-2, then this can be 
relied on, right?
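
A rough standalone sketch of that reasoning, in userspace C with made-up 
helper names (is_pow2() and clamp_unit_to_align() are not from the 
patch): the smaller of two powers-of-2 is itself a power-of-2 and divides 
the larger one, so once align is checked, the clamped unit always divides 
the extent alignment.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if n is a non-zero power of two. */
static bool is_pow2(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Hypothetical helper: clamp an atomic write unit to the extent
 * alignment. Returns 0 (meaning "unsupported" in this sketch) if either
 * input is not a power of two; otherwise returns the smaller value,
 * which is a power of two that divides the larger one.
 */
static uint64_t clamp_unit_to_align(uint64_t unit, uint64_t align)
{
	if (!is_pow2(unit) || !is_pow2(align))
		return 0;
	return unit < align ? unit : align;
}
```

For example, a unit of 16 blocks with align = 4 clamps to 4, while 
align = 3 is rejected outright.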

> 
> It also doesn't check either against stripe unit alignment....

As mentioned in earlier response, this could be enforced.

> 
>> +}
>> +
>>   STATIC int
>>   xfs_vn_getattr(
>>   	struct mnt_idmap	*idmap,
>> @@ -614,6 +654,17 @@ xfs_vn_getattr(
>>   			stat->dio_mem_align = bdev_dma_alignment(bdev) + 1;
>>   			stat->dio_offset_align = bdev_logical_block_size(bdev);
>>   		}
>> +		if (request_mask & STATX_WRITE_ATOMIC) {
>> +			xfs_filblks_t unit_min_fsb, unit_max_fsb;
>> +
>> +			xfs_ip_atomic_write_attr(ip, &unit_min_fsb,
>> +				&unit_max_fsb);
>> +			stat->atomic_write_unit_min = XFS_FSB_TO_B(mp, unit_min_fsb);
>> +			stat->atomic_write_unit_max = XFS_FSB_TO_B(mp, unit_max_fsb);
> 
> That's just nasty. We pull byte units from the bdev, convert them to
> fsb to round them, then convert them back to byte counts. We should
> be doing all the work in one set of units....

ok, agreed. bytes is probably best.
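
As a rough idea of what a bytes-only version could look like, here is a 
standalone userspace C sketch. Everything in it is hypothetical: the 
names, the assumption that all inputs are powers-of-2, and the choice to 
report 0/0 when the device minimum exceeds the extent alignment.

```c
#include <stdint.h>

/* Hypothetical result type for this sketch: limits in bytes. */
struct awu_bytes {
	uint32_t unit_min;
	uint32_t unit_max;
};

/*
 * Hypothetical helper working purely in bytes. The minimum is raised to
 * the FS block size and the maximum is capped at the guaranteed extent
 * alignment; no byte <-> FS block conversion happens anywhere.
 */
static struct awu_bytes awu_limits_bytes(uint32_t bdev_min, uint32_t bdev_max,
					 uint32_t blocksize,
					 uint32_t align_bytes)
{
	struct awu_bytes r = { 0, 0 };
	uint32_t lo = bdev_min > blocksize ? bdev_min : blocksize;
	uint32_t hi = bdev_max < align_bytes ? bdev_max : align_bytes;

	/* No device support, or no usable range: report unsupported. */
	if (bdev_max == 0 || hi < lo)
		return r;

	r.unit_min = lo;
	r.unit_max = hi;
	return r;
}
```

With bdev limits of 2K/64K, 4K FS blocks and a 16K extent alignment this 
would report 4096 and 16384 directly in bytes.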

> 
>> +			stat->attributes |= STATX_ATTR_WRITE_ATOMIC;
>> +			stat->attributes_mask |= STATX_ATTR_WRITE_ATOMIC;
>> +			stat->result_mask |= STATX_WRITE_ATOMIC;
> 
> If the min/max are zero, then atomic writes are not supported on
> this inode, right? Why would we set any of the attributes or result
> mask to say it is supported on this file?

ok, we won't set STATX_ATTR_WRITE_ATOMIC when min/max are zero
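
Roughly along these lines, as a standalone userspace C sketch. The 
struct, the SKETCH_* flag values and sketch_fill_atomic_write() are all 
placeholders invented for illustration, and leaving the masks untouched 
as well as the attribute is an assumption of the sketch, not a decision.

```c
#include <stdint.h>

/* Placeholder flag values for this sketch only, not the real UAPI ones. */
#define SKETCH_STATX_WRITE_ATOMIC	0x1u
#define SKETCH_STATX_ATTR_WRITE_ATOMIC	0x2u

/* Cut-down stand-in for the stat structure, just the fields used here. */
struct sketch_kstat {
	uint32_t result_mask;
	uint64_t attributes;
	uint64_t attributes_mask;
	uint32_t atomic_write_unit_min;
	uint32_t atomic_write_unit_max;
};

/*
 * Hypothetical helper: only advertise atomic write support when both
 * limits (in bytes) are non-zero; otherwise leave stat untouched.
 */
static void sketch_fill_atomic_write(struct sketch_kstat *stat,
				     uint32_t unit_min, uint32_t unit_max)
{
	if (!unit_min || !unit_max)
		return;

	stat->atomic_write_unit_min = unit_min;
	stat->atomic_write_unit_max = unit_max;
	stat->attributes |= SKETCH_STATX_ATTR_WRITE_ATOMIC;
	stat->attributes_mask |= SKETCH_STATX_ATTR_WRITE_ATOMIC;
	stat->result_mask |= SKETCH_STATX_WRITE_ATOMIC;
}
```

So a call with 0/0 leaves a zeroed stat structure fully zeroed, and a 
call with 4096/65536 sets the two limits plus all three flag fields.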

Thanks,
John


