[PATCH v2 08/16] block: Limit atomic write IO size according to atomic_write_max_sectors

John Garry john.g.garry at oracle.com
Fri Dec 15 05:55:22 PST 2023


On 15/12/2023 02:27, Ming Lei wrote:
> On Tue, Dec 12, 2023 at 11:08:36AM +0000, John Garry wrote:
>> Currently an IO size is limited to the request_queue limits max_sectors.
>> Limit the size for an atomic write to queue limit atomic_write_max_sectors
>> value.
>>
>> Signed-off-by: John Garry <john.g.garry at oracle.com>
>> ---
>>   block/blk-merge.c | 12 +++++++++++-
>>   block/blk.h       |  3 +++
>>   2 files changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>> index 0ccc251e22ff..8d4de9253fe9 100644
>> --- a/block/blk-merge.c
>> +++ b/block/blk-merge.c
>> @@ -171,7 +171,17 @@ static inline unsigned get_max_io_size(struct bio *bio,
>>   {
>>   	unsigned pbs = lim->physical_block_size >> SECTOR_SHIFT;
>>   	unsigned lbs = lim->logical_block_size >> SECTOR_SHIFT;
>> -	unsigned max_sectors = lim->max_sectors, start, end;
>> +	unsigned max_sectors, start, end;
>> +
>> +	/*
>> +	 * We ignore lim->max_sectors for atomic writes simply because
>> +	 * it may less than bio->write_atomic_unit, which we cannot
>> +	 * tolerate.
>> +	 */
>> +	if (bio->bi_opf & REQ_ATOMIC)
>> +		max_sectors = lim->atomic_write_max_sectors;
>> +	else
>> +		max_sectors = lim->max_sectors;

Please note that I mentioned this issue in the cover letter, so you can 
see some discussion there (if missed).

> 
> I can understand the trouble for write atomic from bio split, which
> may simply split in the max_sectors boundary, however this change is
> still too fragile:
> 
> 1) ->max_sectors may be set from userspace
> - so this change simply override userspace setting

yes

> 
> 2) otherwise ->max_sectors is same with ->max_hw_sectors:
> 
> - then something must be wrong in device side or driver side because
> ->write_atomic_unit conflicts with ->max_hw_sectors, which is supposed
> to be figured out before device is setup
> 

Right, so I think that it is proper to limit atomic_write_unit_max et al 
to max_hw_sectors or whatever DMA engine device limits. I can make that 
change.

> 3) too big max_sectors may break driver or device, such as nvme-pci
> aligns max_hw_sectors with DMA optimized mapping size
> 
> And there might be more(better) choices:
> 
> 1) make sure atomic write limit is respected when userspace updates
> ->max_sectors

My mind has been changed and I would say no and we can treat 
atomic_write_unit_max as special. Indeed, max_sectors and 
atomic_write_unit_max are complementary. If someone sets max_sectors to 
4KB and then tries an atomic write of 16KB then they don't know what 
they are doing.

My original idea was to dynamically limit atomic_unit_unit_max et al to 
max_sectors (so that max_sectors is always respected for atomic writes).

As an alternative, how about we keep the value of atomic_unit_unit_max 
static, but reject an atomic write if it exceeds max_sectors? It's not 
too different than dynamically limiting atomic_unit_unit_max. But as 
mentioned, it may be asking for trouble....

> 
> 2) when driver finds that atomic write limits conflict with other
> existed hardware limits, fail or solve(such as reduce write atomic unit) the
> conflict before queue is started; With single write atomic limits update API,
> the conflict can be figured out earlier by block layer too.

I think that we can do this, but I am not sure what other queue limits 
may conflict (apart from max_sectors / max_sectors_hw). There is max 
segment size, but we just rely on a single PAGE per iovec to evaluate 
atomic_unit_unit_max, so should not be an issue.

Thanks,
John



More information about the Linux-nvme mailing list