[PATCH v2 08/16] block: Limit atomic write IO size according to atomic_write_max_sectors
John Garry
john.g.garry at oracle.com
Fri Dec 15 05:55:22 PST 2023
On 15/12/2023 02:27, Ming Lei wrote:
> On Tue, Dec 12, 2023 at 11:08:36AM +0000, John Garry wrote:
>> Currently an IO size is limited to the request_queue limits max_sectors.
>> Limit the size for an atomic write to queue limit atomic_write_max_sectors
>> value.
>>
>> Signed-off-by: John Garry <john.g.garry at oracle.com>
>> ---
>> block/blk-merge.c | 12 +++++++++++-
>> block/blk.h | 3 +++
>> 2 files changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>> index 0ccc251e22ff..8d4de9253fe9 100644
>> --- a/block/blk-merge.c
>> +++ b/block/blk-merge.c
>> @@ -171,7 +171,17 @@ static inline unsigned get_max_io_size(struct bio *bio,
>> {
>> unsigned pbs = lim->physical_block_size >> SECTOR_SHIFT;
>> unsigned lbs = lim->logical_block_size >> SECTOR_SHIFT;
>> - unsigned max_sectors = lim->max_sectors, start, end;
>> + unsigned max_sectors, start, end;
>> +
>> + /*
>> + * We ignore lim->max_sectors for atomic writes simply because
>> + * it may be less than bio->write_atomic_unit, which we cannot
>> + * tolerate.
>> + */
>> + if (bio->bi_opf & REQ_ATOMIC)
>> + max_sectors = lim->atomic_write_max_sectors;
>> + else
>> + max_sectors = lim->max_sectors;
Please note that I mentioned this issue in the cover letter, so you can
find some discussion there (in case you missed it).
>
> I can understand the trouble that bio splitting causes for atomic writes,
> since a bio may simply be split at the max_sectors boundary; however, this
> change is still too fragile:
>
> 1) ->max_sectors may be set from userspace
> - so this change simply overrides the userspace setting
yes
>
> 2) otherwise ->max_sectors is the same as ->max_hw_sectors:
>
> - then something must be wrong on the device side or driver side, because
> ->write_atomic_unit conflicts with ->max_hw_sectors, which is supposed
> to be figured out before the device is set up
>
Right, so I think that it is proper to limit atomic_write_unit_max et al
to max_hw_sectors, or whatever limits the device's DMA engine imposes. I
can make that change.
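Roughly, I am picturing a clamp like this at limits-setup time (just a
sketch; the helper name is invented, and the atomic_write_* field names
follow this series, so they may not match exactly):

/*
 * Sketch only: clamp the advertised atomic write limits so that they can
 * never exceed what the hardware can transfer in a single request.
 */
static void blk_atomic_write_clamp_limits(struct queue_limits *lim)
{
	/* never advertise an atomic write larger than max_hw_sectors */
	lim->atomic_write_max_sectors = min(lim->atomic_write_max_sectors,
					    lim->max_hw_sectors);

	/* keep the advertised unit max consistent with the clamped max */
	lim->atomic_write_unit_max_sectors =
		min(lim->atomic_write_unit_max_sectors,
		    lim->atomic_write_max_sectors);
}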
> 3) a too-big max_sectors may break the driver or device; for example,
> nvme-pci aligns max_hw_sectors with the DMA-optimized mapping size
>
> And there might be more (better) choices:
>
> 1) make sure the atomic write limit is respected when userspace updates
> ->max_sectors
My mind has changed on this and I would say no; we can treat
atomic_write_unit_max as special. Indeed, max_sectors and
atomic_write_unit_max are complementary. If someone sets max_sectors to
4KB and then tries an atomic write of 16KB, then they don't know what
they are doing.
My original idea was to dynamically limit atomic_write_unit_max et al to
max_sectors (so that max_sectors is always respected for atomic writes).
As an alternative, how about we keep the value of atomic_write_unit_max
static, but reject an atomic write if it exceeds max_sectors? It's not
too different from dynamically limiting atomic_write_unit_max. But, as
mentioned, it may be asking for trouble....
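For reference, the "reject" alternative could be little more than a check
like this at submission time (sketch only, reusing the REQ_ATOMIC flag from
this series and the existing bio_sectors() helper; the function name is
invented):

/*
 * Sketch only: instead of clamping the limits, fail an atomic write bio
 * which is larger than the current (possibly user-reduced) max_sectors.
 */
static bool blk_atomic_write_fits(struct bio *bio,
				  const struct queue_limits *lim)
{
	if (!(bio->bi_opf & REQ_ATOMIC))
		return true;

	return bio_sectors(bio) <= lim->max_sectors;
}

The caller would then fail the bio rather than split it, so userspace sees
an explicit error instead of a silently ignored limit.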
>
> 2) when the driver finds that the atomic write limits conflict with other
> existing hardware limits, fail or resolve the conflict (such as by reducing
> the atomic write unit) before the queue is started; with a single atomic
> write limits update API, the conflict can be figured out earlier by the
> block layer too.
I think that we can do this, but I am not sure which other queue limits
may conflict (apart from max_sectors / max_hw_sectors). There is max
segment size, but we rely only on a single PAGE per iovec to evaluate
atomic_write_unit_max, so that should not be an issue.
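If we go that way, I would picture a check along these lines when the
limits are committed (sketch only; apart from max_hw_sectors and
max_segment_size, the field names follow this series, and the function
name is invented):

/*
 * Sketch only: detect a conflict between the atomic write limits and the
 * other hardware limits before the queue is started, rather than per-bio.
 */
static int blk_validate_atomic_write_limits(struct queue_limits *lim)
{
	if (!lim->atomic_write_max_sectors)
		return 0;	/* atomic writes not advertised */

	if (lim->atomic_write_max_sectors > lim->max_hw_sectors)
		return -EINVAL;

	/*
	 * atomic_write_unit_max is sized assuming a single PAGE per iovec,
	 * so a segment limit below PAGE_SIZE would break that assumption.
	 */
	if (lim->max_segment_size < PAGE_SIZE)
		return -EINVAL;

	return 0;
}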
Thanks,
John