[PATCH 03/13] block: handle zone management operations completions
Damien Le Moal
dlemoal at kernel.org
Mon Nov 3 04:59:49 PST 2025
On 11/3/25 20:41, Hannes Reinecke wrote:
> On 10/31/25 07:12, Damien Le Moal wrote:
>> The functions blk_zone_wplug_handle_reset_or_finish() and
>> blk_zone_wplug_handle_reset_all() both modify the zone write pointer
>> offset of zone write plugs that are the target of a reset, reset all or
>> finish zone management operation. However, these functions do this
>> modification before the BIO is executed. So if the zone operation fails,
>> the modified zone write pointer offsets become invalid.
>>
>> Avoid this by modifying the zone write pointer offset of a zone write
>> plug that is the target of a zone management operation when the
>> operation completes. To do so, modify blk_zone_bio_endio() to call the
>> new function blk_zone_mgmt_bio_endio() which in turn calls the functions
>> blk_zone_reset_all_bio_endio(), blk_zone_reset_bio_endio() or
>> blk_zone_finish_bio_endio() depending on the operation of the completed
>> BIO, to modify a zone write plug write pointer offset accordingly.
>> These functions are called only if the BIO execution was successful.
>>
> Hmm.
> Question remains: what _is_ the status of a write pointer once a
> zone management operation is in flight?
On the device, it will be unchanged until the command completes, or rather, one
can only observe it that way, since the drive serializes such commands with report
zones commands.
> I would argue it's turning into a Schroedinger state, and so we
> cannot make any assumptions here.
Let me try to skin that cat below :)
> In particular we cannot issue any other write I/O to that zone once
> the operation is in flight, and so it becomes meaningless if we set
> the write pointer before or after the zone operation.
> Once the operation fails we have to issue a 'report write pointer'
> command anyway as I'd be surprised if we could assume that a failure
> in a zone management command would leave the write pointer unmodified.
> So I would assume that zone write plugging already blocks the zone
> while an zone management command is in flight.
> But if it does, why do we need this patch?
There is no such "blocking" done. The user is free to issue a zone reset while
writes are in flight, and will most likely get write errors as a result of such
bad practice.
For this patch, the assumption is that a failed zone reset or zone finish leaves
the zone write pointer untouched. All the drives I know do that. So it is better
to not modify the zone write plug write pointer offset until we complete the
command.
But granted, that is not always true since the failure may happen *after* the
drive completed the command (e.g. the HBA loses the connection with the drive
before signaling the completion or something like that). In such a case, it would
not matter when the update is done. And for zone reset all commands, all bets
are off since the command may fail half-way through all the zones that need a reset.
But in the end, logically speaking, it makes more sense to update things when we
get a success result instead of assuming we will always succeed. This also has
the benefit of leaving the zone write plugs in place for possible error recovery
if needed.
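
To illustrate the ordering only, here is a rough userspace sketch, not the
kernel code. The struct, the enum and the function name are made up for this
example, and it assumes that a successful reset puts the write pointer offset
at 0 and a successful finish puts it at the zone size:

```c
#include <stdint.h>

/* Hypothetical stand-in for a zone write plug; not the kernel structure. */
struct zone_wplug {
	uint32_t wp_offset;	/* write pointer offset within the zone */
};

/* Hypothetical operation codes for this sketch. */
enum zone_mgmt_op {
	ZONE_OP_RESET,
	ZONE_OP_FINISH,
};

/*
 * Completion-side update. status is 0 on success and non-zero on failure.
 * On failure the cached offset is left alone, on the assumption that the
 * device did not move the write pointer.
 */
static void zone_mgmt_endio(struct zone_wplug *zwplug, enum zone_mgmt_op op,
			    int status, uint32_t zone_size)
{
	if (status)
		return;

	switch (op) {
	case ZONE_OP_RESET:
		/* Assumed: a successful reset rewinds the zone to its start. */
		zwplug->wp_offset = 0;
		break;
	case ZONE_OP_FINISH:
		/* Assumed: a successful finish makes the zone full. */
		zwplug->wp_offset = zone_size;
		break;
	}
}
```

With this shape, a failed reset leaves wp_offset at whatever it was before the
command was issued, so the plug still reflects the last known device state if
recovery is needed.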
--
Damien Le Moal
Western Digital Research