[PATCH v2] nvme: report write pointer for a full zone as zone start + zone len

Niklas Cassel Niklas.Cassel at wdc.com
Mon Nov 29 04:39:42 PST 2021


On Mon, Nov 29, 2021 at 08:18:24PM +0900, Damien Le Moal wrote:
> On 2021/11/27 10:14, Damien Le Moal wrote:
> > On 2021/11/26 19:42, Niklas Cassel wrote:
> >> From: Niklas Cassel <niklas.cassel at wdc.com>
> >>
> >> The write pointer in NVMe ZNS is invalid for a zone in zone state full.
> >> The same also holds true for ZAC/ZBC.
> >>
> >> The current behavior for NVMe is to simply propagate the wp reported by
> >> the drive, even for full zones. Since the wp is invalid for a full zone,
> >> the wp reported by the drive may be any value.
> >>
> >> The way that the sd_zbc driver handles a full zone is to always report
> >> the wp as zone start + zone len, regardless of what the drive reported.
> >> null_blk also follows this convention.
> >>
> >> Do the same for NVMe, so that a BLKREPORTZONE ioctl reports the write
> >> pointer for a full zone in a consistent way, regardless of the interface
> >> of the underlying zoned block device.
> >>
> >> blkzone report before patch:
> >> start: 0x000040000, len 0x040000, cap 0x03e000, wptr 0xfffffffffffbfff8
> >> reset:0 non-seq:0, zcond:14(fu) [type: 2(SEQ_WRITE_REQUIRED)]
> >>
> >> blkzone report after patch:
> >> start: 0x000040000, len 0x040000, cap 0x03e000, wptr 0x040000
> >> reset:0 non-seq:0, zcond:14(fu) [type: 2(SEQ_WRITE_REQUIRED)]
> >>
> >> Signed-off-by: Niklas Cassel <niklas.cassel at wdc.com>
> >> ---
> >> Changes since v1:
> >> - Minor commit message rewording.
> >> - Use if/else instead of setting wp unconditionally and then
> >>   conditionally updating it.
> >>
> >>  drivers/nvme/host/zns.c | 5 ++++-
> >>  1 file changed, 4 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/nvme/host/zns.c b/drivers/nvme/host/zns.c
> >> index bfc259e0d7b8..9f81beb4df4e 100644
> >> --- a/drivers/nvme/host/zns.c
> >> +++ b/drivers/nvme/host/zns.c
> >> @@ -166,7 +166,10 @@ static int nvme_zone_parse_entry(struct nvme_ns *ns,
> >>  	zone.len = ns->zsze;
> >>  	zone.capacity = nvme_lba_to_sect(ns, le64_to_cpu(entry->zcap));
> >>  	zone.start = nvme_lba_to_sect(ns, le64_to_cpu(entry->zslba));
> >> -	zone.wp = nvme_lba_to_sect(ns, le64_to_cpu(entry->wp));
> >> +	if (zone.cond == BLK_ZONE_COND_FULL)
> >> +		zone.wp = zone.start + zone.len;
> >> +	else
> >> +		zone.wp = nvme_lba_to_sect(ns, le64_to_cpu(entry->wp));
> >>  
> >>  	return cb(&zone, idx, data);
> >>  }
> >>
> > 
> > Looks good.
> > 
> > Reviewed-by: Damien Le Moal <damien.lemoal at opensource.wdc.com>
> > 
> > Note: read-only zones also have an undefined wp. So I wonder whether we
> > should set the wp the same way as for full zones, to match the fact that we
> > cannot write to these zones. The same goes for offline zones, but those are
> > trickier since they cannot be read either, meaning that the wp should be set
> > to the zone start in that case...
> 
> Thinking about this some more, I think we should do nothing. Any reaction to
> RO or offline zones will always come from an IO error path, in which case it
> should be clear to the user that the zone wp is invalid/undefined. E.g. zonefs
> has such an IO error path.
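
For anyone following along without the kernel tree at hand, the net behavior
discussed in this thread (normalize the wp for full zones only, leave every
other condition untouched) can be sketched as a small user-space model. This is
only an illustration, not the driver code: struct zone_desc, its raw_wp field
and zone_report_wp() are hypothetical names, the raw wp in the usage note is an
assumed example of a meaningless device value, and only the numeric values of
the zone conditions are taken from include/uapi/linux/blkzoned.h.

```c
#include <stdint.h>

/* Numeric values as defined for enum blk_zone_cond in the uapi header. */
#define ZONE_COND_IMP_OPEN 0x2
#define ZONE_COND_FULL     0xE

/*
 * Hypothetical, simplified stand-in for struct blk_zone.
 * All positions and lengths are in 512-byte sectors.
 */
struct zone_desc {
	uint64_t start;   /* first sector of the zone */
	uint64_t len;     /* zone size */
	uint64_t raw_wp;  /* write pointer as reported by the device */
	uint8_t  cond;    /* zone condition */
};

/*
 * Return the write pointer to report for a zone.
 * For a full zone the device value is invalid, so report start + len.
 * Read-only and offline zones are deliberately not special-cased,
 * following the conclusion reached above.
 */
static uint64_t zone_report_wp(const struct zone_desc *z)
{
	if (z->cond == ZONE_COND_FULL)
		return z->start + z->len;
	return z->raw_wp;
}
```

With the zone from the commit message (start 0x40000, len 0x40000, condition
full), zone_report_wp() returns 0x80000 whatever the device put in raw_wp,
while an implicitly open zone keeps the device-reported value.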

Christoph, Keith,

Since there are no longer any outstanding questions on this patch,
please (re)consider this patch for inclusion.


Kind regards,
Niklas

