[PATCH v2] nvme: report write pointer for a full zone as zone start + zone len
Niklas Cassel
Niklas.Cassel at wdc.com
Mon Nov 29 04:39:42 PST 2021
On Mon, Nov 29, 2021 at 08:18:24PM +0900, Damien Le Moal wrote:
> On 2021/11/27 10:14, Damien Le Moal wrote:
> > On 2021/11/26 19:42, Niklas Cassel wrote:
> >> From: Niklas Cassel <niklas.cassel at wdc.com>
> >>
> >> The write pointer in NVMe ZNS is invalid for a zone in zone state full.
> >> The same also holds true for ZAC/ZBC.
> >>
> >> The current behavior for NVMe is to simply propagate the wp reported by
> >> the drive, even for full zones. Since the wp is invalid for a full zone,
> >> the wp reported by the drive may be any value.
> >>
> >> The way that the sd_zbc driver handles a full zone is to always report
> >> the wp as zone start + zone len, regardless of what the drive reported.
> >> null_blk also follows this convention.
> >>
> >> Do the same for NVMe, so that a BLKREPORTZONE ioctl reports the write
> >> pointer for a full zone in a consistent way, regardless of the interface
> >> of the underlying zoned block device.
> >>
> >> blkzone report before patch:
> >> start: 0x000040000, len 0x040000, cap 0x03e000, wptr 0xfffffffffffbfff8
> >> reset:0 non-seq:0, zcond:14(fu) [type: 2(SEQ_WRITE_REQUIRED)]
> >>
> >> blkzone report after patch:
> >> start: 0x000040000, len 0x040000, cap 0x03e000, wptr 0x040000
> >> reset:0 non-seq:0, zcond:14(fu) [type: 2(SEQ_WRITE_REQUIRED)]
> >>
> >> Signed-off-by: Niklas Cassel <niklas.cassel at wdc.com>
> >> ---
> >> Changes since v1:
> >> - Minor commit message rewording.
> >> - Use if/else instead of setting wp unconditionally and then
> >> conditionally updating it.
> >>
> >> drivers/nvme/host/zns.c | 5 ++++-
> >> 1 file changed, 4 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/nvme/host/zns.c b/drivers/nvme/host/zns.c
> >> index bfc259e0d7b8..9f81beb4df4e 100644
> >> --- a/drivers/nvme/host/zns.c
> >> +++ b/drivers/nvme/host/zns.c
> >> @@ -166,7 +166,10 @@ static int nvme_zone_parse_entry(struct nvme_ns *ns,
> >>  	zone.len = ns->zsze;
> >>  	zone.capacity = nvme_lba_to_sect(ns, le64_to_cpu(entry->zcap));
> >>  	zone.start = nvme_lba_to_sect(ns, le64_to_cpu(entry->zslba));
> >> -	zone.wp = nvme_lba_to_sect(ns, le64_to_cpu(entry->wp));
> >> +	if (zone.cond == BLK_ZONE_COND_FULL)
> >> +		zone.wp = zone.start + zone.len;
> >> +	else
> >> +		zone.wp = nvme_lba_to_sect(ns, le64_to_cpu(entry->wp));
> >>
> >>  	return cb(&zone, idx, data);
> >>  }
> >>
> >
> > Looks good.
> >
> > Reviewed-by: Damien Le Moal <damien.lemoal at opensource.wdc.com>
> >
> > Note: read-only zones also have an undefined wp. So I wonder if we should not
> > set the wp similarly to full zones, to match the fact that we cannot write to
> > these zones. Same for offline zones, but these are tricky since they cannot be
> > read either, meaning that wp should be set to the zone start for that case...
>
> Thinking about this some more, I think we should do nothing. Any reaction to RO
> or offline zones will always come from an IO error path, in which case it should
> be clear to the user that the zone wp is invalid/undefined. E.g. zonefs has such
> an IO error path.
Christoph, Keith,
Since there are no longer any outstanding questions on this patch,
please (re)consider this patch for inclusion.
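
For anyone skimming the thread, here is a minimal standalone sketch of the
wp selection rule the patch applies. It is not the driver code: struct
zone_info and reported_wp() are hypothetical names made up for illustration,
and all values are assumed to already be in 512-byte sectors. The zone
condition values mirror the BLK_ZONE_COND_* values in
include/uapi/linux/blkzoned.h.

```c
#include <stdint.h>

/* Zone conditions, mirroring BLK_ZONE_COND_* in include/uapi/linux/blkzoned.h */
enum zone_cond {
	ZONE_COND_NOT_WP   = 0x0,
	ZONE_COND_EMPTY    = 0x1,
	ZONE_COND_IMP_OPEN = 0x2,
	ZONE_COND_EXP_OPEN = 0x3,
	ZONE_COND_CLOSED   = 0x4,
	ZONE_COND_READONLY = 0xD,
	ZONE_COND_FULL     = 0xE,
	ZONE_COND_OFFLINE  = 0xF,
};

/* Hypothetical, simplified stand-in for struct blk_zone (units: sectors) */
struct zone_info {
	uint64_t start;	/* first sector of the zone */
	uint64_t len;	/* zone size in sectors */
	enum zone_cond cond;
};

/*
 * Return the write pointer to report for a zone.
 * For a full zone the device-reported value is invalid, so ignore it and
 * report zone start + zone len. For every other condition, pass the
 * device-reported value through unchanged.
 */
static uint64_t reported_wp(const struct zone_info *z, uint64_t device_wp)
{
	if (z->cond == ZONE_COND_FULL)
		return z->start + z->len;
	return device_wp;
}
```

With the zone from the blkzone output above (start 0x40000, len 0x40000,
zcond 14), reported_wp() returns 0x80000 no matter what the drive put in
the wp field. Read-only and offline zones are deliberately left alone in
this sketch, in line with the conclusion above.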
Kind regards,
Niklas