[PATCH RFC 1/2] nvme: downgrade WARN in nvme_setup_rw to pr_debug

Keith Busch kbusch at kernel.org
Sun May 17 15:42:29 PDT 2026


On Sun, May 17, 2026 at 04:05:07PM -0600, Keith Busch wrote:
> On Sat, May 16, 2026 at 11:53:54PM -0400, Chao S wrote:
> > On Thu, May 07, 2026 at 07:12:26PM +0100, Keith Busch wrote:
> > > [...] how that was defeated [...]
> > 
> > Hi Keith,
> > 
> > Not the freeze.  The WARN does not depend on q->limits, but on
> > ns->head->ms (read live at dispatch, set inside the freeze window) and
> > on REQ_INTEGRITY, never set for EXT_LBAS-non-PI.  capacity==0 only
> > gates submission (bio_check_eod()), not dispatch: a writeback bio that
> > passed bio_check_eod() under the old capacity sits on the task plug
> > holding no q_usage_counter ref, so it does not block the freeze;
> > blk_finish_plug() flushes it after the update committed head->ms != 0
> > (dmesg: the capacity-change line prints before the WARN).
> > 
> > So it is reachable -- the host-unaware geometry change you described,
> > unrelated to your block fix.  The deeper fencing gap is the separate
> > TP-level issue; v2 does not attempt it, it only stops a
> > device-reachable, already-safely-rejected dispatch from being a WARN
> > (a panic under panic_on_warn).
> 
> I think the WARN is serving its intended purpose: the block layer
> shouldn't have submitted this request. You can't do generic read/write
> with extended metadata as the DMA is going to corrupt memory with
> respect to what the block layer expects.
> 
> This driver is depending on the capacity constraint to prevent this
> scenario, so I think the "end-of-device" check needs to happen within
> the entered queue context. If there's a scenario that escapes that
> check, then I think that's what needs fixing, not the driver.

Does this fix it? I don't necessarily like having yet another check in
the hotpath, but it should be exactly the check that drivers expect to
be done, so it should be cache hot in the normal case.
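
For illustration only, the window discussed above can be modelled in
userspace. This is a rough sketch, not kernel code: every model_* name
is hypothetical, the metadata size of 8 is an arbitrary nonzero value,
and the "plug" is reduced to the gap between the first check and the
dispatch. It only shows why repeating the end-of-device check after the
queue has been entered turns the stale bio into an I/O error instead of
letting it reach the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct model_dev {
	uint64_t capacity;	/* in sectors; forced to 0 by the geometry change */
	unsigned int ms;	/* metadata bytes per block; nonzero after the change */
};

struct model_bio {
	uint64_t sector;
	unsigned int nr_sectors;
};

enum model_result {
	MODEL_DISPATCHED_OK,	/* reached the driver while ms == 0 */
	MODEL_DRIVER_WARN,	/* reached the driver while ms != 0 */
	MODEL_IO_ERROR,		/* failed by the end-of-device check */
};

/* Same shape as an end-of-device check: true when the bio is out of range. */
static bool model_check_eod(const struct model_dev *dev,
			    const struct model_bio *bio)
{
	return bio->nr_sectors &&
	       (bio->nr_sectors > dev->capacity ||
		bio->sector > dev->capacity - bio->nr_sectors);
}

/* Stand-in for the driver: it objects when metadata is unexpectedly set. */
static enum model_result model_driver_dispatch(const struct model_dev *dev)
{
	return dev->ms ? MODEL_DRIVER_WARN : MODEL_DISPATCHED_OK;
}

static enum model_result model_submit(struct model_dev *dev,
				      const struct model_bio *bio,
				      bool geometry_changes_while_plugged,
				      bool recheck_on_entry)
{
	assert(dev && bio);

	/* Submission-time check, made against the old capacity. */
	if (model_check_eod(dev, bio))
		return MODEL_IO_ERROR;

	/*
	 * The bio now waits on the plug. It holds no queue reference, so
	 * nothing stops the geometry update from completing in between.
	 */
	if (geometry_changes_while_plugged) {
		dev->capacity = 0;
		dev->ms = 8;	/* arbitrary nonzero value for the model */
	}

	/* Plug flush: the queue is entered here; optionally check again. */
	if (recheck_on_entry && model_check_eod(dev, bio))
		return MODEL_IO_ERROR;

	return model_driver_dispatch(dev);
}
```

In this model, calling model_submit() with the geometry change enabled
and recheck_on_entry false returns MODEL_DRIVER_WARN, while the same call
with recheck_on_entry true returns MODEL_IO_ERROR. With no geometry
change, the second check passes and the bio is dispatched normally.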

---
diff --git a/block/blk-core.c b/block/blk-core.c
index 17450058ea6d8..4b5fb32a7d6f8 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -553,7 +553,7 @@ ALLOW_ERROR_INJECTION(should_fail_bio, ERRNO);
  * This may well happen - the kernel calls bread() without checking the size of
  * the device, e.g., when mounting a file system.
  */
-static inline int bio_check_eod(struct bio *bio)
+int bio_check_eod(struct bio *bio)
 {
 	sector_t maxsector = bdev_nr_sectors(bio->bi_bdev);
 	unsigned int nr_sectors = bio_sectors(bio);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4c5c16cce4f8f..b75117ec5c988 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3186,6 +3186,11 @@ void blk_mq_submit_bio(struct bio *bio)
 		goto queue_exit;
 	}
 
+	if (unlikely(bio_check_eod(bio))) {
+		bio_io_error(bio);
+		goto queue_exit;
+	}
+
 	if ((bio->bi_opf & REQ_POLLED) && !blk_mq_can_poll(q)) {
 		bio->bi_status = BLK_STS_NOTSUPP;
 		bio_endio(bio);
diff --git a/block/blk.h b/block/blk.h
index b998a7761faf3..84515bb75485d 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -54,6 +54,7 @@ bool blk_queue_start_drain(struct request_queue *q);
 bool __blk_freeze_queue_start(struct request_queue *q,
 			      struct task_struct *owner);
 int __bio_queue_enter(struct request_queue *q, struct bio *bio);
+int bio_check_eod(struct bio *bio);
 void submit_bio_noacct_nocheck(struct bio *bio, bool split);
 int bio_submit_or_kill(struct bio *bio, unsigned int flags);
 
-- 


