[PATCH RFC 0/5] block: validate bios against queue limits in the entered context

Keith Busch kbusch at meta.com
Tue May 19 10:23:21 PDT 2026


From: Keith Busch <kbusch at kernel.org>

The block layer validates bios against various queue limits and device
capacity in submit_bio_noacct(), but these checks run before
bio_queue_enter(). This means they are not serialized against drivers
that update queue limits inside a freeze window. A bio that passes
validation under old limits can enter the queue after the update and
reach the driver with an invalid configuration.

This series moves all limit-dependent validation into
__bio_split_to_limits(), which runs after the queue usage reference has
been acquired. This ensures proper serialization against limit updates.
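
The ordering problem can be illustrated with a small userspace model.
This is only a sketch and not the kernel code: every model_* name is
hypothetical, the bounds check is a simplified stand-in for
bio_check_eod(), and the race is simulated deterministically with an
update_races flag instead of a real freeze/unfreeze cycle.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_queue {
	uint64_t capacity;	/* device capacity in sectors */
};

struct model_bio {
	uint64_t sector;	/* first sector of the I/O */
	uint32_t nr_sectors;	/* length of the I/O in sectors */
};

enum model_result {
	MODEL_REJECTED,		/* failed validation, never reached the driver */
	MODEL_DISPATCHED_VALID,	/* reached the driver and fits current limits */
	MODEL_DISPATCHED_STALE,	/* reached the driver but violates current limits */
};

/* Simplified, overflow-safe end-of-device check. */
static bool model_bio_in_bounds(const struct model_queue *q,
				const struct model_bio *bio)
{
	return bio->nr_sectors <= q->capacity &&
	       bio->sector <= q->capacity - bio->nr_sectors;
}

/* Stand-in for a driver that sets capacity to 0 inside a freeze window. */
static void model_update_limits(struct model_queue *q)
{
	q->capacity = 0;
}

/* Old ordering: validate first, enter the queue afterwards. */
static enum model_result model_submit_check_first(struct model_queue *q,
						  const struct model_bio *bio,
						  bool update_races)
{
	if (!model_bio_in_bounds(q, bio))
		return MODEL_REJECTED;

	/* Window between the check and queue entry: an update can land here. */
	if (update_races)
		model_update_limits(q);

	/* Queue entered; the bio is dispatched without being re-validated. */
	return model_bio_in_bounds(q, bio) ? MODEL_DISPATCHED_VALID
					   : MODEL_DISPATCHED_STALE;
}

/* New ordering: enter the queue first, validate while it is held. */
static enum model_result model_submit_enter_first(struct model_queue *q,
						  const struct model_bio *bio,
						  bool update_races)
{
	/* A freeze-window update can only complete before entry succeeds. */
	if (update_races)
		model_update_limits(q);

	/* Queue entered; limits are stable for the rest of this function. */
	if (!model_bio_in_bounds(q, bio))
		return MODEL_REJECTED;

	return MODEL_DISPATCHED_VALID;
}
```

In this model the check-first ordering can hand the driver a bio that
no longer fits the device, while the enter-first ordering can only
reject the bio or dispatch it against the limits currently in effect.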

This change was motivated by a few recent reports:

  https://lore.kernel.org/linux-nvme/MW5PR19MB548483D1FAE4F322E4C97352FD032@MW5PR19MB5484.namprd19.prod.outlook.com/
  https://lore.kernel.org/linux-nvme/20260517053635.2282446-1-coshi036@gmail.com/

When an NVMe namespace is reformatted to use extended metadata that the
controller can't generate or strip, the driver sets the capacity to 0
inside a freeze window because the block layer is not able to form a
viable request for this format.  But a bio that passed bio_check_eod()
before the freeze can still reach nvme_setup_rw() after the update,
triggering a WARN that was not expected to be reachable.

For NVMe multipath, moving these checks into the entered context
exacerbates a different problem: a bio targeting a path being torn down
(capacity set to 0) would be failed by the block layer before the driver
gets a chance to redirect it to another path. This is a pre-existing
problem, but the initial changes in this series make it easier to hit.
The series addresses this by introducing a new callback to
block_device_operations, called from bio_io_error(), that lets the
driver intercept and redirect failing bios before they are completed.
NVMe multipath uses this to requeue bios back to the head device for
path re-selection when the path is no longer ready.
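
The shape of such a hook might look roughly like the userspace sketch
below. This is an assumption-laden illustration, not the interface from
the patch: the callback signature, its boolean "driver took the bio"
return convention, and all sketch_* names are hypothetical, and setting
a requeued flag stands in for handing the bio back to the head device.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these are not the kernel structures. */
struct sketch_dev;

struct sketch_bio {
	int status;	/* 0 on success, negative errno-style value on error */
	bool completed;	/* set once the bio has been completed with an error */
	bool requeued;	/* set when the driver redirected the bio instead */
};

struct sketch_dev_ops {
	/*
	 * Assumed convention: return true if the driver took ownership of
	 * the failing bio, false to let the generic error path complete it.
	 */
	bool (*failed_bio)(struct sketch_dev *dev, struct sketch_bio *bio);
};

struct sketch_dev {
	const struct sketch_dev_ops *ops;
	bool path_ready;	/* false while this path is being torn down */
};

/* Multipath-style handler: redirect only when the path is not ready. */
static bool sketch_mpath_failed_bio(struct sketch_dev *dev,
				    struct sketch_bio *bio)
{
	if (dev->path_ready)
		return false;	/* genuine failure, do not intercept */

	bio->requeued = true;	/* stand-in for requeueing to the head device */
	return true;
}

static const struct sketch_dev_ops sketch_mpath_ops = {
	.failed_bio = sketch_mpath_failed_bio,
};

/* Generic error path: give the driver a chance before completing. */
static void sketch_bio_io_error(struct sketch_dev *dev, struct sketch_bio *bio)
{
	if (dev->ops && dev->ops->failed_bio &&
	    dev->ops->failed_bio(dev, bio))
		return;

	bio->status = -5;	/* -EIO */
	bio->completed = true;
}
```

With a path that is not ready, the bio is redirected and never completed
with an error; with a ready path, or a device that provides no callback,
the error completion proceeds as before.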

Note that callers of submit_bio_noacct_nocheck() (bio split, throttle
dispatch) will now hit the validation checks in __bio_split_to_limits()
that they previously bypassed. This is intentional: these checks must
run in the entered context to be properly serialized, so the extra cost
on those paths can't be avoided.

Keith Busch (5):
  blk-mq: fix status for unaligned bio
  block: fix invalid zone append status codes
  block: validate bio bounds in the queue entered context
  block: move bio operation validation into __bio_split_to_limits
  block, nvme: add failed_bio callback for multipath bio failover

 block/blk-core.c              | 144 ----------------------------------
 block/blk-mq.c                |   3 +-
 block/blk.h                   |  89 ++++++++++++++++++++-
 drivers/nvme/host/core.c      |   1 +
 drivers/nvme/host/multipath.c |  26 ++++++
 drivers/nvme/host/nvme.h      |   2 +
 include/linux/bio.h           |   6 --
 include/linux/blkdev.h        |  16 ++++
 8 files changed, 133 insertions(+), 154 deletions(-)

-- 
2.53.0-Meta



