[PATCH RFC 0/5] block: validate bios against queue limits in the entered context
Keith Busch
kbusch at meta.com
Tue May 19 10:23:21 PDT 2026
From: Keith Busch <kbusch at kernel.org>
The block layer validates bios against various queue limits and device
capacity in submit_bio_noacct(), but these checks run before
bio_queue_enter(). This means they are not serialized against drivers
that update queue limits inside a freeze window. A bio that passes
validation under old limits can enter the queue after the update and
reach the driver with an invalid configuration.
This series moves all limit-dependent validation into
__bio_split_to_limits(), which runs after the queue usage reference has
been acquired. This ensures proper serialization against limit updates.
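To make the ordering problem concrete, here is a minimal userspace model of the race. It is not kernel code. All model_* names, submit_old() and submit_new() are hypothetical. model_check_eod() stands in for bio_check_eod(), and a single capacity field stands in for the queue limits. In each path, the capacity update is placed where a freeze window could land. In the old ordering it can slip in between validation and queue entry. In the new ordering, queue entry waits for the freeze, so validation sees the updated value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one capacity value stands in for all queue limits. */
struct model_queue {
	uint64_t capacity;	/* device size in sectors */
};

struct model_bio {
	uint64_t sector;
	uint32_t nr_sectors;
};

enum model_result { MODEL_REJECTED, MODEL_DISPATCHED, MODEL_DRIVER_WARN };

/* Stand-in for bio_check_eod(): reject I/O past the end of the device. */
static bool model_check_eod(const struct model_queue *q,
			    const struct model_bio *bio)
{
	return bio->sector + bio->nr_sectors <= q->capacity;
}

/* Stand-in for a driver sanity check that assumes validated input. */
static enum model_result model_driver(const struct model_queue *q,
				      const struct model_bio *bio)
{
	return model_check_eod(q, bio) ? MODEL_DISPATCHED : MODEL_DRIVER_WARN;
}

/* Old ordering: validate outside the entered context, then enter. */
static enum model_result submit_old(struct model_queue *q,
				    const struct model_bio *bio,
				    uint64_t new_cap)
{
	if (!model_check_eod(q, bio))
		return MODEL_REJECTED;
	q->capacity = new_cap;	/* freeze, update, unfreeze slips in here */
	/* queue entry succeeds; nothing re-validates the bio */
	return model_driver(q, bio);
}

/* New ordering: entry waits out the freeze, then validate. */
static enum model_result submit_new(struct model_queue *q,
				    const struct model_bio *bio,
				    uint64_t new_cap)
{
	q->capacity = new_cap;	/* the update completes before entry */
	/* entered: limits are stable until the usage reference is dropped */
	if (!model_check_eod(q, bio))
		return MODEL_REJECTED;
	return model_driver(q, bio);
}
```

With a capacity of 1024 sectors and a racing update to 0, submit_old() hands an 8-sector bio at sector 0 to the driver, which then trips its sanity check. submit_new() rejects the same bio in the block layer.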
These changes were motivated by a few recent reports:
https://lore.kernel.org/linux-nvme/MW5PR19MB548483D1FAE4F322E4C97352FD032@MW5PR19MB5484.namprd19.prod.outlook.com/
https://lore.kernel.org/linux-nvme/20260517053635.2282446-1-coshi036@gmail.com/
When an NVMe namespace is reformatted to use extended metadata that the
controller can't generate or strip, the driver sets the capacity to 0
inside a freeze window, because the block layer cannot form a viable
request for this format. But a bio that passed bio_check_eod() before
the freeze can still reach nvme_setup_rw() after the update, triggering
a WARN that we didn't expect to be reachable.
For NVMe multipath, moving these checks into the entered context
exacerbates a different problem: a bio targeting a path being torn down
(capacity set to 0) would be failed by the block layer before the driver
gets a chance to redirect it to another path. This is a pre-existing
problem, but the initial changes in this series make it easier to hit.
The series addresses this by introducing a new callback to
block_device_operations, called from bio_io_error(), that lets the
driver intercept and redirect failing bios before they are completed.
NVMe multipath uses this to requeue bios back to the head device for
path re-selection when the path is no longer ready.
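The following is a minimal userspace model of the intended control flow. It is not the patch itself. The hook name failed_bio comes from the patch title. Its signature here is an assumption: the hook returns true when the driver has taken ownership of the bio, for example by requeueing it. The real interface may differ. All model_* names are hypothetical. Incrementing a counter stands in for adding the bio to the multipath head's requeue list.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_bio;

/* Hypothetical analogue of block_device_operations with the new hook. */
struct model_bdev_ops {
	/* Assumed convention: return true if the driver took the bio. */
	bool (*failed_bio)(struct model_bio *bio);
};

struct model_bio {
	const struct model_bdev_ops *ops; /* ops of the targeted device */
	bool path_ready;                  /* state of the selected path */
	int status;                       /* 0 = not completed, -EIO = failed */
	int requeued;                     /* redirects back to the head */
};

/* Analogue of bio_io_error(): offer the bio to the driver first. */
static void model_bio_io_error(struct model_bio *bio)
{
	if (bio->ops && bio->ops->failed_bio && bio->ops->failed_bio(bio))
		return;                   /* driver redirected it; don't complete */
	bio->status = -EIO;               /* no taker: complete with an error */
}

/* Multipath-style hook: requeue to the head if the path is going away. */
static bool model_mpath_failed_bio(struct model_bio *bio)
{
	if (bio->path_ready)
		return false;             /* live path: let the error stand */
	bio->requeued++;                  /* stand-in for head requeue list */
	return true;
}

static const struct model_bdev_ops model_mpath_ops = {
	.failed_bio = model_mpath_failed_bio,
};
```

A bio failed on a path that is being torn down is requeued rather than completed with -EIO. An error on a live path, or on a device without the hook, still completes as before.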
Note that callers of submit_bio_noacct_nocheck() (bio split, throttle
dispatch) will now hit the validation checks in __bio_split_to_limits()
that they previously bypassed. This is intentional: these checks must
run in the entered context to be properly serialized and cannot be
skipped, so the added per-bio cost is unavoidable.
Keith Busch (5):
blk-mq: fix status for unaligned bio
block: fix invalid zone append status codes
block: validate bio bounds in the queue entered context
block: move bio operation validation into __bio_split_to_limits
block, nvme: add failed_bio callback for multipath bio failover
block/blk-core.c              | 144 ----------------------------------
block/blk-mq.c                |   3 +-
block/blk.h                   |  89 ++++++++++++++++++++-
drivers/nvme/host/core.c      |   1 +
drivers/nvme/host/multipath.c |  26 ++++++
drivers/nvme/host/nvme.h      |   2 +
include/linux/bio.h           |   6 --
include/linux/blkdev.h        |  16 ++++
8 files changed, 133 insertions(+), 154 deletions(-)
--
2.53.0-Meta