[PATCH v3 for-5.8-rc 0/5] address deadlocks in high stress ns scanning and ana updates
Sagi Grimberg
sagi at grimberg.me
Wed Jun 24 04:53:07 EDT 2020
Changes from v2:
- removed RFC patch #6 from the series
- patch #1: updated change log
- patch #2: renaming and minor restructuring
- patch #3: clarified change log and moved srcu and requeue_work scheduling
outside of the head->lock
- patch #4: renamed flag to NVME_NSHEAD_DISK_LIVE
Changes from v1:
- Fixed compilation error in patch #4
- Added patch #5 to resolve a use-after-free condition
Hey All,
The following patches address deadlocks observed during stress testing that
combines a connect/disconnect storm with concurrent rapid ANA path switches
(paths may transition between live<->inaccessible) on a large number of
namespaces (100+).
The test triggers three main flows:
1. ongoing ns scanning, in the presence of concurrent ANA path state changes
and controller removals (disconnect).
2. ongoing ns scanning (or ana processing) in the presence of concurrent
controller removal (disconnect).
3. ongoing ANA processing in the presence of concurrent controller removal
(disconnect).
What was observed is that when we disconnect while scan_work and/or ana_work
are running, we can easily deadlock. The main reason is that scan_work and ana_work
may both register the gendisk, triggering I/O (partition scans). Given that a
controller removal (disconnect) may be running at the same time, that I/O may
block. The problem with blocking head->disk I/O under the locks taken by
both ana_work and scan_work is that no other path can then update its path state
and, by doing so, unblock the blocked I/O.
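To make the locking pattern concrete, here is a minimal user-space sketch of
the general idea: decide under the lock, but run the step that may block only
after the lock has been dropped. This is not the code from these patches.
struct head, set_live() and register_disk() are hypothetical names, and a
pthread mutex stands in for head->lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a namespace head; not struct nvme_ns_head. */
struct head {
    pthread_mutex_t lock;   /* plays the role of head->lock */
    bool disk_live;         /* "disk already registered" marker */
    int registrations;      /* how many times the slow step ran */
};

/*
 * Stand-in for the step that may block (e.g. registering a disk, which
 * triggers partition-scan I/O). The trylock is for illustration only: it
 * checks that the caller is not holding the lock, so another thread could
 * still take it and update path state while this step is stuck.
 */
static void register_disk(struct head *h)
{
    int rc = pthread_mutex_trylock(&h->lock);

    assert(rc == 0);
    h->registrations++;
    pthread_mutex_unlock(&h->lock);
}

/* Returns true if this caller was the one that registered the disk. */
static bool set_live(struct head *h)
{
    bool do_register = false;

    /* Only the cheap decision is made under the lock. */
    pthread_mutex_lock(&h->lock);
    if (!h->disk_live) {
        h->disk_live = true;
        do_register = true;
    }
    pthread_mutex_unlock(&h->lock);

    /* The potentially blocking step runs with the lock released. */
    if (do_register)
        register_disk(h);

    return do_register;
}
```

With a head initialized via PTHREAD_MUTEX_INITIALIZER, the first set_live()
call performs the registration and later calls return false without
repeating it. The flag is what keeps the registration single-shot once it is
no longer serialized by the lock.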
With this patchset applied (plus the RFC patch that was dropped from this
series), the test passes without any deadlocks.
Anton Eidelman (2):
  nvme-multipath: fix deadlock between ana_work and scan_work
  nvme-multipath: fix deadlock due to head->lock
Sagi Grimberg (3):
  nvme: fix possible deadlock when I/O is blocked
  nvme: don't protect ns mutation with ns->head->lock
  nvme-multipath: fix bogus request queue reference put
 drivers/nvme/host/core.c      |  1 -
 drivers/nvme/host/multipath.c | 46 ++++++++++++++++++++++-------------
 drivers/nvme/host/nvme.h      |  2 ++
 3 files changed, 31 insertions(+), 18 deletions(-)
--
2.25.1