[PATCH 3/3] nvme-core: fix crash when nvme_enable_aen times out
Chao Leng
lengchao at huawei.com
Wed Aug 19 23:54:13 EDT 2020
A crash occurs when testing NVMe over RoCE with a flapping link. The
cause: nvme_enable_aen wrongly queues async_event_work even when the
Set Features command for the async event configuration times out, but
by then the async_event_sqe and the QP of the queue have already been
freed. If scheduling of async_event_work is delayed because the CPU is
busy, the work runs after the free and crashes with a use-after-free.
log:
[ 2229.253424] nvme nvme0: I/O 21 QID 0 timeout
[ 2229.253427] nvme nvme0: starting error recovery
[ 2229.354181] nvme nvme0: Failed to configure AEN (cfg 100)
[ 2229.354373] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 2229.354928] #PF: supervisor write access in kernel mode
[ 2229.357945] #PF: error_code(0x0002) - not-present page
[ 2229.361009] PGD 0 P4D 0
[ 2229.364052] Oops: 0002 [#1] SMP PTI
[ 2229.367132] CPU: 4 PID: 17561 Comm: kworker/u12:0 Kdump: loaded Tainted: G OE 5.7.8 #1
[ 2229.369124] nvme nvme0: Reconnecting in 10 seconds...
[ 2229.370412] Hardware name: Huawei RH1288 V3/BC11HGSC0, BIOS 5.03 07/25/2018
[ 2229.370421] Workqueue: nvme-wq nvme_async_event_work [nvme_core]
[ 2229.380029] RIP: 0010:nvme_rdma_submit_async_event+0x74/0x160 [nvme_rdma]
[ 2229.383408] Code: 48 85 c0 0f 84 e4 00 00 00 48 8b 40 50 48 85 c0 74 0f b9 01 00 00 00 ba 40 00 00 00 e8 25 d5 7a f9 48 8d 7b 08 48 89 d9 31 c0 <48> c7 03 00 00 00 00 48 83 e7 f8 48 c7 43 38 00 00 00 00 48 29 f9
[ 2229.391164] RSP: 0018:ffffa864c14fbe30 EFLAGS: 00010246
[ 2229.395215] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 2229.399478] RDX: 0000000000000040 RSI: 000000024d89bc00 RDI: 0000000000000008
[ 2229.403785] RBP: ffff9267c76b82f8 R08: 0000000000000000 R09: 0071772d656d766e
[ 2229.408223] R10: 8080808080808080 R11: 0000000000000000 R12: ffff9267bf6ba800
[ 2229.412758] R13: ffff9267ae980000 R14: 0ffff9267d6ba8a0 R15: ffff9267c76b8c20
[ 2229.417401] FS: 0000000000000000(0000) GS:ffff9267ffd00000(0000) knlGS:0000000000000000
[ 2229.422360] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2229.427046] CR2: 0000000000000000 CR3: 0000000237188006 CR4: 00000000001606e0
[ 2229.431925] Call Trace:
[ 2229.436834] ? __switch_to_asm+0x34/0x70
[ 2229.441867] nvme_async_event_work+0x5d/0xc0 [nvme_core]
[ 2229.447057] process_one_work+0x1a7/0x370
[ 2229.452314] worker_thread+0x30/0x380
[ 2229.457634] ? max_active_store+0x80/0x80
[ 2229.463033] kthread+0x112/0x130
[ 2229.468482] ? __kthread_parkme+0x70/0x70
[ 2229.474031] ret_from_fork+0x35/0x40
nvme_enable_aen should not queue async_event_work when the Set Features
command for the async event configuration times out. This builds on the
patch that sets the NVME_REQ_CANCELLED flag for NVME_SC_HOST_ABORTED_CMD
and NVME_SC_HOST_PATH_ERROR: check the return value of nvme_set_features,
and if it is less than 0, do not queue async_event_work and return the
error.
Signed-off-by: Chao Leng <lengchao at huawei.com>
---
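For readers following the return-value handling described above, here is a
minimal userspace sketch of the intended control flow. It is not the kernel
code: struct mock_ctrl, mock_set_features and mock_enable_aen are hypothetical
stand-ins, and it assumes that a cancelled or timed-out command surfaces as a
negative errno (for example -EINTR) while a positive value is an NVMe status
returned by the controller.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct nvme_ctrl; only what the sketch needs. */
struct mock_ctrl {
	int set_features_status;	/* value the mocked Set Features returns */
	bool aen_work_queued;		/* stands in for queue_work(nvme_wq, ...) */
};

/*
 * Mock of nvme_set_features(). Assumed convention:
 *   < 0  host-side failure (errno), e.g. the command was cancelled
 *   == 0 success
 *   > 0  NVMe status code reported by the controller
 */
static int mock_set_features(struct mock_ctrl *ctrl)
{
	return ctrl->set_features_status;
}

/* Mirrors the patched flow: only a negative status skips queueing the work. */
static int mock_enable_aen(struct mock_ctrl *ctrl)
{
	int status = mock_set_features(ctrl);

	if (status) {
		fprintf(stderr, "Failed to configure AEN (status %d)\n", status);
		if (status < 0)
			return status;	/* resources may be gone: do not queue */
	}

	ctrl->aen_work_queued = true;
	return 0;
}
```

In this sketch a positive NVMe status is only logged and the work is still
queued, matching the behaviour before the patch, while a negative errno is
passed back so the caller can skip the remaining start-up steps.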
drivers/nvme/host/core.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 74f76aa78b02..f4c347fe925a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1422,21 +1422,25 @@ EXPORT_SYMBOL_GPL(nvme_set_queue_count);
 	(NVME_AEN_CFG_NS_ATTR | NVME_AEN_CFG_FW_ACT | \
 	 NVME_AEN_CFG_ANA_CHANGE | NVME_AEN_CFG_DISC_CHANGE)
 
-static void nvme_enable_aen(struct nvme_ctrl *ctrl)
+static int nvme_enable_aen(struct nvme_ctrl *ctrl)
 {
 	u32 result, supported_aens = ctrl->oaes & NVME_AEN_SUPPORTED;
 	int status;
 
 	if (!supported_aens)
-		return;
+		return 0;
 
 	status = nvme_set_features(ctrl, NVME_FEAT_ASYNC_EVENT, supported_aens,
 			NULL, 0, &result);
-	if (status)
+	if (status) {
 		dev_warn(ctrl->device, "Failed to configure AEN (cfg %x)\n",
 			 supported_aens);
+		if (status < 0)
+			return status;
+	}
 
 	queue_work(nvme_wq, &ctrl->async_event_work);
+	return 0;
 }
 
 /*
@@ -4343,12 +4347,14 @@ void nvme_start_ctrl(struct nvme_ctrl *ctrl)
 {
 	nvme_start_keep_alive(ctrl);
 
-	nvme_enable_aen(ctrl);
+	if (nvme_enable_aen(ctrl))
+		goto out;
 
 	if (ctrl->queue_count > 1) {
 		nvme_queue_scan(ctrl);
 		nvme_start_queues(ctrl);
 	}
+out:
 	ctrl->created = true;
 }
 EXPORT_SYMBOL_GPL(nvme_start_ctrl);
--
2.16.4