[PATCH] nvme: introduce panic_on_double_cqe param
Guixin Liu
kanie at linux.alibaba.com
Tue Oct 28 18:42:11 PDT 2025
On 2025/10/23 13:14, Chaitanya Kulkarni wrote:
> On 10/22/25 6:54 AM, Guixin Liu wrote:
>> Add a new debug switch to control whether to trigger a kernel crash
>> when duplicate CQEs are detected, in order to preserve the kernel
>> context, such as sq, cq, and so on, for subsequent debugging and
>> analysis.
>>
>> Signed-off-by: Guixin Liu <kanie at linux.alibaba.com>
>> ---
>> drivers/nvme/host/core.c | 5 +++++
>> drivers/nvme/host/nvme.h | 3 +++
>> 2 files changed, 8 insertions(+)
>>
>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>> index fa4181d7de73..7a3f9129a39c 100644
>> --- a/drivers/nvme/host/core.c
>> +++ b/drivers/nvme/host/core.c
>> @@ -95,6 +95,11 @@ module_param(apst_secondary_latency_tol_us, ulong, 0644);
>> MODULE_PARM_DESC(apst_secondary_latency_tol_us,
>> "secondary APST latency tolerance in us");
>>
>> +bool panic_on_double_cqe;
>> +EXPORT_SYMBOL_GPL(panic_on_double_cqe);
>> +module_param(panic_on_double_cqe, bool, 0644);
>> +MODULE_PARM_DESC(panic_on_double_cqe, "crash the kernel to save the scene");
>> +
>> /*
>> * Older kernels didn't enable protection information if it was at an offset.
>> * Newer kernels do, so it breaks reads on the upgrade if such formats were
>> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
>> index 102fae6a231c..24010d5d15ce 100644
>> --- a/drivers/nvme/host/nvme.h
>> +++ b/drivers/nvme/host/nvme.h
>> @@ -595,6 +595,8 @@ static inline u16 nvme_cid(struct request *rq)
>> return nvme_cid_install_genctr(nvme_req(rq)->genctr) | rq->tag;
>> }
>>
>> +extern bool panic_on_double_cqe;
>> +
>> static inline struct request *nvme_find_rq(struct blk_mq_tags *tags,
>> u16 command_id)
>> {
>> @@ -612,6 +614,7 @@ static inline struct request *nvme_find_rq(struct blk_mq_tags *tags,
>> dev_err(nvme_req(rq)->ctrl->device,
>> "request %#x genctr mismatch (got %#x expected %#x)\n",
>> tag, genctr, nvme_genctr_mask(nvme_req(rq)->genctr));
>> + BUG_ON(panic_on_double_cqe);
>> return NULL;
>> }
>> return rq;
>
> I'm really not sure this is a good idea; I'll leave it to others.
>
>
> -ck
Yeah, I think so too, and I'd also like to find a more elegant solution.
Best Regards,
Guixin Liu
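
For readers following the thread, the check that the quoted patch hooks into can be modeled in user space. The sketch below is an illustration only, not driver code: every `model_*` name is hypothetical, `abort()` stands in for `BUG_ON()`, and it assumes the 16-bit command ID layout implied by `nvme_cid()` in the diff (a 4-bit generation counter in the top bits, a 12-bit tag below).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical user-space model; these names are not the driver's. */
struct model_req {
	uint8_t genctr;	/* bumped each time the tag is reused */
};

/* Stands in for the proposed panic_on_double_cqe module parameter. */
static bool model_panic_on_double_cqe;

/* Build a command ID: generation counter in bits 15..12, tag in bits 11..0. */
static uint16_t model_cid(const struct model_req *rq, uint16_t tag)
{
	return (uint16_t)(((rq->genctr & 0xf) << 12) | (tag & 0xfff));
}

/*
 * Look up the request for a completion. Returns NULL when the tag is out
 * of range or when the generation counter in the command ID no longer
 * matches the request, which is how a duplicate or stale CQE shows up.
 */
static struct model_req *model_find_rq(struct model_req *reqs, size_t nr,
				       uint16_t cid)
{
	uint16_t tag = cid & 0xfff;
	uint8_t genctr = (cid >> 12) & 0xf;

	assert(reqs != NULL);

	if (tag >= nr)
		return NULL;

	if ((reqs[tag].genctr & 0xf) != genctr) {
		fprintf(stderr,
			"request %#x genctr mismatch (got %#x expected %#x)\n",
			(unsigned)tag, (unsigned)genctr,
			(unsigned)(reqs[tag].genctr & 0xf));
		/* With the switch set, stop here so the state can be inspected. */
		if (model_panic_on_double_cqe)
			abort();
		return NULL;
	}
	return &reqs[tag];
}
```

In this model, a completion is accepted once, and after the request's counter is bumped for reuse, replaying the same command ID is rejected. With the switch off the lookup only logs and returns NULL; with it on, the process stops at the point of detection.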