[PATCH] nvme: replace meaningless judgement by checking whether req is null
Tianxianting
tian.xianting at h3c.com
Mon Sep 21 11:49:09 EDT 2020
HI Keith,
Thanks for your comments,
I will submit a new patch of version 2 for the further reviewing, v2 patch will contains:
1, retain existing judgement and dev_warn;
2, add the check whether req is null(already did in this patch)
3, simplify and make the changelog succinct according to you said " This is what I'm thinking:".
Is it right?
Should I retain the nvme_irq crash log in changelog, mention the difference between nvmeq->q_depth and tagset queue_depth?
Thanks
-----Original Message-----
From: Keith Busch [mailto:kbusch at kernel.org]
Sent: Monday, September 21, 2020 11:08 PM
To: tianxianting (RD) <tian.xianting at h3c.com>
Cc: axboe at fb.com; hch at lst.de; sagi at grimberg.me; linux-nvme at lists.infradead.org; linux-kernel at vger.kernel.org
Subject: Re: [PATCH] nvme: replace meaningless judgement by checking whether req is null
On Mon, Sep 21, 2020 at 10:10:52AM +0800, Xianting Tian wrote:
> @@ -940,13 +940,6 @@ static inline void nvme_handle_cqe(struct nvme_queue *nvmeq, u16 idx)
> struct nvme_completion *cqe = &nvmeq->cqes[idx];
> struct request *req;
>
> - if (unlikely(cqe->command_id >= nvmeq->q_depth)) {
> - dev_warn(nvmeq->dev->ctrl.device,
> - "invalid id %d completed on queue %d\n",
> - cqe->command_id, le16_to_cpu(cqe->sq_id));
> - return;
> - }
> -
> /*
> * AEN requests are special as they don't time out and can
> * survive any kind of queue freeze and often don't respond to @@
> -960,6 +953,13 @@ static inline void nvme_handle_cqe(struct nvme_queue *nvmeq, u16 idx)
> }
>
> req = blk_mq_tag_to_rq(nvme_queue_tagset(nvmeq), cqe->command_id);
> + if (unlikely(!req)) {
> + dev_warn(nvmeq->dev->ctrl.device,
> + "req is null for tag %d completed on queue %d\n",
> + cqe->command_id, le16_to_cpu(cqe->sq_id));
> + return;
> + }
This is making sense now, though I think we should retain the existing
dev_warn() since it's still accurate and provides continuity for people who are used to looking for these sorts of messages.
Your changelog is a bit much though. I think we can say it a bit more succinctly. This is what I'm thinking:
The driver registers interrupts for queues before initializing the
tagset because it uses the number of successful request_irq() calls
to configure the tagset parameters. This allows a race condition with
the current tag validity check if the controller happens to produce
an interrupt with a corrupted CQE before the tagset is initialized.
Replace the driver's indirect tag check with the one already provided
by the block layer.
More information about the Linux-nvme
mailing list