[PATCH] nvme-pci: enhance timeout kernel log
Jens Axboe
axboe at kernel.dk
Thu Dec 7 12:38:04 PST 2023
On 12/7/23 11:32 AM, Keith Busch wrote:
> From: Keith Busch <kbusch at kernel.org>
>
> Kernel configs don't necessarily have opcode decoding, and some opcodes
> are not even decodable. It is still interesting for debugging SSD issues
> to know what opcode is timing out, what request type it came from, and
> the data size (if applicable).
>
> Signed-off-by: Keith Busch <kbusch at kernel.org>
> ---
> drivers/nvme/host/pci.c | 9 +++++----
> 1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index fad4cccce745c..49771919b01f1 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -1284,6 +1284,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req)
> struct request *abort_req;
> struct nvme_command cmd = { };
> u32 csts = readl(dev->bar + NVME_REG_CSTS);
> + u8 opcode;
>
> /* If PCI error recovery process is happening, we cannot reset or
> * the recovery mechanism will surely fail.
> @@ -1361,11 +1362,11 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req)
> cmd.abort.cid = nvme_cid(req);
> cmd.abort.sqid = cpu_to_le16(nvmeq->qid);
>
> + opcode = nvme_req(req)->cmd->common.opcode,
> dev_warn(nvmeq->dev->ctrl.device,
> - "I/O %d (%s) QID %d timeout, aborting\n",
> - req->tag,
> - nvme_get_opcode_str(nvme_req(req)->cmd->common.opcode),
> - nvmeq->qid);
> + "I/O %d (%s %d) QID %d timeout, aborting req_op:%u size:%u\n",
> + req->tag, nvme_get_opcode_str(opcode), opcode, nvmeq->qid,
> + req_op(req), blk_rq_bytes(req));
Your additions look good to me, but I do wish that we'd be equally
verbose on what the other values are. Ala:
I/O tag %d (opcode: %s %d) ...
would be a lot more useful, for example. I find myself having to dig
into the source for a specific kernel sometimes when looking at various
nvme errors, which is really annoying.
--
Jens Axboe
More information about the Linux-nvme
mailing list