[PATCH] nvme-pci: enhance timeout kernel log

Chaitanya Kulkarni chaitanyak at nvidia.com
Thu Dec 7 12:47:16 PST 2023


On 12/7/2023 12:41 PM, Keith Busch wrote:
> On Thu, Dec 07, 2023 at 01:38:04PM -0700, Jens Axboe wrote:
>> On 12/7/23 11:32 AM, Keith Busch wrote:
>>> From: Keith Busch <kbusch at kernel.org>
>>>
>>> Kernel configs don't necessarily have opcode decoding, and some opcodes
>>> are not even decodable. It is still interesting for debugging SSD issues
>>> to know what opcode is timing out, what request type it came from, and
>>> the data size (if applicable).
>>>
>>> Signed-off-by: Keith Busch <kbusch at kernel.org>
>>> ---
>>>   drivers/nvme/host/pci.c | 9 +++++----
>>>   1 file changed, 5 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>>> index fad4cccce745c..49771919b01f1 100644
>>> --- a/drivers/nvme/host/pci.c
>>> +++ b/drivers/nvme/host/pci.c
>>> @@ -1284,6 +1284,7 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req)
>>>   	struct request *abort_req;
>>>   	struct nvme_command cmd = { };
>>>   	u32 csts = readl(dev->bar + NVME_REG_CSTS);
>>> +	u8 opcode;
>>>   
>>>   	/* If PCI error recovery process is happening, we cannot reset or
>>>   	 * the recovery mechanism will surely fail.
>>> @@ -1361,11 +1362,11 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req)
>>>   	cmd.abort.cid = nvme_cid(req);
>>>   	cmd.abort.sqid = cpu_to_le16(nvmeq->qid);
>>>   
>>> +	opcode = nvme_req(req)->cmd->common.opcode,
>>>   	dev_warn(nvmeq->dev->ctrl.device,
>>> -		"I/O %d (%s) QID %d timeout, aborting\n",
>>> -		 req->tag,
>>> -		 nvme_get_opcode_str(nvme_req(req)->cmd->common.opcode),
>>> -		 nvmeq->qid);
>>> +		 "I/O %d (%s %d) QID %d timeout, aborting req_op:%u size:%u\n",
>>> +		 req->tag, nvme_get_opcode_str(opcode), opcode, nvmeq->qid,
>>> +		 req_op(req), blk_rq_bytes(req));
>>
>> Your additions look good to me, but I do wish that we'd be equally
>> verbose on what the other values are. Ala:
>>
>> 	I/O tag %d (opcode: %s %d) ...
>>
>> would be a lot more useful, for example. I find myself having to dig
>> into the source for a specific kernel sometimes when looking at various
>> nvme errors, which is really annoying.
> 
> That is annoying. I'll add your suggestion in.
> 

while doing that is it worth printing the blk_op_str(req_op(req)) along
with the req_op(req) since we are doing verbose anyway ? if it is not
worth disregard my suggestion ...

-ck




More information about the Linux-nvme mailing list