[PATCH] nvme-pci: check for valid request when polling for completions

Mon Sep 2 23:25:08 PDT 2024

On 9/2/24 19:04, Sagi Grimberg wrote:
> 
> 
> 
> On 02/09/2024 16:07, Hannes Reinecke wrote:
>> When polling for completions from the timeout handler we traverse
>> over _all_ cqes, and the fetching the request via blk_mq_tag_to_rq().
>> Unfortunately that function will always return a request, even if
>> that request is already completed.
>> So we need to check if the command is still in flight before
>> attempting to complete it.
> 
> Hannes, are you able to trigger a use-after-free here? That is the
> inevitable outcome of completing an already completed request.
> So, why are you not calling it as it is?
> 
Okay, will do.

> Was this seen in practice? do you have a stack trace?
> I ask for this because it makes the review a lot clearer.

Here it is:

crash> bt
PID: 183      TASK: ffff8d3d06150000  CPU: 3    COMMAND: "kworker/3:1H"
  #0 [ffff9e83c0427930] machine_kexec at ffffffff9de840f3
  #1 [ffff9e83c0427988] __crash_kexec at ffffffff9df8736a
  #2 [ffff9e83c0427a48] crash_kexec at ffffffff9df88264
  #3 [ffff9e83c0427a58] oops_end at ffffffff9de42068
  #4 [ffff9e83c0427a78] do_trap at ffffffff9de3e53a
  #5 [ffff9e83c0427ac0] do_error_trap at ffffffff9de3e744
  #6 [ffff9e83c0427b00] exc_invalid_op at ffffffff9e869bb3
  #7 [ffff9e83c0427b20] asm_exc_invalid_op at ffffffff9ea00e4d
     [exception RIP: __slab_free+698]
     RIP: ffffffff9e124e2a  RSP: ffff9e83c0427bd0  RFLAGS: 00010246
     RAX: ffff8d3f51141800  RBX: 0000000000080006  RCX: ffff8d3f51141000
     RDX: ffff8d3f51141000  RSI: fffffb0f4d445000  RDI: ffff8d3d00042d00
     RBP: ffff9e83c0427c80   R8: 0000000000000001   R9: ffffffffc04ff496
     R10: 0000000000000000  R11: ffff8d3f51141000  R12: fffffb0f4d445000
     R13: ffff8d3f51141000  R14: ffff8d3d00042d00  R15: ffff8d3f51141000
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  #8 [ffff9e83c0427bf8] dev_printk_emit at ffffffff9e867920
  #9 [ffff9e83c0427c88] nvme_pci_complete_rq at ffffffffc04ff496 [nvme]
#10 [ffff9e83c0427cb0] nvme_poll_irqdisable at ffffffffc0501496 [nvme]
#11 [ffff9e83c0427ce8] nvme_timeout at ffffffffc0503306 [nvme]
#12 [ffff9e83c0427d68] blk_mq_check_expired at ffffffff9e2faa8a
#13 [ffff9e83c0427d78] bt_iter at ffffffff9e301349
#14 [ffff9e83c0427db0] blk_mq_queue_tag_busy_iter at ffffffff9e301f35
#15 [ffff9e83c0427e60] blk_mq_timeout_work at ffffffff9e2fbe6e
#16 [ffff9e83c0427e98] process_one_work at ffffffff9ded1f64
#17 [ffff9e83c0427ed8] worker_thread at ffffffff9ded216d
#18 [ffff9e83c0427f10] kthread at ffffffff9ded9b54

The panic was triggered at:

# mm/slub.c
  374 static inline void set_freepointer(struct kmem_cache *s, void 
*object, void *fp)
  375 { 

  376         unsigned long freeptr_addr = (unsigned long)object + 
s->offset;
  377 

  378 #ifdef CONFIG_SLAB_FREELIST_HARDENED 

  379         BUG_ON(object == fp); /* naive detection of double free or 
corruption */
  380 #endif

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare at suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich