[PATCH 0/2] Handle update hardware queues and queue freeze more carefully

Fri Jun 25 06:00:25 PDT 2021

On Fri, Jun 25, 2021 at 02:21:56PM +0200, Daniel Wagner wrote:
> On Fri, Jun 25, 2021 at 12:16:47PM +0200, Daniel Wagner wrote:
> > this is a followup on the crash I reported in
> > 
> >   https://lore.kernel.org/linux-block/20210608183339.70609-1-dwagner@suse.de/
> > 
> > By moving the hardware check up the crash was gone. Unfortunately, I
> > don't understand why this fixes the crash. The per-cpu access is
> > crashing but I can't see why the blk_mq_update_nr_hw_queues() is
> > fixing this problem.
> > 
> > Even though I can't explain why it fixes it, I think it makes sense to
> > update the hardware queue mapping before we recreate the IO
> > queues. Thus I avoided saying in the commit message that it fixes
> > something.
> 
> I just discussed this with Hannes and we figured out how the crash is
> fixed by moving the blk_mq_update_nr_hw_queues() before the
> nvme_fc_create_hw_io_queues()/nvme_fc_connect_io_queues().
> 
> First of all, blk_mq_update_nr_hw_queues() operates on the normal
> tag_set and not the admin_tag_set. That means when we move the
> blk_mq_update_nr_hw_queues() before the nvme_fc_connect_io_queues(), we
> update the mapping to only CPUs and hwctx which are available. When we
> then do the connect call nvmf_connect_io_queue() we will only allocate
> tags from queues which are no longer in the BLK_MQ_S_INACTIVE state.
> Hence we skip the blk_mq_put_tag() call.

Your patch just reduces the race window. What if all CPUs in
hctx->cpumask go offline while blk_mq_alloc_request_hctx() is being
called?

Thanks,
Ming
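
The race raised in the question above can be pictured with a small
userspace model. This is only a sketch, not kernel code: struct
model_hctx and model_first_online_cpu() are hypothetical names made up
for illustration, and plain bitmasks stand in for the kernel's cpumask
handling. It shows that a hardware context which had online CPUs when
the mapping was updated can have none left by the time a request is
allocated on it.

```c
#include <stddef.h>

/* Hypothetical userspace model of a hardware context; not a kernel type. */
struct model_hctx {
	unsigned long cpumask;	/* bit N set: CPU N is mapped to this hctx */
};

/*
 * Return the lowest-numbered CPU that is both mapped to the hctx and
 * online, or -1 if every CPU in the hctx mask is offline.
 */
static int model_first_online_cpu(const struct model_hctx *hctx,
				  unsigned long online_mask)
{
	unsigned long usable = hctx->cpumask & online_mask;
	size_t cpu;

	for (cpu = 0; cpu < 8 * sizeof(usable); cpu++)
		if (usable & (1UL << cpu))
			return (int)cpu;

	return -1;
}
```

With an hctx mapped to CPUs 2 and 3 (mask 0xC), the helper returns 2
while all four CPUs are online (online mask 0xF). If CPUs 2 and 3 then
go offline (online mask 0x3) before the allocation runs, it returns -1.
Updating the mapping earlier narrows the window between those two
points but does not remove it.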