[PATCH v2 1/1] nvme: Convert tag_list mutex to rwsemaphore to avoid deadlock

Mohamed Khalfella mkhalfella at purestorage.com
Mon Nov 17 18:07:31 PST 2025


On Tue 2025-11-18 09:34:41 +0800, Hillf Danton wrote:
> On Mon, 17 Nov 2025 12:23:53 -0800 Mohamed Khalfella wrote:
> >  static void blk_mq_del_queue_tag_set(struct request_queue *q)
> >  {
> >  	struct blk_mq_tag_set *set = q->tag_set;
> > +	struct request_queue *firstq;
> > +	unsigned int memflags;
> >  
> > -	mutex_lock(&set->tag_list_lock);
> > +	down_write(&set->tag_list_rwsem);
> >  	list_del(&q->tag_set_list);
> > -	if (list_is_singular(&set->tag_list)) {
> > -		/* just transitioned to unshared */
> > -		set->flags &= ~BLK_MQ_F_TAG_QUEUE_SHARED;
> > -		/* update existing queue */
> > -		blk_mq_update_tag_set_shared(set, false);
> > +	if (!list_is_singular(&set->tag_list)) {
> > +		up_write(&set->tag_list_rwsem);
> > +		goto out;
> >  	}
> > -	mutex_unlock(&set->tag_list_lock);
> > +
> > +	/*
> > +	 * Transitioning the remaining firstq to unshared.
> > +	 * Also, downgrade the semaphore to avoid deadlock
> > +	 * with blk_mq_quiesce_tagset() while waiting for
> > +	 * firstq to be frozen.
> > +	 */
> > +	set->flags &= ~BLK_MQ_F_TAG_QUEUE_SHARED;
> > +	downgrade_write(&set->tag_list_rwsem);
> 
> If the first lock waiter is for write, it could ruin your downgrade trick.

How is that possible? If the first (or only) waiter is a writer, it
should not be able to take the semaphore, because the semaphore has not
been fully released yet, right?
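
A minimal sketch of the semantics assumed in the question above, written as
a single-threaded toy model in user-space C. It is not the kernel's rwsem
implementation: every toy_* name is hypothetical, and the model has no wait
queue, so it says nothing about how newly arriving readers are treated while
a writer is already queued. It only illustrates that a downgraded holder
still counts as a reader and keeps writers out until it drops the lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state of a reader/writer semaphore (hypothetical, no wait queue). */
struct toy_rwsem {
	bool writer;		/* held exclusively by one writer */
	unsigned int readers;	/* number of shared holders */
};

/* A writer gets in only if nobody holds the lock at all. */
static bool toy_down_write_trylock(struct toy_rwsem *s)
{
	if (s->writer || s->readers)
		return false;
	s->writer = true;
	return true;
}

/* A reader gets in as long as no writer holds the lock. */
static bool toy_down_read_trylock(struct toy_rwsem *s)
{
	if (s->writer)
		return false;
	s->readers++;
	return true;
}

/* Turn the exclusive hold into a shared one without ever being unlocked. */
static void toy_downgrade_write(struct toy_rwsem *s)
{
	assert(s->writer);
	s->writer = false;
	s->readers = 1;
}

static void toy_up_read(struct toy_rwsem *s)
{
	assert(s->readers > 0);
	s->readers--;
}

/* Returns 1 if writers are refused for as long as the downgraded hold lasts. */
static int toy_downgrade_demo(void)
{
	struct toy_rwsem s = { .writer = false, .readers = 0 };

	if (!toy_down_write_trylock(&s))
		return 0;
	toy_downgrade_write(&s);	/* now one reader, no writer */
	if (toy_down_write_trylock(&s))	/* a writer must still be refused */
		return 0;
	if (!toy_down_read_trylock(&s))	/* a second reader may enter */
		return 0;
	toy_up_read(&s);
	toy_up_read(&s);
	/* Only after the last reader leaves can a writer take the lock. */
	return toy_down_write_trylock(&s) ? 1 : 0;
}
```

Calling toy_downgrade_demo() from a main() exercises the sequence. Whether
the real rwsem lets a new reader in while a writer is waiting is a separate
question that this model deliberately leaves out.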



