[PATCH v1] ufs: core: fix ufshcd_abort_all racing issue

Peter Wang (王信友) peter.wang at mediatek.com
Mon Jun 24 02:13:42 PDT 2024


On Fri, 2024-06-21 at 10:15 -0700, Bart Van Assche wrote:
>  	 
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
>  On 6/21/24 4:30 AM, peter.wang at mediatek.com wrote:
>  > [ ... ]
> 
> Fixes: and Cc: stable tags are missing from this patch. Please add
> these.
> 

Hi Bart,

Okay, will add next version. Thanks.


> > diff --git a/drivers/ufs/core/ufs-mcq.c b/drivers/ufs/core/ufs-
> mcq.c
> > index 8944548c30fa..3b2e5bcb08a7 100644
> > --- a/drivers/ufs/core/ufs-mcq.c
> > +++ b/drivers/ufs/core/ufs-mcq.c
> > @@ -512,8 +512,9 @@ int ufshcd_mcq_sq_cleanup(struct ufs_hba *hba,
> int task_tag)
> >   return -ETIMEDOUT;
> >   
> >   if (task_tag != hba->nutrs - UFSHCD_NUM_RESERVED) {
> > -if (!cmd)
> > -return -EINVAL;
> > +/* Should return 0 if cmd is already complete by irq */
> > +if (!cmd || !ufshcd_cmd_inflight(cmd))
> > +return 0;
> >   hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(cmd));
> >   } else {
> >   hwq = hba->dev_cmd_queue;
> 
> This code change reduces the race window but does not eliminate it
> since
> no locks are held around this code path.
> 
> Additionally, are you sure that this change is necessary? I don't
> think
> that ufshcd_err_handler() calls ufshcd_mcq_sq_cleanup().
> 
> Thanks,
> 
> Bart.

Hi Bart,

Yes, this code reduces the race window only.
Beacuse if we want get mcq lock, we need know witch hwq[i] lock 
can use.
But before we get which hwq[i] lock could use, we will get this null
pointer error.
This is quite the chicken or the eqq dilemma.
Could you have any suggestion?

Additionally, here is the backtrace of ufshcd_mcq_sq_cleanup.

  ufshcd_try_to_abort_task: cmd pending in the device. tag = 6
  Unable to handle kernel NULL pointer dereference at virtual address
0000000000000194
   pc : [0xffffffd589679bf8] blk_mq_unique_tag+0x8/0x14
   lr : [0xffffffd5862f95b4] ufshcd_mcq_sq_cleanup+0x6c/0x1cc
[ufs_mediatek_mod_ise]
   Workqueue: ufs_eh_wq_0 ufshcd_err_handler [ufs_mediatek_mod_ise]
   Call trace:
    dump_backtrace+0xf8/0x148
    show_stack+0x18/0x24
    dump_stack_lvl+0x60/0x7c
    dump_stack+0x18/0x3c
    mrdump_common_die+0x24c/0x398 [mrdump]
    ipanic_die+0x20/0x34 [mrdump]
    notify_die+0x80/0xd8
    die+0x94/0x2b8
    __do_kernel_fault+0x264/0x298
    do_page_fault+0xa4/0x4b8
    do_translation_fault+0x38/0x54
    do_mem_abort+0x58/0x118
    el1_abort+0x3c/0x5c
    el1h_64_sync_handler+0x54/0x90
    el1h_64_sync+0x68/0x6c
    blk_mq_unique_tag+0x8/0x14
    ufshcd_clear_cmd+0x34/0x118 [ufs_mediatek_mod_ise]
    ufshcd_try_to_abort_task+0x2c8/0x5b4 [ufs_mediatek_mod_ise]
    ufshcd_err_handler+0xa7c/0xfa8 [ufs_mediatek_mod_ise]
    process_one_work+0x208/0x4fc
    worker_thread+0x228/0x438
    kthread+0x104/0x1d4
    ret_from_fork+0x10/0x20

Thanks.
Peter



More information about the Linux-mediatek mailing list