[PATCH v3] ufs: core: wlun resume SSU(Acitve) fail recovery

Bart Van Assche bvanassche at acm.org
Mon Jan 2 16:29:26 PST 2023


On 12/27/22 22:01, peter.wang at mediatek.com wrote:
> When wlun resume SSU(Active) timeout, scsi try eh_host_reset_handler.
                    ^^^^^^^^^^^
Please use the same spelling in the patch subject (Acitve -> Active).

timeout -> times out
scsi try -> the SCSI core invokes

> But ufshcd_eh_host_reset_handler hang at wait flush_work(&hba->eh_work).

hang at -> hangs in

> And ufshcd_err_handler hang at wait rpm resume.

hang at wait rpm resume -> hangs in rpm_resume().

 > <ffffffdd78e02b34> schedule+0x110/0x204
 > <ffffffdd78e0be60> schedule_timeout+0x98/0x138
 > <ffffffdd78e040e8> wait_for_common_io+0x130/0x2d0
 > <ffffffdd77d6a000> blk_execute_rq+0x10c/0x16c
 > <ffffffdd78126d90> __scsi_execute+0xfc/0x278
 > <ffffffdd7813891c> ufshcd_set_dev_pwr_mode+0x1c8/0x40c
 > <ffffffdd78137d1c> __ufshcd_wl_resume+0xf0/0x5cc
 > <ffffffdd78137ae0> ufshcd_wl_runtime_resume+0x40/0x18c
 > <ffffffdd78136108> scsi_runtime_resume+0x88/0x104
 > <ffffffdd7809a4f8> __rpm_callback+0x1a0/0xaec
 > <ffffffdd7809b624> rpm_resume+0x7e0/0xcd0
 > <ffffffdd7809a788> __rpm_callback+0x430/0xaec
 > <ffffffdd7809b644> rpm_resume+0x800/0xcd0
 > <ffffffdd780a0778> pm_runtime_work+0x148/0x198
 >
 > <ffffffdd78e02b34> schedule+0x110/0x204
 > <ffffffdd78e0be10> schedule_timeout+0x48/0x138
 > <ffffffdd78e03d9c> wait_for_common+0x144/0x2dc
 > <ffffffdd7758bba4> __flush_work+0x3d0/0x508
 > <ffffffdd7815572c> ufshcd_eh_host_reset_handler+0x134/0x3a8
 > <ffffffdd781216f4> scsi_try_host_reset+0x54/0x204
 > <ffffffdd78120594> scsi_eh_ready_devs+0xb30/0xd48
 > <ffffffdd7812373c> scsi_error_handler+0x260/0x874
 >
 > <ffffffdd78e02b34> schedule+0x110/0x204
 > <ffffffdd7809af64> rpm_resume+0x120/0xcd0
 > <ffffffdd7809fde8> __pm_runtime_resume+0xa0/0x17c
 > <ffffffdd7815193c> ufshcd_err_handling_prepare+0x40/0x430
 > <ffffffdd7814cce8> ufshcd_err_handler+0x1c4/0xd4c

On top of which kernel version has this patch been developed?
I think this deadlock has already been fixed by commit 7029e2151a7c 
("scsi: ufs: Fix a deadlock between PM and the SCSI error handler").
Please check whether that commit by itself (without this patch) is 
sufficient to fix the reported deadlock.

> ---
>   drivers/ufs/core/ufshcd.c | 17 +++++++++++++++++
>   1 file changed, 17 insertions(+)

The changelog is missing. Please include a changelog when posting v2 or 
a later version of a patch.

> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index e18c9f4463ec..0dfb9a35bf66 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -7366,6 +7366,23 @@ static int ufshcd_eh_host_reset_handler(struct scsi_cmnd *cmd)
>   
>   	hba = shost_priv(cmd->device->host);
>   
> +	/*
> +	 * If pm op resume fail and wait err recovery, do link recovery only.
> +	 * Because schedule eh work will get dead lock in ufshcd_rpm_get_sync
> +	 * and wait wlun resume, but wlun resume error wait eh work finish.
> +	 */

The above comment has grammar issues and some parts are 
incomprehensible. What does e.g. "wait err recovery" mean? Please 
improve this source code comment.

Thanks,

Bart.



More information about the Linux-mediatek mailing list