[PATCH v4] ufs: core: wlun send SSU timeout recovery
Bart Van Assche
bvanassche at acm.org
Fri Sep 22 09:06:05 PDT 2023
On 9/22/23 02:09, peter.wang at mediatek.com wrote:
> When a runtime PM SSU (START STOP UNIT) command times out, the SCSI
> core invokes eh_host_reset_handler, which is hooked up to
> ufshcd_eh_host_reset_handler. That function schedules eh_work and then
> blocks in flush_work(&hba->eh_work). However, ufshcd_err_handler hangs
> waiting for runtime PM resume, so the two wait on each other forever.
> Do link recovery only in this case.
> Below is the I/O hang stack dump from kernel-6.1.
What does kernel-6.1 mean? Has commit 7029e2151a7c ("scsi: ufs: Fix a
deadlock between PM and the SCSI error handler") been backported to
that kernel?
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index c2df07545f96..7608d75bb4fe 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -7713,9 +7713,29 @@ static int ufshcd_eh_host_reset_handler(struct scsi_cmnd *cmd)
> int err = SUCCESS;
> unsigned long flags;
> struct ufs_hba *hba;
> + struct device *dev;
>
> hba = shost_priv(cmd->device->host);
>
> + /*
> +  * If a runtime PM SSU command times out, scsi_error_handler gets
> +  * stuck in this function waiting for flush_work(&hba->eh_work),
> +  * while ufshcd_err_handler (eh_work) is stuck waiting for runtime
> +  * PM to become active. Doing ufshcd_link_recovery here instead of
> +  * scheduling eh_work prevents this deadlock.
> +  */
> + dev = &hba->ufs_device_wlun->sdev_gendev;
> + if ((dev->power.runtime_status == RPM_RESUMING) ||
> + (dev->power.runtime_status == RPM_SUSPENDING)) {
> + err = ufshcd_link_recovery(hba);
> + if (err) {
> + dev_err(hba->dev, "WL Device PM: status:%d, err:%d\n",
> + dev->power.runtime_status,
> + dev->power.runtime_error);
> + }
> + return err;
> + }
> +
> spin_lock_irqsave(hba->host->host_lock, flags);
> hba->force_reset = true;
> ufshcd_schedule_eh_work(hba);
I think this change is racy because the runtime power management status
may change after the above checks have been performed and before
ufshcd_err_handling_prepare() is called. If commit 7029e2151a7c is
included in your kernel, does applying the untested patch below help?
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index d45a7dd80ab8..656dabea678e 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -123,8 +123,7 @@ static void scsi_mq_requeue_cmd(struct scsi_cmnd *cmd, unsigned long msecs)
}
blk_mq_requeue_request(rq, false);
- if (!scsi_host_in_recovery(cmd->device->host))
- blk_mq_delay_kick_requeue_list(rq->q, msecs);
+ blk_mq_delay_kick_requeue_list(rq->q, msecs);
}
/**
@@ -163,8 +162,7 @@ static void __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, bool unbusy)
*/
cmd->result = 0;
- blk_mq_requeue_request(scsi_cmd_to_rq(cmd),
- !scsi_host_in_recovery(cmd->device->host));
+ blk_mq_requeue_request(scsi_cmd_to_rq(cmd), true);
}
/**
@@ -495,9 +493,6 @@ static void scsi_mq_uninit_cmd(struct scsi_cmnd *cmd)
static void scsi_run_queue_async(struct scsi_device *sdev)
{
- if (scsi_host_in_recovery(sdev->host))
- return;
-
if (scsi_target(sdev)->single_lun ||
!list_empty(&sdev->host->starved_list)) {
kblockd_schedule_work(&sdev->requeue_work);
Thanks,
Bart.
More information about the Linux-mediatek
mailing list