[PATCH v1 01/10] ufs: host: mediatek: Fix runtime suspend error deadlock

Bart Van Assche bvanassche at acm.org
Fri Sep 19 13:57:18 PDT 2025


On 9/19/25 1:11 AM, Peter Wang (王信友) wrote:
> An error occurred during the suspend process, causing IO to hang.
> This is because the error handler (eh) work is waiting for
> resume, while the suspend work is waiting for the error handler
> to finish before sending SSU.

If the suspend callback waits for error handling to finish and the
error handler waits until resuming has finished, isn't this an issue
that can occur for any UFS host controller and hence that should be
fixed in the UFSHCI driver core rather than in one host driver only?

Why is the hba->pm_op_in_progress variable not sufficient to prevent
this deadlock? Should this code perhaps be moved from
ufshcd_eh_host_reset_handler() into ufshcd_err_handler()?

	/*
	 * If runtime PM sent SSU and got a timeout, scsi_error_handler is
	 * stuck in this function waiting for flush_work(&hba->eh_work). And
	 * ufshcd_err_handler(eh_work) is stuck waiting for runtime PM. Do
	 * ufshcd_link_recovery instead of eh_work to prevent deadlock.
	 */
	if (hba->pm_op_in_progress) {
		if (ufshcd_link_recovery(hba))
			err = FAILED;

		return err;
	}
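
To make the question concrete, here is a minimal user-space sketch of the decision that check encodes. It is not kernel code: struct model_hba, enum model_eh_path and model_pick_eh_path() are hypothetical names invented for illustration, and only the meaning of pm_op_in_progress is taken from the snippet above. The assumption is that a caller which would otherwise queue the error handler work and flush it must not do so while a PM operation is in progress, because that work would wait for the same PM operation to finish.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; none of these names exist in the kernel. */
enum model_eh_path {
	MODEL_EH_QUEUE_AND_FLUSH,	/* queue eh_work and wait for it */
	MODEL_EH_DIRECT_RECOVERY,	/* recover the link in the caller's context */
};

struct model_hba {
	/* Same meaning as hba->pm_op_in_progress in the snippet above. */
	bool pm_op_in_progress;
};

/*
 * Pick a recovery path. While a PM operation is in progress, waiting for
 * the error handler work would form a cycle (the work waits for PM, PM
 * waits for the work), so recover directly instead of flushing.
 */
static enum model_eh_path model_pick_eh_path(const struct model_hba *hba)
{
	assert(hba != NULL);

	if (hba->pm_op_in_progress)
		return MODEL_EH_DIRECT_RECOVERY;

	return MODEL_EH_QUEUE_AND_FLUSH;
}
```

With pm_op_in_progress set, the model picks direct recovery; otherwise it picks the queue-and-flush path. Whether the same decision is safe at the point where ufshcd_err_handler() runs is exactly what the question above asks.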

>> How can ufs_mtk_suspend() be called while the error handler is in
>> progress? ufshcd_err_handler() disables RPM before it sets the
>> UFSHCD_EH_IN_PROGRESS flag.
> 
> This error is triggered by ufs_mtk_auto_hibern8_disable(). As its
> comment says:
> /* May trigger EH work without exiting hibern8 error */
> so it could happen during the suspend period.

That source code comment confuses me, especially the "without
exiting hibern8 error" part. Do you mean that the device is in a
hibern8 error state and remains in that state?

>> The UFSHCD_EH_IN_PROGRESS definition and also the
>> ufshcd_set_eh_in_progress() and ufshcd_clear_eh_in_progress()
>> definitions must remain in the UFS core private code. Please do not
>> move these definitions into the include/ufs/ufshcd.h header file.
> 
> Do you think we should check ufshcd_eh_in_progress in
> __ufshcd_wl_suspend? I'm not sure, because we don't see this
> error on all UFS hosts; the vendor suspend operations
> (ufshcd_vops_suspend) could be different.

Why is auto-hibernation disabled during suspend? As far as I know, the
UFSHCI standard allows auto-hibernation to remain enabled during suspend.

Thanks,

Bart.


