[PATCH v1] ufs: core: bypass get rpm when err handling with pm_op_in_progress

Tue Sep 20 11:25:04 PDT 2022

On 9/19/22 19:00, Peter Wang wrote:
> 
> On 9/20/22 00:25, Bart Van Assche wrote:
>> On 9/19/22 07:47, Peter Wang wrote:
>>> If a SCSI error happens and ufshcd_eh_host_reset_handler() has to run,
>>> the RPM state should be RPM_ACTIVE, because SCSI has to wake up the
>>> suspended LUN and send a command to that LUN before it can get an
>>> error, right?
>>
>> The following sequence may activate the SCSI error handler while the 
>> RPM state is RPM_RESUMING:
>> * The RPM state is RPM_SUSPENDED.
>> * The RPM state is changed into RPM_RESUMING and ufshcd_wl_resume() is 
>> called.
>> * ufshcd_set_dev_pwr_mode() calls scsi_execute() and the START STOP 
>> UNIT command times out.
>> * Because of this timeout the SCSI error handler is activated.
> 
> In this case the error handler will not call ufshcd_rpm_get_sync(),
> because pm_op_in_progress is true.
> 
> So it won't hang in ufshcd_rpm_get_sync().

Right, but I think the following scenario will result in a hang:
* The RPM state is changed from RPM_SUSPENDED into RPM_RESUMING and
   ufshcd_wl_resume() has not yet been called.
* ufshcd_eh_host_reset_handler() queues ufshcd_err_handler() and the
   latter function calls ufshcd_rpm_get_sync().
* This results in a deadlock: the scsi_execute() call by
   ufshcd_wl_resume() cannot make progress because the SCSI host state is
   SHOST_RECOVERY and the error handler cannot make progress because it
   keeps waiting until ufshcd_rpm_get_sync() has finished.
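
To make the circular wait explicit, here is a rough user-space model of the
scenario above. It is only a sketch, not kernel code: the enums reuse the
kernel names for readability but are defined locally, and struct model,
eh_waits_for_rpm, resume_can_progress(), eh_can_progress() and
model_deadlocks() are hypothetical names made up for this illustration.

```c
#include <stdbool.h>

/* Local stand-ins for the kernel states; NOT the kernel definitions. */
enum rpm_state { RPM_ACTIVE, RPM_RESUMING, RPM_SUSPENDED };
enum host_state { SHOST_RUNNING, SHOST_RECOVERY };

/* Hypothetical model of the two parties that wait on each other. */
struct model {
	enum rpm_state rpm;
	enum host_state shost;
	/* Assumption: true if the error handler blocks in an rpm get. */
	bool eh_waits_for_rpm;
};

/* The resume path issues a SCSI command, which is blocked while the
 * host is in recovery. */
static bool resume_can_progress(const struct model *m)
{
	return m->shost != SHOST_RECOVERY;
}

/* The error handler can run if it does not wait for runtime PM, or if
 * the device is already active. */
static bool eh_can_progress(const struct model *m)
{
	return !m->eh_waits_for_rpm || m->rpm == RPM_ACTIVE;
}

/* Let both parties run until neither can advance. Returns true if
 * either of them is still stuck at that point. */
static bool model_deadlocks(struct model m)
{
	for (int i = 0; i < 4; i++) {
		bool progressed = false;

		if (m.shost == SHOST_RECOVERY && eh_can_progress(&m)) {
			m.shost = SHOST_RUNNING;	/* error handling done */
			progressed = true;
		}
		if (m.rpm == RPM_RESUMING && resume_can_progress(&m)) {
			m.rpm = RPM_ACTIVE;		/* resume done */
			progressed = true;
		}
		if (!progressed)
			break;
	}
	return m.shost == SHOST_RECOVERY || m.rpm == RPM_RESUMING;
}
```

With rpm = RPM_RESUMING, shost = SHOST_RECOVERY and eh_waits_for_rpm = true
neither side can advance, which is the hang described above. With
eh_waits_for_rpm = false the model's error handler finishes first and the
resume then completes. The model deliberately ignores everything else the
real error handler and resume path do.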

Thanks,

Bart.