[PATCH v4 2/2] ufs: core: requeue aborted request

Bart Van Assche bvanassche at acm.org
Thu Sep 19 11:49:35 PDT 2024


On 9/19/24 5:16 AM, Peter Wang (王信友) wrote:
> The four case flows for abort are as follows:
> ----------------------------------------------------------------
> 
> Case1: DBR ufshcd_abort

Please follow the terminology from the UFSHCI 4.0 standard and use the
word "legacy" instead of "DBR".

> In this case, you can see that ufshcd_release_scsi_cmd will
> definitely be called.
> 
> ufshcd_abort()
>    ufshcd_try_to_abort_task()		// It should trigger an
> interrupt, but the tensor might not
>    get outstanding_lock
>    clear outstanding_reqs tag
>    ufshcd_release_scsi_cmd()
>    release outstanding_lock
> 
> ufshcd_intr()
>    ufshcd_sl_intr()
>      ufshcd_transfer_req_compl()
>        ufshcd_poll()
>          get outstanding_lock
>          clear outstanding_reqs tag
>          release outstanding_lock			
>          __ufshcd_transfer_req_compl()
>            ufshcd_compl_one_cqe()
>            cmd->result = DID_REQUEUE	// mediatek may need quirk
> change DID_ABORT to DID_REQUEUE
>            ufshcd_release_scsi_cmd()
>            scsi_done();
> 
> In most cases, ufshcd_intr will not reach scsi_done because the
> outstanding_reqs tag is cleared by the original thread.
> Therefore, whether there is an interrupt or not doesn't affect
> the result because the ISR will do nothing in most cases.
> 
> In a very low chance, the ISR will reach scsi_done and notify
> SCSI to requeue, and the original thread will not
> call ufshcd_release_scsi_cmd.
> MediaTek may need to change DID_ABORT to DID_REQUEUE in this
> situation, or perhaps not handle this ISR at all.

Please modify ufshcd_compl_one_cqe() such that it ignores commands
with status OCS_ABORTED. This will make the UFSHCI driver behave in
the same way for all UFSHCI controllers, whether or not clearing a
command triggers a completion interrupt.

> ----------------------------------------------------------------
> 
> Case2: MCQ ufshcd_abort
> 
> In the case of MCQ ufshcd_abort, you can also see that
> ufshcd_release_scsi_cmd will definitely be called too.
> However, there seems to be a problem here, as
> ufshcd_release_scsi_cmd might be called twice.
> This is because cmd is not null in ufshcd_release_scsi_cmd,
> which the previous version would set cmd to null.
> Skipping OCS: ABORTED in ufshcd_compl_one_cqe indeed
> can avoid this problem. This part needs further
> consideration on how to handle it.
> 
> ufshcd_abort()
>    ufshcd_mcq_abort()
>      ufshcd_try_to_abort_task()	// will trigger ISR
>      ufshcd_release_scsi_cmd()
> 
> ufs_mtk_mcq_intr()
>    ufshcd_mcq_poll_cqe_lock()
>      ufshcd_mcq_process_cqe()
>        ufshcd_compl_one_cqe()
>          cmd->result = DID_ABORT
>          ufshcd_release_scsi_cmd() // will release twice
>          scsi_done()

Do you agree that this case can be addressed with the
ufshcd_compl_one_cqe() change proposed above?

> ----------------------------------------------------------------
> 
> Case3: DBR ufshcd_err_handler
> 
> In the case of the DBR mode error handler, it's the same;
> ufshcd_release_scsi_cmd will also be executed, and scsi_done
> will definitely be used to notify SCSI to requeue.
> 
> ufshcd_err_handler()
>    ufshcd_abort_all()
>      ufshcd_abort_one()
>        ufshcd_try_to_abort_task()	// It should trigger an
> interrupt, but the tensor might not
>      ufshcd_complete_requests()
>        ufshcd_transfer_req_compl()
>          ufshcd_poll()
>            get outstanding_lock
>            clear outstanding_reqs tag
>            release outstanding_lock	
>            __ufshcd_transfer_req_compl()
>              ufshcd_compl_one_cqe()
>                cmd->result = DID_REQUEUE // mediatek may need quirk
> change DID_ABORT to DID_REQUEUE
>                ufshcd_release_scsi_cmd()
>                scsi_done()
> 
> ufshcd_intr()
>    ufshcd_sl_intr()
>      ufshcd_transfer_req_compl()
>        ufshcd_poll()
>          get outstanding_lock
>          clear outstanding_reqs tag
>          release outstanding_lock			
>          __ufshcd_transfer_req_compl()
>            ufshcd_compl_one_cqe()
>            cmd->result = DID_REQUEUE // mediatek may need quirk change
> DID_ABORT to DID_REQUEUE
>            ufshcd_release_scsi_cmd()
>            scsi_done();
> 
> At this time, the same actions are taken regardless of whether
> there is an ISR, and with the protection of outstanding_lock,
> only one thread will execute ufshcd_release_scsi_cmd and scsi_done.
> ----------------------------------------------------------------
> 
> Case4: MCQ ufshcd_err_handler
> 
> It's the same with MCQ mode; there is protection from the cqe lock,
> so only one thread will execute. What my patch 2 aims to do is to
> change DID_ABORT to DID_REQUEUE in this situation.
> 
> ufshcd_err_handler()
>    ufshcd_abort_all()
>      ufshcd_abort_one()
>        ufshcd_try_to_abort_task()	// will trigger irq thread
>      ufshcd_complete_requests()
>        ufshcd_mcq_compl_pending_transfer()
>          ufshcd_mcq_poll_cqe_lock()
>            ufshcd_mcq_process_cqe()
>              ufshcd_compl_one_cqe()
>                cmd->result = DID_ABORT // should change to DID_REQUEUE
>                ufshcd_release_scsi_cmd()
>                scsi_done()
> 
> ufs_mtk_mcq_intr()
>    ufshcd_mcq_poll_cqe_lock()
>      ufshcd_mcq_process_cqe()
>        ufshcd_compl_one_cqe()
>          cmd->result = DID_ABORT  // should change to DID_REQUEUE
>          ufshcd_release_scsi_cmd()
>          scsi_done()

For legacy and MCQ mode, I prefer the following behavior for
ufshcd_abort_all():
* ufshcd_compl_one_cqe() ignores commands with status OCS_ABORTED.
* ufshcd_release_scsi_cmd() is called either by ufshcd_abort_one() or
   by ufshcd_abort_all().

Do you agree with making the changes proposed above?

Thank you,

Bart.



More information about the Linux-mediatek mailing list