[PATCH v2] ufs: core: fix ufshcd_abort_all racing issue

Wenchao Hao haowenchao22 at gmail.com
Thu Jun 27 00:59:00 PDT 2024


On 2024/6/26 11:56, Peter Wang (王信友) wrote:
> On Tue, 2024-06-25 at 09:42 -0700, Bart Van Assche wrote:
>>
>>
>> Please include a full root cause analysis when reposting fixes for
>> the
>> reported crashes. It is not clear to me how it is possible that an
>> invalid pointer is passed to blk_mq_unique_tag() (0x194). As I
>> mentioned
>> in my previous email, freeing a request does not modify the request
>> pointer and does not modify the SCSI command pointer either. As one
>> can
>> derive from the blk_mq_alloc_rqs() call stack, memory for struct
>> request
>> and struct scsi_cmnd is allocated at request queue allocation time
>> and
>> is not freed until the request queue is freed. Hence, for a given
>> tag,
>> neither the request pointer nor the SCSI command pointer changes as
>> long
>> as a request queue exists. Hence my request for an explanation how it
>> is
>> possible that an invalid pointer was passed to blk_mq_unique_tag().
>>
>> Thanks,
>>
>> Bart.
>>
> 
> Hi Bart,
> 
> Sorry I have not explain root-cause clearly.
> I will add more clear root-cause analyze next version.
> 
> And it is not an invalid pointer is passed to blk_mq_unique_tag(),
> I means blk_mq_unique_tag function try access null pointer.
> It is differnt and cause misunderstanding.
> 
> The null pinter blk_mq_unique_tag try access is:
> rq->mq_hctx(NULL)->queue_num.
> 

Hi Peter, 

What is queue_num's offset of blk_mq_hw_ctx in your machine?

gdb vmlinux

(gdb) print /x (int)&((struct blk_mq_hw_ctx *)0)->queue_num
$5 = 0x164

I read your descriptions and wondered a same race flow as you described
following. But I found the offset mismatch, if the racing flow is correct,
then the address accessed in blk_mq_unique_tag() should be 0x164, not 0x194.
Maybe the offset is different between our machine?

What's more, if the racing flow is correct, I did not get how your changes
can address this racing flow.

> The racing flow is:
> 
> Thread A
> ufshcd_err_handler					step 1
> 	ufshcd_cmd_inflight(true)			step 3
> 	ufshcd_mcq_req_to_hwq
> 		blk_mq_unique_tag
> 			rq->mq_hctx->queue_num		step 5
> 
> Thread B				
> ufs_mtk_mcq_intr(cq complete ISR)			step 2
> 	scsi_done						
> 		...
> 		__blk_mq_free_request
> 			rq->mq_hctx = NULL;		step 4
> 
> Thanks.
> Peter
> 
> 
> 
> 




More information about the Linux-mediatek mailing list