[PATCH RFC v2 1/8] bus: mhi: host: add mhi_power_down_no_destroy()

Baochen Qiang quic_bqiang at quicinc.com
Mon Jan 22 00:09:53 PST 2024



On 1/22/2024 2:24 PM, Manivannan Sadhasivam wrote:
> On Thu, Jan 04, 2024 at 11:39:12AM +0530, Manivannan Sadhasivam wrote:
> 
> + Can, Qiang
> 
> [...]
> 
>>>> To me it all sounds like the probe deferral is not handled properly in mac80211
>>>> stack. As you mentioned in the commit message that the dpm_prepare() blocks
>>>> probing of devices. It gets unblocked and triggered in dpm_complete():
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/base/power/main.c#n1131
>>>>
>>>> So if mac80211/ath11k cannot probe the devices at the dpm_complete() stage, then
>>>> it is definitely an issue that needs to be fixed properly.
>>> To clarify, ath11k CAN probe the devices at dpm_complete() stage. The
>>> problem is kernel does not wait for all probes to finish, and in that way we
>>> will face the issue that user space applications are likely to fail because
>>> they get thawed BEFORE WLAN is ready.
>>>
>>
>> Hmm. Please give me some time to reproduce this issue locally. I will get back
>> to this thread with my analysis.
>>
> 
> We reproduced the issue with the help of PCIe team (thanks Can). What we found
> out was, during the resume from hibernation the failure happens in
> ath11k_core_resume(). Precisely here:
> https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/tree/drivers/net/wireless/ath/ath11k/core.c?h=ath11k-hibernation-support#n850
> 
> This code waits for the QMI messages to arrive and eventually times out. But the
> impression I got from the start was that the mhi_power_up() always fails during
> resume. In our investigation, we confirmed that the failure is not happening at
> the MHI level.

No, mhi_power_up() never fails: it only downloads PBL and SBL and waits
for mission mode. No MHI device is created, hence it is not affected by
the deferred probe. However, in addition to PBL/SBL, ath11k also needs to
download m3.bin, board.bin and regdb.bin. Those files are part of the WLAN
firmware and are downloaded via QMI messages. After mhi_power_up()
succeeds, ath11k_core_resume() waits for QMI to download those files. As
you know, QMI relies on MHI channels, and these channels are managed by
qcom_mhi_qrtr_driver. Since device probing is deferred,
qcom_mhi_qrtr_driver has no chance to run at this stage. As a result,
ath11k_core_resume() times out.
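
To make the dependency concrete, here is a toy model of the sequence
described above. It is plain Python, not kernel code: the System class,
request_probe(), unblock_probing() and the polling loop are hypothetical
names invented for illustration, and the real driver core and QMI wait
are far more involved. It only shows that a wait inside the resume
callback cannot be satisfied by a probe that runs after the callback
returns.

```python
# Toy model only: names mirror the thread, behaviour is heavily simplified.

class System:
    def __init__(self):
        self.probing_blocked = True   # assumed state after dpm_prepare()
        self.deferred_probes = []     # probes queued while probing is blocked
        self.qmi_online = False       # True once the QRTR driver has bound

    def request_probe(self, probe):
        # While probing is blocked, the probe is queued instead of run.
        if self.probing_blocked:
            self.deferred_probes.append(probe)
        else:
            probe()

    def unblock_probing(self):
        # Stands in for the dpm_complete() stage: run everything deferred.
        self.probing_blocked = False
        while self.deferred_probes:
            self.deferred_probes.pop(0)()


def qrtr_probe(system):
    # Stands in for qcom_mhi_qrtr_driver binding to the MHI channels.
    system.qmi_online = True


def core_resume(system, max_polls=5):
    # mhi_power_up() is assumed to have succeeded before this point.
    # The QRTR probe is requested, but it is deferred while blocked.
    system.request_probe(lambda: qrtr_probe(system))
    # Wait for QMI. Nothing can bring it online during this wait.
    for _ in range(max_polls):
        if system.qmi_online:
            return "ok"
    return "timeout"
```

In this model, core_resume() on a freshly created System returns
"timeout", and QMI only comes online once unblock_probing() is called
afterwards.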

> 
> I'm not pointing fingers here, but trying to understand why can't you fix
> ath11k_core_resume() to not timeout? IMO this timeout should be handled as a
> deferral case.
Let's see what happens if we do it in a deferral way:
1. In ath11k_core_resume() we return success directly, without waiting
for QMI to download the other firmware files.
2. Kernel unblocks device probe and schedules a work item to trigger all
deferred probing. As a result MHI devices are probed by
qcom_mhi_qrtr_driver and finally QMI is online.
3. Kernel continues to resume and wakes up userspace applications.
4. ath11k gets the message, either by kernel PM notification or
something else, that QMI is ready and then downloads the other firmware
files.

What happens if userspace applications or the network stack immediately
initiate some WLAN request after resume? Can ath11k handle such a
request? The answer is, most likely, no, because there is no guarantee
that QMI finishes downloading before those requests arrive.
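
The race can be sketched with a toy replay of the post-resume events.
This is illustrative Python, not driver code: run() and the event names
"user_request" and "qmi_download_done" are hypothetical, and the only
assumption is that nothing orders the deferred-probe work item (and the
QMI download that follows it) before userspace is thawed.

```python
def run(events):
    """Replay post-resume events in order; return the user request result."""
    firmware_ready = False
    result = None
    for ev in events:
        if ev == "qmi_download_done":
            # Step 4 above: the remaining firmware files are in place.
            firmware_ready = True
        elif ev == "user_request":
            # A thawed application issues a WLAN request.
            result = "ok" if firmware_ready else "failed: WLAN not ready"
    return result


# Both orderings are possible, since nothing serialises the two events.
race_lost = run(["user_request", "qmi_download_done"])
race_won = run(["qmi_download_done", "user_request"])
```

Here race_lost is the case I am worried about: the request arrives
first and fails, while race_won only works because the download
happened to finish earlier.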

> 
> - Mani
> 


