[PATCH RFC v2 1/8] bus: mhi: host: add mhi_power_down_no_destroy()
Baochen Qiang
quic_bqiang at quicinc.com
Mon Jan 22 17:44:11 PST 2024
On 1/22/2024 9:09 PM, Manivannan Sadhasivam wrote:
> On Mon, Jan 22, 2024 at 04:09:53PM +0800, Baochen Qiang wrote:
>>
>>
>> On 1/22/2024 2:24 PM, Manivannan Sadhasivam wrote:
>>> On Thu, Jan 04, 2024 at 11:39:12AM +0530, Manivannan Sadhasivam wrote:
>>>
>>> + Can, Qiang
>>>
>>> [...]
>>>
>>>>>> To me it all sounds like the probe deferral is not handled properly in mac80211
>>>>>> stack. As you mentioned in the commit message that the dpm_prepare() blocks
>>>>>> probing of devices. It gets unblocked and triggered in dpm_complete():
>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/base/power/main.c#n1131
>>>>>>
>>>>>> So if mac80211/ath11k cannot probe the devices at the dpm_complete() stage, then
>>>>>> it is definitely an issue that needs to be fixed properly.
>>>>> To clarify, ath11k CAN probe the devices at the dpm_complete() stage. The
>>>>> problem is that the kernel does not wait for all probes to finish, so user
>>>>> space applications are likely to fail because they get thawed BEFORE WLAN
>>>>> is ready.
>>>>>
>>>>
>>>> Hmm. Please give me some time to reproduce this issue locally. I will get back
>>>> to this thread with my analysis.
>>>>
>>>
>>> We reproduced the issue with the help of the PCIe team (thanks Can). What we
>>> found was that, during resume from hibernation, the failure happens in
>>> ath11k_core_resume(). Precisely here:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/tree/drivers/net/wireless/ath/ath11k/core.c?h=ath11k-hibernation-support#n850
>>>
>>> This code waits for the QMI messages to arrive and eventually times out. But the
>>> impression I got from the start was that the mhi_power_up() always fails during
>>> resume. In our investigation, we confirmed that the failure is not happening at
>>> the MHI level.
>> No, mhi_power_up() never fails as it only downloads PBL, SBL and waits
>> for mission mode, no MHI device created hence not affected by the deferred
>> probe. However in addition to PBL/SBL, ath11k also needs to download m3.bin,
>> board.bin and regdb.bin. Those files are part of WLAN firmware and are
>> downloaded via QMI messages. After mhi_power_up() succeeds
>> ath11k_core_resume() waits for QMI downloading those files. As you know QMI
>> relies on MHI channels, these channels are managed by qcom_mhi_qrtr_driver.
>> Since device probing is deferred, qcom_mhi_qrtr_driver has no chance to run
>> at this stage. As a result ath11k_core_resume() times out.
>>
>
> Thanks for the info, this clarifies the issue in detail.
>
>>>
>>> I'm not pointing fingers here, but trying to understand why can't you fix
>>> ath11k_core_resume() to not timeout? IMO this timeout should be handled as a
>>> deferral case.
>> Let's see what happens if we do it in a deferral way:
>> 1. In ath11k_core_resume() we return success directly without waiting for
>> QMI downloading other firmware files.
>> 2. Kernel unblocks device probe and schedules a work item to trigger all
>> deferred probing. As a result MHI devices are probed by qcom_mhi_qrtr_driver
>> and finally QMI is online.
>> 3. Kernel continues to resume and wakes up userspace applications.
>> 4. ath11k gets the message, either by kernel PM notification or something
>> else, that QMI is ready and then downloads other firmware files.
>>
>> What happens if userspace applications or the network stack immediately
>> initiate some WLAN request after resume? Can ath11k handle such a request? The
>> answer is, most likely, no, because there is no guarantee that QMI finishes
>> downloading before those requests.
>>
>
> What will happen to userspace if ath11k returns an error like -EBUSY or
> something? Will the netdev completely go away?
It depends and varies from application to application, so we can't make
any assumption.
Besides, it doesn't make sense to return -EBUSY or something like that
if ath11k has already returned success during resume. A WLAN driver is
supposed to finish everything, or at least get back to the state before
suspend, in the resume callback. If it can't, it should report the error.
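
To make the ordering concern concrete, here is a toy userspace model of
the deferral flow from steps 1 to 4 above. It is only a sketch and not
ath11k code: struct toy_wlan and every toy_* function are hypothetical
names made up for illustration, and -EBUSY is simply the error code
suggested in the question above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in ath11k or MHI. */
struct toy_wlan {
	bool mhi_up;      /* PBL/SBL loaded, mission mode reached */
	bool qrtr_probed; /* stands in for qcom_mhi_qrtr_driver having probed */
	bool fw_ready;    /* m3.bin, board.bin, regdb.bin downloaded via QMI */
};

/* Step 1: resume reports success without waiting for the QMI downloads. */
static int toy_resume_deferred(struct toy_wlan *w)
{
	assert(w != NULL);
	w->mhi_up = true;
	return 0;
}

/* Step 2: deferred probing runs, so the QMI transport comes online. */
static void toy_deferred_probe(struct toy_wlan *w)
{
	assert(w != NULL);
	w->qrtr_probed = true;
}

/* Step 4: the driver learns QMI is ready and downloads the remaining files. */
static int toy_qmi_download(struct toy_wlan *w)
{
	assert(w != NULL);
	if (!w->mhi_up || !w->qrtr_probed)
		return -ENODEV; /* no QMI transport yet */
	w->fw_ready = true;
	return 0;
}

/* A WLAN request from user space or the network stack after thaw (step 3). */
static int toy_wlan_request(const struct toy_wlan *w)
{
	assert(w != NULL);
	return w->fw_ready ? 0 : -EBUSY;
}
```

In this model, a toy_wlan_request() issued after toy_resume_deferred()
but before toy_qmi_download() has succeeded gets -EBUSY. That is the
window in which a thawed application can fail.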
>
> - Mani
>
>>>
>>> - Mani
>>>
>