ath11k resume fails due to kernel blocks probing MHI virtual devices
Rafael J. Wysocki
rafael at kernel.org
Mon Jan 29 04:37:41 PST 2024
On Mon, Jan 29, 2024 at 1:31 PM Manivannan Sadhasivam <mani at kernel.org> wrote:
>
> On Mon, Jan 29, 2024 at 01:22:27PM +0100, Rafael J. Wysocki wrote:
> > On Mon, Jan 29, 2024 at 11:10 AM Baochen Qiang <quic_bqiang at quicinc.com> wrote:
> > >
> > > Hi Rafael and Pavel,
> > >
> > > Currently I am facing an ath11k (a kernel WLAN driver) resume issue
> > > related to the kernel PM framework and the MHI module.
> > >
> > > Before introducing the issue details, I'd like to summarize how ath11k
> > > interacts with MHI stack to download WLAN firmware to hardware target:
> > > 1. When booting/restarting, ath11k powers on the MHI module and waits
> > > for the MHI channels to be ready.
> > > 2. When powered on, the MHI stack creates some virtual MHI devices, which
> > > represent MHI hardware channels, and adds them to the MHI bus. This
> > > triggers the MHI client driver, named QRTR, to get matched and probe
> > > those MHI devices. In probe, QRTR initializes the MHI channels and
> > > finally moves them to the ready state.
> > > 3. Once the MHI channels are ready, ath11k downloads the WLAN firmware to
> > > the hardware target, and then WLAN is working.
> > >
> > > Such a flow works well in general, but introduces issues in the
> > > hibernation cycle: when preparing for hibernation, ath11k powers down
> > > MHI, which results in the MHI devices being destroyed and thus QRTR
> > > resetting the MHI channels. When resuming from hibernation, ath11k
> > > powers on MHI and waits for the MHI channels to be ready in its resume
> > > callback. As said above, MHI creates and adds MHI devices to the MHI
> > > bus, but they can't be probed at that time because device probing has
> > > been prohibited by device_block_probing(), and this finally results in
> > > an ath11k resume timeout.
> > >
> > > Now there is a potential fix to this issue which would need changes in
> > > the MHI stack, i.e., don't destroy MHI devices while hibernating.
> >
> > Exactly.
> >
>
> During hibernation, the power to ath11k could be lost and in that case, there
> will be no channels available from the device. So keeping the "struct dev" when
> there is no real device attached to the system goes against the driver model
> IMO since we would be messing with the refcount.

But this is system hibernation or suspend and the reason for the power
loss is quite different from device removal at run time.

The device is going to be back during resume (or at least it is not
expected to go away in the meantime), so it is pointless to destroy
its representation in memory.

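
To make the difference concrete, here is a toy user-space model of the
two approaches. It is only a sketch and not kernel code: every toy_*
name is hypothetical, and the single "probing_blocked" flag merely
stands in for the effect of device_block_probing() described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model; NOT kernel code. All toy_* names are hypothetical. */

#define TOY_MAX_DEVS 4

struct toy_dev {
	bool registered;	/* device object exists on the bus */
	bool probed;		/* a driver is bound, channel is "ready" */
};

struct toy_bus {
	bool probing_blocked;	/* stands in for device_block_probing() */
	struct toy_dev devs[TOY_MAX_DEVS];
};

/* Register a device; it is probed right away unless probing is blocked. */
static struct toy_dev *toy_add_device(struct toy_bus *bus)
{
	for (size_t i = 0; i < TOY_MAX_DEVS; i++) {
		struct toy_dev *d = &bus->devs[i];

		if (!d->registered) {
			d->registered = true;
			d->probed = !bus->probing_blocked;
			return d;
		}
	}
	return NULL;	/* no free slot */
}

/* Destroy a device; its driver binding goes away with it. */
static void toy_remove_device(struct toy_dev *d)
{
	d->registered = false;
	d->probed = false;
}

/* Lift the block and run the probes that were deferred meanwhile. */
static void toy_unblock_probing(struct toy_bus *bus)
{
	bus->probing_blocked = false;
	for (size_t i = 0; i < TOY_MAX_DEVS; i++)
		if (bus->devs[i].registered)
			bus->devs[i].probed = true;
}

/*
 * Run one suspend/resume cycle and report whether the channel device is
 * already probed at the point where the resume callback waits for it,
 * i.e. while probing is still blocked.
 */
static bool toy_resume_sees_ready(bool destroy_on_suspend)
{
	struct toy_bus bus = { 0 };
	struct toy_dev *ch = toy_add_device(&bus);	/* boot: probed at once */

	assert(ch && ch->probed);

	bus.probing_blocked = true;		/* system transition starts */
	if (destroy_on_suspend)
		toy_remove_device(ch);		/* power down destroys it */

	if (destroy_on_suspend)
		ch = toy_add_device(&bus);	/* re-created, probe deferred */
	bool ready = ch && ch->probed;		/* what the resume callback sees */

	toy_unblock_probing(&bus);		/* too late for the waiter */
	return ready;
}
```

In this model toy_resume_sees_ready(false), where the device object is
kept across the transition, reports the channel as ready, while
toy_resume_sees_ready(true) does not, which corresponds to the resume
timeout reported above.
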
> For instance in the case of USB, if the device gets unplugged, would it make
> sense to keep the "struct dev" for the device in kernel in the hope that it
> would come back again?

At run time - no, during system suspend - yes.

It is not even recommended to free IRQs during system suspend.

> The driver model as I understood is, once the actual physical device gets
> removed, the refcount for "struct dev" should be decremented and it should be
> destroyed.

Not really.

Thanks!