[PATCH 2/2] ath10k: make core registering async
Michal Kazior
michal.kazior at tieto.com
Mon Jun 23 03:19:03 PDT 2014
Hi Johannes,
On 22 May 2014 16:41, Johannes Berg <johannes at sipsolutions.net> wrote:
> On Thu, 2014-05-22 at 16:12 +0200, Michal Kazior wrote:
>
>> As a side effect there's no way to propagate
>> registering errors to the pci subsystem but this
>> probably isn't really necessary.
>
> You should probably unbind if it fails - iwlwifi does that.
Hmm..
As it stands ath10k now deadlocks when firmware probing worker fails
and calls device_release_driver(). This indirectly hits pci remove()
callback which waits for the worker. Oops.
So I've taken a look how other driver do it so I can come up with a
solution - most of them seem use completions. After taking a closer
look I came to a conclusion this is inherently racy too. Consider the
following scenario:
[syscall]
insmod
pci->probe()
queue_work()
return
[worker]
probe_fw()
[syscall]
rmmod
dev_lock()
pci->remove()
wait_for_completion()
[worker]
complete_all()
device_release_driver()
dev_lock()
{already held, yield}
[syscall]
free(internal structures)
dev_unlock()
return
[worker]
{woken up, but dev->driver == NULL so no callbacks}
dev_unlock()
return
The driver code section may not be reachable anymore upon worker
returning from the device_release_driver() call, right? Also since
ath10k uses an internal worker it also means the work_struct would be
already freed by the syscall flow (i.e. worker would run after driver
has supposedly been cleaned up..). Even if ath10k was to use
request_firmware_nowait(), which allocates a temporary work_struct,
the unreachable code section still remains a problem.
Or maybe this isn't really a problem and/or I'm missing something?
Michał
More information about the ath10k
mailing list