[PATCH 2/2] ath10k: make core registering async

Michal Kazior michal.kazior at tieto.com
Mon Jun 23 03:19:03 PDT 2014


Hi Johannes,


On 22 May 2014 16:41, Johannes Berg <johannes at sipsolutions.net> wrote:
> On Thu, 2014-05-22 at 16:12 +0200, Michal Kazior wrote:
>
>> As a side effect there's no way to propagate
>> registering errors to the pci subsystem but this
>> probably isn't really necessary.
>
> You should probably unbind if it fails - iwlwifi does that.

Hmm..

As it stands ath10k now deadlocks when firmware probing worker fails
and calls device_release_driver(). This indirectly hits pci remove()
callback which waits for the worker. Oops.

So I've taken a look how other driver do it so I can come up with a
solution - most of them seem use completions. After taking a closer
look I came to a conclusion this is inherently racy too. Consider the
following scenario:

 [syscall]
 insmod
 pci->probe()
 queue_work()
 return
 [worker]
 probe_fw()
 [syscall]
 rmmod
 dev_lock()
 pci->remove()
 wait_for_completion()
 [worker]
 complete_all()
 device_release_driver()
 dev_lock()
 {already held, yield}
 [syscall]
 free(internal structures)
 dev_unlock()
 return
 [worker]
 {woken up, but dev->driver == NULL so no callbacks}
 dev_unlock()
 return

The driver code section may not be reachable anymore upon worker
returning from the device_release_driver() call, right? Also since
ath10k uses an internal worker it also means the work_struct would be
already freed by the syscall flow (i.e. worker would run after driver
has supposedly been cleaned up..). Even if ath10k was to use
request_firmware_nowait(), which allocates a temporary work_struct,
the unreachable code section still remains a problem.

Or maybe this isn't really a problem and/or I'm missing something?


Michał



More information about the ath10k mailing list