[PATCH 12/13] wifi: mt76: mt7925: fix ROC deadlocks and race conditions

Felix Fietkau nbd at nbd.name
Tue Jan 27 03:06:07 PST 2026


On 20.01.26 21:10, Zac wrote:
> From: Zac Bowling <zac at zacbowling.com>
> 
> Fix multiple interrelated issues in the remain-on-channel (ROC) handling
> that cause deadlocks, race conditions, and resource leaks.
> 
> Problems fixed:
> 
> 1. Deadlock in sta removal ROC abort path:
>     When a station is removed while a ROC operation is in progress, the
>     driver would call mt7925_roc_abort_sync() which waits for ROC completion.
>     However, the ROC work itself needs to acquire mt792x_mutex which is
>     already held during station removal, causing a deadlock.
> 
>     Fix: Use async ROC abort (mt76_connac_mcu_abort_roc) when called from
>     paths that already hold the mutex, and add MT76_STATE_ROC_ABORT flag
>     to coordinate between the abort and the ROC timer.
> 
> 2. ROC timer race during suspend:
>     The ROC timer could fire after the device started suspending but before
>     the ROC was properly aborted, causing undefined behavior.
> 
>     Fix: Delete ROC timer synchronously before suspend and check device
>     state before processing ROC timeout.
> 
> 3. ROC rate limiting for MLO auth failures:
>     Rapid ROC requests during MLO authentication can overwhelm the firmware,
>     causing authentication timeouts. The MT7925 firmware has limited ROC
>     handling capacity.
> 
>     Fix: Add rate limiting infrastructure with configurable minimum interval
>     between ROC requests. Track last ROC completion time and defer new
>     requests if they arrive too quickly.
> 
> 4. WCID leak in ROC cleanup:
>     When ROC operations are aborted, the associated WCID resources were
>     not being properly released, causing resource exhaustion over time.
> 
>     Fix: Ensure WCID cleanup happens in all ROC termination paths.
> 
> 5. Async ROC abort race condition:
>     The async ROC abort could race with normal ROC completion, causing
>     double-free or use-after-free of ROC resources.
> 
>     Fix: Use MT76_STATE_ROC_ABORT flag and proper synchronization to
>     prevent races between async abort and normal completion paths.
> 
> These fixes work together to provide robust ROC handling that doesn't
> deadlock, properly releases resources, and handles edge cases during
> suspend and MLO operations.
> 
> Fixes: c948b5da6bbe ("wifi: mt76: mt7925: add Mediatek Wi-Fi7 driver for mt7925 device")
> Signed-off-by: Zac Bowling <zac at zacbowling.com>
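The deadlock in item 1 and the double-teardown race in item 5 can be modelled roughly in user space, with POSIX threads standing in for the driver mutex and the ROC work item. Everything in this sketch is hypothetical and is not mt76 code: `struct roc_ctx`, `roc_abort_async_locked()`, `roc_work()` and the other helpers are illustrative names. The `abort_req` field only models the role the patch describes for MT76_STATE_ROC_ABORT. The sketch shows two things. A path that already holds the lock must not wait for work that needs the same lock. An atomic exchange on the "active" state lets exactly one path release the ROC resources.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model of ROC teardown; none of these names exist in mt76. */
struct roc_ctx {
	pthread_mutex_t lock;   /* stands in for the driver-wide mutex */
	atomic_bool active;     /* a ROC is currently in progress */
	atomic_bool abort_req;  /* models the role of MT76_STATE_ROC_ABORT */
	atomic_int releases;    /* times the ROC resources (e.g. the WCID) were freed */
};

static void roc_init(struct roc_ctx *ctx)
{
	pthread_mutex_init(&ctx->lock, NULL);
	atomic_init(&ctx->active, true);
	atomic_init(&ctx->abort_req, false);
	atomic_init(&ctx->releases, 0);
}

/* Whichever path flips active from true to false owns the cleanup, so the
 * resources are released exactly once (no double free). */
static void roc_teardown_once(struct roc_ctx *ctx)
{
	if (atomic_exchange(&ctx->active, false))
		atomic_fetch_add(&ctx->releases, 1);
}

/* Caller already holds ctx->lock (e.g. station removal). It must not wait for
 * the worker, because the worker needs the same lock: flag the abort and tear
 * down directly instead. */
static void roc_abort_async_locked(struct roc_ctx *ctx)
{
	atomic_store(&ctx->abort_req, true);
	roc_teardown_once(ctx);
}

/* Deferred work (timer expiry / normal completion); takes the lock itself. */
static void *roc_work(void *arg)
{
	struct roc_ctx *ctx = arg;

	pthread_mutex_lock(&ctx->lock);
	/* Skip the normal completion if an abort was requested meanwhile. */
	if (!atomic_exchange(&ctx->abort_req, false))
		roc_teardown_once(ctx);
	pthread_mutex_unlock(&ctx->lock);
	return NULL;
}

/* Station removal while the ROC work is pending. Calling pthread_join() on
 * the worker here, before unlocking, would deadlock. */
static int remove_sta_during_roc(void)
{
	struct roc_ctx ctx;
	pthread_t worker;

	roc_init(&ctx);
	pthread_mutex_lock(&ctx.lock);
	pthread_create(&worker, NULL, roc_work, &ctx);
	roc_abort_async_locked(&ctx);
	pthread_mutex_unlock(&ctx.lock);
	pthread_join(worker, NULL);
	assert(!atomic_load(&ctx.active));
	pthread_mutex_destroy(&ctx.lock);
	return atomic_load(&ctx.releases);
}
```

Build with `-pthread`. `remove_sta_during_roc()` returns 1. The abort path releases the resources while holding the lock. Once the worker gets the lock, it sees the abort request and skips its own teardown. Replacing the async abort with a join on the worker before unlocking would hang, which is the synchronous-abort deadlock described in item 1.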

The rate limiting code seems a bit suspicious to me.
What does "limited ROC handling capacity" mean? Outstanding ROC 
requests? Does it need time to settle after a completed ROC?
This needs to be clarified and likely replaced with a more targeted fix.
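The two readings of "limited ROC handling capacity" in the questions above lead to different gates. The sketch below contrasts them. One gate admits a request while fewer than N ROCs are outstanding. The other admits it only after a minimum time has passed since the last completion, which is roughly what the patch's "defer new requests if they arrive too quickly" describes. `struct roc_gate` and its helpers are hypothetical. The limits used with them are placeholders, not known MT7925 firmware values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ROC admission gate; limits are placeholders, not firmware facts. */
struct roc_gate {
	unsigned int outstanding;     /* ROC requests sent but not yet completed */
	unsigned int max_outstanding; /* model 1: how many may be in flight */
	uint64_t last_done_ms;        /* completion time of the previous ROC */
	uint64_t min_gap_ms;          /* model 2: settle time after a completion */
	bool have_done;               /* whether any ROC has completed yet */
};

/* Model 1: refuse only while too many requests are in flight. */
static bool gate_by_outstanding(const struct roc_gate *g)
{
	return g->outstanding < g->max_outstanding;
}

/* Model 2: refuse if the previous ROC completed too recently. */
static bool gate_by_interval(const struct roc_gate *g, uint64_t now_ms)
{
	return !g->have_done || now_ms - g->last_done_ms >= g->min_gap_ms;
}

static void roc_started(struct roc_gate *g)
{
	g->outstanding++;
}

static void roc_done(struct roc_gate *g, uint64_t now_ms)
{
	assert(g->outstanding > 0); /* completion without a matching start */
	g->outstanding--;
	g->last_done_ms = now_ms;
	g->have_done = true;
}
```

With one request allowed in flight, the first gate admits a new ROC as soon as the previous one completes. The interval gate still refuses it 50 ms later if `min_gap_ms` is 100. Which model, if either, matches the firmware's behaviour is the question raised in the review.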

- Felix


