[RFC][v1][Design] AP Architecture for Roaming with Wi-Fi 8 Seamless Mobility Domain (SMD)

Thu Nov 6 04:52:52 PST 2025

Hi Aditya,

On Tue, 2025-11-04 at 13:30 -0800, Aditya Sathish wrote:
> On 10/30/2025 01:47, Benjamin Berg wrote:
> > Do we have a clear picture of what the performance issues might be?
> > 
> > To me, it seems a bit that talking about threading might be a red
> > herring. It could be that the primary issue is the prioritization of
> > events when there is a burst of activity.
> > 
> > We should not need threading as long as each individual event can be
> > processed quickly and hostapd is able to keep up overall. In that case,
> > we have an event ordering issue which can likely be solved by using
> > multiple nl80211 sockets and giving them different priorities in the
> > mainloop. By doing that, you can effectively pull all time-critical
> > events to the front of the queue and ensure they are processed with
> > only a short delay.
> > 
> > Now, that doesn't invalidate your point at all. If we do have tasks
> > that take a long time (crypto?, external requests?), then it can make
> > sense to push just the slow part into a separate thread and pick up
> > processing in the main thread once it has finished.
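
To make the socket-priority idea above a bit more concrete, here is a rough
sketch. It is only an illustration using plain poll(); struct prio_sock,
pick_ready_socket() and dispatch_one() are made-up names and this is not how
hostapd's eloop is structured today. The point is simply that when several
sockets are readable at once, the most urgent one gets serviced first.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_SOCKS 8

/* Hypothetical descriptor: one entry per nl80211 socket in the mainloop. */
struct prio_sock {
	int fd;
	int prio;	/* lower value = more urgent */
};

/*
 * Poll all sockets and return the index of the most urgent readable one,
 * or -1 if nothing is readable (timeout or error).
 */
int pick_ready_socket(const struct prio_sock *socks, size_t n, int timeout_ms)
{
	struct pollfd pfd[MAX_SOCKS];
	int best = -1;
	size_t i;

	assert(n <= MAX_SOCKS);
	for (i = 0; i < n; i++) {
		pfd[i].fd = socks[i].fd;
		pfd[i].events = POLLIN;
		pfd[i].revents = 0;
	}
	if (poll(pfd, n, timeout_ms) <= 0)
		return -1;

	for (i = 0; i < n; i++) {
		if (!(pfd[i].revents & POLLIN))
			continue;
		if (best < 0 || socks[i].prio < socks[best].prio)
			best = (int)i;
	}
	return best;
}

/*
 * Read one pending message from the most urgent readable socket without
 * blocking. Returns the index of the socket that was serviced, or -1.
 */
int dispatch_one(const struct prio_sock *socks, size_t n, char *buf, size_t len)
{
	int idx = pick_ready_socket(socks, n, 0);

	if (idx >= 0 && read(socks[idx].fd, buf, len) < 0)
		return -1;
	return idx;
}
```

Calling dispatch_one() in a loop drains the time-critical socket before the
others are looked at, which is all the "pull to the front of the queue"
behaviour amounts to.
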
> To preface our concerns: the longer the context transfer takes, the
> less time the serving AP has available to drain its remaining packets
> before hitting the SN limit. This can result in a larger gap in
> client-perceived latency during roaming depending on when the target
> AP is ready to start DL transmission. Efficient handling of the
> context transfer (during the ST Execution phase) is therefore key to
> minimize client-perceived latency.

Right, so the time it takes to drain the queues and the likelihood of
hitting the SN limit are not purely a function of how long the whole ST
Execution process takes.

I am probably missing some important bits, but I imagine that an early
bridge update is quite important. The earlier the bridge stops
forwarding packets to the Serving AP MLD, the fewer packets need to be
flushed, which relaxes the SN limit issue and minimizes the time needed
to drain the queues.

As far as I can tell, one could immediately ask the bridge to stop
forwarding packets to the Serving AP MLD when the ST Execution Request
frame is received. Similarly, forwarding to the Target AP MLD can begin
as soon as it is known that the ST Execution will succeed. After all,
the Target AP MLD already knows about the station and can buffer frames
for it. Such buffering can happen before the HW is fully configured and
even before the Context has been transferred.

Now, the Target AP MLD does need to be fully configured before the ST
Execution Response can be sent out. But maybe we can effectively start
the DL draining period early and relax the timings that way?
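
Roughly, the ordering I have in mind could be sketched like this. All names
here (struct st_state, st_handle(), the ST_EV_* events) are invented for
illustration and do not correspond to existing hostapd or mac80211 code. I
am also assuming that the context transfer has to finish before the
response goes out, which may not match what the draft requires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical events during ST Execution. */
enum st_event {
	ST_EV_REQUEST_RX,	/* ST Execution Request frame received */
	ST_EV_SUCCESS_KNOWN,	/* known that ST Execution will succeed */
	ST_EV_CONTEXT_DONE,	/* context transferred to the target */
	ST_EV_HW_READY,		/* Target AP MLD HW fully configured */
};

/* Hypothetical per-station view of the transition. */
struct st_state {
	bool fwd_serving;	/* bridge forwards DL to Serving AP MLD */
	bool fwd_target;	/* bridge forwards DL to Target AP MLD */
	bool context_done;
	bool hw_ready;
	bool response_sent;	/* ST Execution Response sent out */
};

void st_init(struct st_state *s)
{
	assert(s != NULL);
	*s = (struct st_state){ .fwd_serving = true };
}

void st_handle(struct st_state *s, enum st_event ev)
{
	assert(s != NULL);

	switch (ev) {
	case ST_EV_REQUEST_RX:
		/* Stop feeding the serving side right away. */
		s->fwd_serving = false;
		break;
	case ST_EV_SUCCESS_KNOWN:
		/* Target only buffers for now; HW may not be ready. */
		s->fwd_target = true;
		break;
	case ST_EV_CONTEXT_DONE:
		s->context_done = true;
		break;
	case ST_EV_HW_READY:
		s->hw_ready = true;
		break;
	}

	/* The response is gated on full configuration (assumption above). */
	if (s->fwd_target && s->context_done && s->hw_ready)
		s->response_sent = true;
}
```

The bridge changes happen on the first two events, so the serving side
starts draining well before the response is gated on the slower steps.
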

> You've accurately captured the core performance concerns we intended
> to highlight:
> 1. _Prioritization of Management Frames:_
>    Certain management frames - such as SMD - should be able to "skip
>    the line" since a roam request typically will indicate degraded
>    client link quality and prioritizing immediate transition over
>    processing existing management traffic is likely in the client's
>    best interest - especially after preparation has already completed.
> 2. _Additional Threads for Long-Polled Operations:_
>    Without multi-threading, long-polled netlink messages or system
>    calls can block even the prioritized kernel events. As you noted,
>    offloading slow operations that are busy waiting to worker threads
>    can allow the main thread to handle time-critical events.
> 
> That said, even with multi-threading in hostapd, we still face
> overhead from RTNL locks and netlink socket overhead — not only from
> hostapd of the serving MLD but also from other hostapd processes tied
> to other MLDs on the AP. This is why we initially proposed moving the
> ST Execution sequence to mac80211. Doing so could reduce round-trip
> latency from ST Execution Request to Response, especially under high
> system load and concurrent work across multiple hostapd interfaces.

Yeah, there may be some things for which a lock might need to be held
in the kernel. That said, with the above in mind, I am wondering what
the critical parts there are and how long the round-trip time is
permitted to be in the end.

Benjamin