[DESIGN RFC] Critical Update handling in the kernel

Wed Jul 23 03:33:07 PDT 2025

On Wed, 2025-07-23 at 10:04 +0530, Aditya Kumar Singh wrote:
> > FWIW we also have some client side code for it already. It's probably
> > broken ;-)
> 
> Yes, I’ve seen that. I’m not entirely sure if it’s broken, but it does 
> seem related to CSA parsing from the link—there is code handling that. 
> If anything turns out to be broken, we can fix it.

We just never tested it well, that's why I said that. Sure, if it's
broken we'll fix it.

> > > Assumptions:
> > > ------------
> > > The critical update procedure is highly timing-sensitive.
> > 
> > There are two aspects here that I feel maybe you're conflating a bit. On
> > the one hand, yes, some of it needs to have the precise information for
> > the counters. OTOH, some of it is really not _that_ timing critical,
> > e.g. whether or not you actually get the information live at exactly the
> > right time. Though of course a frame that's transmitted after a beacon
> > that advertised CSA should not suddenly _not_ advertise it even cross-
> > link.
> 
> Absolutely! We acknowledge that achieving the ideal case may not be 
> possible. Keeping the logic in the kernel brings us closer to the 
> expected behavior, whereas shifting it to hostapd would likely result in 
> a much larger deviation.

Fully in hostapd I would agree, but not sure it should be fully in the
kernel as you propose either? We have some middle ground for CSA today
with just the counter updates, rather than creating the elements and all
that. Not sure we couldn't/shouldn't replicate that?

> > Wouldn't it be far simpler to have hostapd indicate the critical update,
> > and even the BPCC value? It also has to include BPCC in RNR and other
> > fields, do we really want to _parse_ all of that and update all those
> > values?! Seems iffy at best.
> 
> The challenge is that if hostapd takes on this task, it would need to 
> send updates for all partner links. Theoretically, this could result in 
> up to 15 beacon NLs being transmitted. Processing these NLs and updating 
> the beacons would already be time-consuming.

Is it though? I mean, where do you see the bottleneck? Locking wise it's
just the wiphy lock that shouldn't be contended, netlink isn't the most
inefficient API ... sure there are roundtrips involved, but this seems a
bit "handwavy" to me.

We could also always have new per-link attributes for the beacon
template updates for other links so it doesn't require full commands but
just updates the beacon tail (probably only that?) of other links in the
same operation?

> On top of that, managing 
> the additional offsets required to update the counters adds another 
> layer of complexity.
> 
> OTOH, the proposed design takes just 1 NL update and then does
> automatically whatever else is needed, which will be less time consuming
> than a flow of multiple NL commands.

I still don't buy the "time consuming" thing, but I agree that managing
the counter locations would be more difficult. OTOH, that's a list of
counters per link, so it's not _that_ much complexity?
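
To illustrate the "list of counters per link" idea, here is a minimal
userspace sketch. It is not mac80211 code: struct link_beacon,
MAX_CNTDWN_OFFS and link_beacon_set_countdown() are hypothetical names
made up for this example. It assumes userspace hands over a beacon
template together with the byte offsets of every countdown field in it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CNTDWN_OFFS 8	/* hypothetical limit, for illustration only */

/* Hypothetical per-link record: beacon template plus the byte offsets
 * at which a countdown value appears in that template. */
struct link_beacon {
	uint8_t *tmpl;
	size_t tmpl_len;
	uint16_t cntdwn_offs[MAX_CNTDWN_OFFS];
	unsigned int n_offs;
};

/* Write 'count' at every recorded offset. Returns 0 on success, or -1
 * (leaving the template untouched) if any offset is out of range. */
static int link_beacon_set_countdown(struct link_beacon *b, uint8_t count)
{
	unsigned int i;

	assert(b && b->tmpl);

	if (b->n_offs > MAX_CNTDWN_OFFS)
		return -1;

	/* validate all offsets first so a failure changes nothing */
	for (i = 0; i < b->n_offs; i++)
		if (b->cntdwn_offs[i] >= b->tmpl_len)
			return -1;

	for (i = 0; i < b->n_offs; i++)
		b->tmpl[b->cntdwn_offs[i]] = count;

	return 0;
}
```

With this shape nothing has to be parsed at update time, the per-beacon
work is one byte store per recorded offset.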

I really don't like the parsing in kernel for two reasons:

 1) it's a lot of complexity that would never exist at all when hostapd
    gives the data because it's building it, it doesn't need to parse

 2) less importantly, I also think it ossifies things since the kernel
    would need updates for even minor changes in the layout or elements
    (this of course already being true for FW based solutions)

> > >      2.2 Parse the MBSSID profile element.
> > >          2.2.1 For each non-Tx profile:
> > >              2.2.1.1 Extract the BSSID
> > >              2.2.1.2 Retrieve the corresponding link sdata from BSSID
> > >              2.2.1.3 If the critical update flag is set in link object, set
> > >                      the CU (Critical Update) bit in the capability element.
> > >              2.2.1.4 Set non_tx_update = true.
> > >      2.3 Parse the multi-link element.
> > >          2.3.1 For the self link (already known), fetch the latest BPCC
> > >                value.
> > >          2.3.2 Update the frame with this BPCC value.
> > >      2.4 Update the capability element in the SKB:
> > >          2.4.1 If the critical update flag is set, set the CU bit.
> > >          2.4.2 If non_tx_update == true, also set the non-Tx CU bit.
> > 
> > How much of that can we let hostapd do?
> 
> hostapd forms the beacon so it can do all of this. The only bottleneck is
> communicating all links' beacon updates sequentially via NL. And if
> another update happens in the meantime - for example link A started CSA
> and then link B also started CSA - these will pile up in the NL socket
> queue with beacon updates.

I do wonder if this is a practical concern? And if you have seen this in
practice then where does the bottleneck come from?

But it can also be side-stepped by batching the operations and not doing
a full update like I wrote above.

So I'm not sure from a design perspective for mac80211 we should let it
be constrained by this. That doesn't mean there aren't other reasons for
the design to go either way, but we can work around this one (if it even
is an issue.)

> > > This mechanism sufficiently handles critical updates due to modifications
> > > of existing elements (elements related to the critical update which are
> > > already present in the beacon and just one or more field is/are changing
> > > its value). However, in the case of new element inclusion (elements related
> > > to the critical update which are not already present in the beacon, but are
> > > getting added from the next beacon), additional handling is required.
> > > Specifically, the newly included elements related to the critical update
> > > must be added to the per-STA profile within the Basic Multi-Link Element
> > > (BMLE) [3].

[snip timing explanation]

> You're right, but that's not quite what the original text is trying to 
> convey.

OK, sorry, I misunderstood then.

> There are generally two types of critical updates:
>    1. Adding a new element, such as CSA or CCA.
>    2. Modifying an existing element, like VHT/HE/EHT Operation IE, TWT, etc.
> 
> In the second case, forming the beacon is relatively 
> straightforward—only the BPCC and the critical update flag need to be 
> set. However, in the first case, it's more complex. You need to insert 
> the new elements (e.g., CSA IE) into the per-STA profile of the affected 
> link within the reporting link, as shown in the earlier diagram.

Right, of course.

> So to handle this case (addition of a new element), the below steps are
> additionally required on top of what was already explained above.
> 
> Does this make sense now?

Maybe more? I really wasn't reading it well before, I thought it was
more about when the new elements were added, rather than the fact _that_
they need to be added.

I guess to me that was really almost the same, the beacon changes.

> > That kind of also seems like hostapd could just update the beacons?
> 
> Yes it can. Only thing is multiple NL beacon updates.

OK so if you're really so worried about that, and that's the main thing,
we can add the per-link updates into a nested element in the switching-
link's nl80211 message :)

> 
> > > Update BPCC back to NL:
> > > -----------------------
> > > After the kernel increments the BPCC value, it can emit a netlink event
> > > containing the updated BPCC value. This allows hostapd to receive the
> > > latest BPCC and include it in probe/association response frames, as
> > > required by the specification.
> > > 
> > > This coordination ensures that all - beacon, probe response, and
> > > association response frames reflect consistent BPCC values, maintaining
> > > compliance with spec-defined behavior for multi-link and critical update
> > > handling.
> > 
> > Yeah not liking this too much I guess... Why can't hostapd maintain
> > BPCC?
> 
> It can, but only when everything is fully handled by hostapd. Otherwise
> the kernel should be the owner of the latest value. It can report the
> latest value to hostapd upon any change.

When is it not fully handled by hostapd though? It always decides to do
updates, no? Are you thinking there are cases where the driver
(firmware?) unilaterally changes some things (say the puncturing bitmap)
and never handshakes that with hostapd, just starts overwriting the
beacons? Maybe we have a broader architecture discussion to be had here.

> > > Complexities:
> > > -------------
> > > When adding elements in per STA profile, the element length might exceed
> > > the 255-byte limit, hence fragmentation needs to be handled properly.
> > 
> > We have fragmentation but all this parsing means also we need
> > _defragmentation_ first, which is awkward?
> 
> I think de-fragmentation is already supported? If not, we will have to
> add it.

It is, generally, but it's completely separate in terms of memory
handling etc. so it's not that simple.
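
A rough sketch of why the memory handling differs: a fragmented element
cannot be parsed in place, its payload has to be copied into a separate
contiguous buffer first. The function below is a simplified userspace
illustration, not the existing kernel helper; sketch_defragment() and
its parameters are hypothetical, and the fragment ID is passed in by the
caller rather than hard-coded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy the payload of the element starting at 'elem', plus the payload
 * of any fragment elements that directly follow it, into 'out'.
 * A fragment is only expected after a piece with the maximum length
 * (255). Returns the defragmented payload length, or -1 if the input is
 * truncated or 'out' is too small. */
static long sketch_defragment(const uint8_t *elem, size_t buf_len,
			      uint8_t frag_id,
			      uint8_t *out, size_t out_len)
{
	size_t pos = 0, copied = 0;
	uint8_t len;

	assert(elem && out);

	do {
		if (buf_len - pos < 2)
			return -1;	/* no room for ID + length */
		len = elem[pos + 1];
		if (buf_len - pos - 2 < len)
			return -1;	/* payload truncated */
		if (out_len - copied < len)
			return -1;	/* output buffer too small */

		memcpy(out + copied, elem + pos + 2, len);
		copied += len;
		pos += 2 + (size_t)len;
	} while (len == 255 && buf_len - pos >= 2 && elem[pos] == frag_id);

	return (long)copied;
}
```

Any in-kernel edit of a per-STA profile would need this copy step, then
the modification, then re-fragmentation back into the frame.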

> > > memcmp vs hashing
> > > =================
> > > In the section where memcmp() is used to detect changes in beacon elements
> > > (to determine if a critical update is needed), this logic could
> > > alternatively be handled at the hostapd level.
> > 
> > Yeah sounds better to me :P
> 
> Okay, so hostapd will signal a critical update to the kernel only for the
> link which is undergoing a change? The kernel will handle the rest,
> parsing and updating the partner links as needed?

I still don't like the parsing though. Like I said above, that's a lot
of complexity that simply doesn't even exist when hostapd controls all
the updates.

I really would have envisioned this much more along the lines of having
hostapd start the channel switch (WLOG on link 1 out of links 1 and 2)
with an nl80211 message that has:

 - post-switch beacon for link 1
 - during-switch beacon for link 1
   - CSA counter offsets for CSA, eCSA, etc. elements
 - post-switch beacon for link 2 (?)
 - during-switch beacon for link 2
   - CSA counter offsets for link 1's CSA counter
     (for all the per-STA profile link 1 elements)

isn't that pretty much it? All the BPCC is well-contained within that
and maintained by hostapd.
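
As a data-layout sketch only, the list above could be modelled roughly
like this. None of these names exist in nl80211 or mac80211; the
sketch_* types, the 15-link bound and the validity rules are assumptions
made purely for illustration of what such a request would carry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_LINKS 15	/* assumed upper bound on links */
#define SKETCH_MAX_OFFS   8	/* assumed bound on counters per link */

struct sketch_beacon {
	const uint8_t *data;
	size_t len;
};

/* One entry per link of the MLD, all carried in a single request. */
struct sketch_link_update {
	unsigned int link_id;
	bool switching;			/* true for the link doing the CSA */
	struct sketch_beacon during;	/* during-switch beacon */
	struct sketch_beacon after;	/* post-switch beacon, may be empty */
	uint16_t cntdwn_offs[SKETCH_MAX_OFFS];	/* offsets into 'during' */
	unsigned int n_offs;
};

struct sketch_csa_request {
	struct sketch_link_update links[SKETCH_MAX_LINKS];
	unsigned int n_links;
};

/* Accept the request only if exactly one link is switching and every
 * counter offset points inside that link's during-switch beacon. */
static bool sketch_csa_request_valid(const struct sketch_csa_request *req)
{
	unsigned int i, j, n_switching = 0;

	assert(req);

	if (req->n_links == 0 || req->n_links > SKETCH_MAX_LINKS)
		return false;

	for (i = 0; i < req->n_links; i++) {
		const struct sketch_link_update *l = &req->links[i];

		if (l->switching)
			n_switching++;
		if (!l->during.data || l->n_offs > SKETCH_MAX_OFFS)
			return false;
		for (j = 0; j < l->n_offs; j++)
			if (l->cntdwn_offs[j] >= l->during.len)
				return false;
	}

	return n_switching == 1;
}
```

The "exactly one switching link" check reflects the limitation of a
single channel switch at a time across the MLD.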

I'll agree that this limits to only doing a single channel switch at a
time across the whole MLD, but maybe that's not the most problematic
thing?

OK I guess I'll have to think about this more ...

johannes