[PATCH 3/4] firmware: arm_scmi: Introduce monotonically increasing tokens

Mon May 24 19:13:35 PDT 2021

On 5/24/2021 4:15 PM, Cristian Marussi wrote:
> Tokens are sequence numbers embedded in the each SCMI message header: they
> are used to correlate commands with responses (and delayed responses), but
> their usage and policy of selection is entirely up to the caller (usually
> the OSPM agent), while they are completely opaque to the callee (SCMI
> server platform) which merely copies them back from the command into the
> response message header.
> This also means that the platform does not, can not and should not enforce
> any kind of policy on received messages depending on the contained sequence
> number: platform can perfectly handle concurrent requests carrying the same
> identifiying token if that should happen.
> 
> Moreover the platform is not required to produce in-order responses to
> agent requests, the only constraint in these regards is that in case of
> an asynchronous message the delayed response must be sent after the
> immediate response for the synchronous part of the command transaction.
> 
> Currenly the SCMI stack of the OSPM agent selects as token for the

s/as token/a token/?

> egressing commands the lowest possible number which is not already in use
> by an existing in-flight transaction, which means, in other words, that
> we immediately reuse any token after its transaction has completed or it
> has timed out: this indeed simplifies token and associated xfer management
> and lookup.
> 
> Under the above assumptions and constraints, since there is really no state
> shared between the agent and the platform to let the platform know when a
> token and its associated message has timed out, the current policy of early
> reuse of tokens can easily lead to the situation in which a spurios or late

s/spurios/spurious/

> received response (or delayed_response), related to an old stale and timed
> out transaction, can be wrongly associated to a newer valid in-flight xfer
> that just happens to have reused the same token.
> 
> This misbehavior on such ghost responses is more easily exposed on those
> transports that naturally have an higher level of parallelism in processing
> multiple concurrent in-flight messages.
> 
> This commit introduces a new policy of selection of tokens for the OSPM
> agent: each new transfer now gets the next available and monotonically
> increasing token, until tokens are exhausted and the counter rolls over.
> 
> Such new policy mitigates the above issues with ghost responses since the
> tokens are now reused as later as possible (when they roll back ideally)
> and so it is much easier to identify ghost responses to stale timed out
> transactions: this also helps in simplifying the specific transports
> implementation since stale transport messages can be easily identified
> and discarded early on in the rx path without the need to cross check
> their actual sate with the core transport layer.
> This mitigation is even more effective when, as is usual the case, the

s/usual/usually/

> maximum number of pending messages is capped by the platform to a much
> lower value than whole possible range of tokens.(2^10)
> 
> This internal policy change in the core SCMI transport layer is fully
> transparent to the specific transports so it has not and should not have
> any impact on the transports implementation.
> 
> The empirically observed cost of such new procedure of token selection
> amounts in the best case to ~10us out of an observed full transaction cost
> of 3ms for the completion of a synchronous sensor reading command on a
> platform supporting commmands completion interrupts.

s/commmands/commands/

> 
> Signed-off-by: Cristian Marussi <cristian.marussi at arm.com>

Overall this looks good to me and is more straightforward than I thought.
[snip]

> +/**
> + * scmi_xfer_token_set  - Reserve and set new token for the xfer at hand
> + *
> + * @minfo: Pointer to Tx/Rx Message management info based on channel type
> + * @xfer: The xfer to act upon
> + *
> + * Pick the next unused monotonically increasing token and set it into
> + * xfer->hdr.seq: picking a monotonically increasing value avoids reusing
> + * immediately tokens of just completed or timed-out xfers, mitigating the risk
> + * of wrongly associating a late received answer for an expired xfer to a live
> + * in-flight transaction which happened to have reused the same token.

This was a bit harder to read than I thought, how about:

picking a monotonically increasing value avoids immediate reuse of
freshly completed or timed-out xfers, thus mitigating the risk of
incorrect association of a late and expired xfer with a live in-flight
transaction, both happening to re-use the same token identifier.
-- 
Florian