[PATCH net-next v5 1/2] net: ti: icssg-prueth: Add Frame Preemption MAC Merge support
Jakub Kicinski
kuba at kernel.org
Tue May 5 16:01:34 PDT 2026
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: ti: icssg-prueth: Add Frame Preemption MAC Merge support
This patch adds QoS infrastructure for Frame Preemption (FPE) support to
the ICSSG Ethernet driver, introducing prueth_qos_iet state tracking,
icssg_config_ietfpe() for firmware IET configuration, MQPRIO qdisc
offload that maps traffic classes to express/preemptible queues, and
link-state driven verify state machine re-triggering.
A few questions on the code below.
> diff --git a/drivers/net/ethernet/ti/icssg/icssg_prueth.c b/drivers/net/ethernet/ti/icssg/icssg_prueth.c
> index 591be5c8056b..7657dc1015f0 100644
> --- a/drivers/net/ethernet/ti/icssg/icssg_prueth.c
> +++ b/drivers/net/ethernet/ti/icssg/icssg_prueth.c
[ ... ]
> @@ -1686,6 +1689,8 @@ static int prueth_netdev_init(struct prueth *prueth,
>
> INIT_DELAYED_WORK(&emac->stats_work, icssg_stats_work_handler);
>
> + icssg_qos_init(ndev);
> +
> ret = pruss_request_mem_region(prueth->pruss,
> port == PRUETH_PORT_MII0 ?
> PRUSS_MEM_DRAM0 : PRUSS_MEM_DRAM1,
Is the fpe_lock mutex lifecycle balanced on all error paths?
icssg_qos_init() calls mutex_init(&iet->fpe_lock) unconditionally here,
but mutex_destroy() is only invoked in the netdev_unregister label in
prueth_probe() (gated on registered_netdevs[i]) and in prueth_remove().
The remaining paths do not appear to destroy the mutex:
- Every goto free / goto free_ndev within prueth_netdev_init() itself
(pruss_request_mem_region failure, platform_get_irq_byname_optional
failure, of_parse_phandle failure, of_get_phy_mode failure, the
unsupported phy_if case) calls free_netdev() directly.
- prueth_netdev_exit() (reached via the netdev_exit label and from
prueth_remove) frees the netdev without destroying fpe_lock.
- When register_netdev() succeeds for MAC0 but fails for MAC1 (or
when prueth_netdev_init() fails for MAC1 after MAC0 succeeded),
control flows through netdev_exit, which leaks the fpe_lock state
for the not-yet-registered netdev.
Would it be cleaner to pair mutex_destroy() with free_netdev in
prueth_netdev_exit() rather than at the registered_netdev unwind level?
[ ... ]
> diff --git a/drivers/net/ethernet/ti/icssg/icssg_qos.c b/drivers/net/ethernet/ti/icssg/icssg_qos.c
> new file mode 100644
> index 000000000000..fc4b0ebc7d68
> --- /dev/null
> +++ b/drivers/net/ethernet/ti/icssg/icssg_qos.c
> @@ -0,0 +1,232 @@
[ ... ]
> +static void icssg_iet_set_preempt_mask(struct prueth_emac *emac)
> +{
> + void __iomem *config = emac->dram.va + ICSSG_CONFIG_OFFSET;
> + struct prueth_qos_mqprio *p_mqprio = &emac->qos.mqprio;
> + struct tc_mqprio_qopt *qopt = &p_mqprio->mqprio.qopt;
> + u8 preemptible_tcs = p_mqprio->preemptible_tcs;
> + struct prueth_qos_iet *iet = &emac->qos.iet;
> + int prempt_mask = 0, i;
> + u8 tc;
> +
> + /* The preemptible traffic classes should only be committed to hardware
> + * once TX is active.
> + */
> + if (!iet->fpe_active) {
> + netdev_dbg(emac->ndev, "FPE not active, skipping preempt mask config\n");
> + return;
> + }
> +
> + /* Configure the queues based on the preemptible tc map set by the user */
> + for (tc = 0; tc < p_mqprio->mqprio.qopt.num_tc; tc++) {
> + /* check if the tc is preemptive or not */
> + if (preemptible_tcs & BIT(tc)) {
> + for (i = qopt->offset[tc]; i < qopt->offset[tc] + qopt->count[tc]; i++) {
> + /* Set all the queues in this tc as preemptive queues */
> + writeb(BIT(4), config + EXPRESS_PRE_EMPTIVE_Q_MAP + i);
> + prempt_mask &= ~BIT(i);
^^^^^^^^^^^^^^^^^^^^^^^^^
Is the prempt_mask &= ~BIT(i) here dead code? prempt_mask is
initialised to 0, so clearing a bit that is already zero has no effect.
The other branch sets bits via prempt_mask |= BIT(i), so it looks like
the clear statement could simply be removed.
> + }
> + } else {
> + /* Set all the queues in this tc as express queues */
> + for (i = qopt->offset[tc]; i < qopt->offset[tc] + qopt->count[tc]; i++) {
> + writeb(0, config + EXPRESS_PRE_EMPTIVE_Q_MAP + i);
> + prempt_mask |= BIT(i);
> + }
> + }
> + netdev_set_tc_queue(emac->ndev, tc, qopt->count[tc], qopt->offset[tc]);
> + }
> + writeb(prempt_mask, config + EXPRESS_PRE_EMPTIVE_Q_MASK);
> +}
[ ... ]
> +/* Direct synchronous configuration of IET FPE.
> + * Caller must hold iet->fpe_lock.
> + */
> +void icssg_config_ietfpe(struct prueth_emac *emac, bool enable)
> +{
> + void __iomem *config = emac->dram.va + ICSSG_CONFIG_OFFSET;
> + struct prueth_qos_iet *iet = &emac->qos.iet;
> + int ret;
> + u8 val;
> +
> + /* return early if FPE is not active and need not be enabled */
> + if (!iet->fpe_enabled && !iet->fpe_active)
> + return;
> +
> + if (!netif_running(emac->ndev)) {
> + netdev_dbg(emac->ndev, "cannot change IET/FPE state when interface is down\n");
> + return;
> + }
> +
> + /* Update FPE Tx enable bit (PRE_EMPTION_ENABLE_TX) if
> + * fpe_enabled is set to enable MM in Tx direction
> + */
> + writeb(enable ? 1 : 0, config + PRE_EMPTION_ENABLE_TX);
> +
> + /* If FPE is to be enabled, first configure MAC Verify state
> + * machine in firmware as firmware kicks the Verify process
> + * as soon as ICSSG_EMAC_PORT_PREMPT_TX_ENABLE command is
> + * received.
> + */
> + if (enable && iet->mac_verify_configure) {
> + writeb(1, config + PRE_EMPTION_ENABLE_VERIFY);
> + writew(iet->tx_min_frag_size, config + PRE_EMPTION_ADD_FRAG_SIZE_LOCAL);
> + writel(iet->verify_time_ms, config + PRE_EMPTION_VERIFY_TIME);
> + } else {
> + iet->verify_status = ICSSG_IETFPE_STATE_DISABLED;
> + }
> +
> + /* Send command to enable FPE Tx side. Rx is always enabled */
> + ret = icssg_set_port_state(emac,
> + enable ? ICSSG_EMAC_PORT_PREMPT_TX_ENABLE :
> + ICSSG_EMAC_PORT_PREMPT_TX_DISABLE);
> + if (ret) {
> + netdev_err(emac->ndev, "TX preempt %s command failed\n",
> + str_enable_disable(enable));
> + writeb(0, config + PRE_EMPTION_ENABLE_VERIFY);
> + iet->verify_status = ICSSG_IETFPE_STATE_DISABLED;
> + return;
^^^^^^^
Can this disable path leave the software and hardware state out of sync?
When called with enable=false, PRE_EMPTION_ENABLE_TX has already been
written to 0 above before the firmware command is issued. If
icssg_set_port_state(ICSSG_EMAC_PORT_PREMPT_TX_DISABLE) then fails, the
function returns without updating iet->fpe_active, so fpe_active stays
true even though the TX-enable register now reads 0.
Later calls will then skip the !fpe_enabled && !fpe_active early return
and icssg_iet_set_preempt_mask() will keep writing preempt masks as if
TX were still active.
> + }
> +
> + if (enable && iet->mac_verify_configure) {
> + ret = icssg_iet_verify_wait(emac);
> + if (ret) {
> + netdev_err(emac->ndev, "MAC Verification failed with timeout\n");
> + return;
^^^^^^^
On the verify-wait timeout path (and the PRE_EMPTION_ACTIVE_TX != 1
path below), the driver has already written PRE_EMPTION_ENABLE_TX=1,
PRE_EMPTION_ENABLE_VERIFY=1, PRE_EMPTION_ADD_FRAG_SIZE_LOCAL,
PRE_EMPTION_VERIFY_TIME, and issued the
ICSSG_EMAC_PORT_PREMPT_TX_ENABLE firmware command. This error path
returns without reverting any of those writes, and iet->fpe_active
remains false.
A later user-driven disable will then hit the early return:
if (!iet->fpe_enabled && !iet->fpe_active)
return;
so the hardware stays configured for FPE while the driver believes it
is off. Should these error paths also perform rollback like the
icssg_set_port_state() failure branch does?
> + }
> + } else if (enable) {
> + /* Give firmware some time to update PRE_EMPTION_ACTIVE_TX state */
> + usleep_range(100, 200);
> + }
> +
> + if (enable) {
> + val = readb(config + PRE_EMPTION_ACTIVE_TX);
> + if (val != 1) {
> + netdev_err(emac->ndev,
> + "Firmware fails to activate IET/FPE\n");
> + return;
> + }
> + iet->fpe_active = true;
> + } else {
> + iet->fpe_active = false;
> + }
> +
> + icssg_iet_set_preempt_mask(emac);
On the disable path (enable=false), fpe_active was just set to false,
but icssg_iet_set_preempt_mask() early-returns precisely on
!fpe_active:
if (!iet->fpe_active) {
netdev_dbg(emac->ndev, "FPE not active, skipping preempt mask config\n");
return;
}
So on a normal enable -> disable cycle, the EXPRESS_PRE_EMPTIVE_Q_MAP
per-queue entries and the EXPRESS_PRE_EMPTIVE_Q_MASK register keep the
preemptible markings that were written on the prior enable. Also,
PRE_EMPTION_ENABLE_VERIFY is only zeroed on the error branch above; the
successful disable does not clear it. Is a subsequent re-enable with a
different mqprio layout intended to see the stale markings from the
previous layout?
> + netdev_err(emac->ndev, "IET FPE %s successfully\n",
^^^^^^^^^^
Is netdev_err the right severity for a success message? Every link
up/down event while FPE is active goes through emac_adjust_link() ->
icssg_qos_link_state_update() -> icssg_config_ietfpe(), so this will
emit a KERN_ERR line on every link flap. netdev_info() or
netdev_dbg() might be more appropriate.
> + str_enable_disable(iet->fpe_active));
> +}
> +EXPORT_SYMBOL_GPL(icssg_config_ietfpe);
Is EXPORT_SYMBOL_GPL(icssg_config_ietfpe) needed? The only caller
outside this file (in the follow-up ethtool MAC Merge patch) lives in
icssg_ethtool.c, which is part of the same icssg-y module as
icssg_qos.c, so no export is required.
> +
> +void icssg_qos_init(struct net_device *ndev)
> +{
> + struct prueth_emac *emac = netdev_priv(ndev);
> + struct prueth_qos_iet *iet = &emac->qos.iet;
> +
> + iet->emac = emac;
^^^^^^^^^^^^^^^
Is the iet->emac back-pointer used anywhere? A grep across
drivers/net/ethernet/ti/icssg shows it is only written here; every
function in icssg_qos.c that needs emac either receives it as a
parameter or derives it from netdev_priv(). If it has no readers, the
field and this assignment can be removed.
> + mutex_init(&iet->fpe_lock);
> +}
[ ... ]
> +static int emac_tc_setup_mqprio(struct net_device *ndev, void *type_data)
> +{
> + struct tc_mqprio_qopt_offload *mqprio = type_data;
> + struct prueth_emac *emac = netdev_priv(ndev);
> + struct tc_mqprio_qopt *qopt = &mqprio->qopt;
> + struct prueth_qos_mqprio *p_mqprio;
> + u8 num_tc = mqprio->qopt.num_tc;
> + int tc, offset, count;
> +
> + p_mqprio = &emac->qos.mqprio;
> +
> + if (!num_tc) {
> + netdev_reset_tc(ndev);
> + p_mqprio->preemptible_tcs = 0;
> + p_mqprio->mqprio.qopt.num_tc = 0;
> + goto reset_tcs;
> + }
In the num_tc == 0 (qdisc deletion) path, only software bookkeeping is
reset here. icssg_iet_set_preempt_mask() iterates tc = 0 .. num_tc, so
with num_tc=0 no writeb() is issued to the EXPRESS_PRE_EMPTIVE_Q_MAP
table and per-queue markings written by a prior configuration persist
in firmware memory. Should the qdisc-deletion path clear the per-queue
map as well?
> +
> + memcpy(&p_mqprio->mqprio, mqprio, sizeof(*mqprio));
> + p_mqprio->preemptible_tcs = mqprio->preemptible_tcs;
Can these writes race against emac_adjust_link()?
struct prueth_qos_iet documents that p_mqprio is protected by fpe_lock,
and icssg_iet_set_preempt_mask() reads preemptible_tcs, qopt->num_tc,
qopt->offset[] and qopt->count[] under fpe_lock via
icssg_iet_change_preemptible_tcs().
However, emac_tc_setup_mqprio() writes p_mqprio here without taking
fpe_lock. ndo_setup_tc runs under RTNL, but emac_adjust_link() is
invoked from the phylib state-machine workqueue which holds only
phydev->lock, not RTNL, so the two paths can run concurrently on
different CPUs. While this memcpy is in flight, adjust_link can
acquire fpe_lock and observe a torn p_mqprio (for example new num_tc
with old offset/count arrays) and program incorrect
EXPRESS_PRE_EMPTIVE_Q_MAP / EXPRESS_PRE_EMPTIVE_Q_MASK values.
Should fpe_lock be held around the p_mqprio writes in
emac_tc_setup_mqprio(), rather than only around the HW-commit step?
Also, p_mqprio->preemptible_tcs is u8 while
tc_mqprio_qopt_offload::preemptible_tcs is unsigned long. Any bits
above bit 7 are silently dropped with no error returned to userspace.
The current 4-queue hardware makes this unreachable in practice, but
should the driver either validate num_tc <= 8 via NL_SET_ERR_MSG_MOD or
widen the storage?
It also looks like mqprio->mode, mqprio->shaper, mqprio->flags, and the
min_rate[]/max_rate[] arrays are never inspected. A user requesting
TC_MQPRIO_MODE_CHANNEL, a shaper, or rate limits will get 0 (success)
back from the kernel, but only the queue-to-TC mapping is applied.
Should the driver reject offload parameters it does not implement?
> + netdev_set_num_tc(ndev, mqprio->qopt.num_tc);
> +
> + for (tc = 0; tc < num_tc; tc++) {
> + count = qopt->count[tc];
> + offset = qopt->offset[tc];
> + netdev_set_tc_queue(ndev, tc, count, offset);
> + }
> +
> +reset_tcs:
> + icssg_iet_change_preemptible_tcs(emac);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(icssg_qos_init);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Is this EXPORT_SYMBOL_GPL placed at the intended function? It follows
the closing brace of emac_tc_setup_mqprio() rather than
icssg_qos_init(). emac_tc_setup_mqprio is static so it cannot be
exported; the export of icssg_qos_init compiles regardless because the
macro references the name directly, but the placement reads like a
copy/paste slip.
> +
> +int icssg_qos_ndo_setup_tc(struct net_device *ndev, enum tc_setup_type type,
[ ... ]
More information about the linux-arm-kernel
mailing list