[PATCH net-next v4 1/2] net: ti: icssg-prueth: Add Frame Preemption MAC Merge support

Thu Feb 26 08:19:38 PST 2026

Hi Meghana,

On Tue, Feb 24, 2026 at 06:18:02PM +0530, Meghana Malladi wrote:
> diff --git a/drivers/net/ethernet/ti/icssg/icssg_qos.c b/drivers/net/ethernet/ti/icssg/icssg_qos.c
> new file mode 100644
> index 000000000000..388dfcea426b
> --- /dev/null
> +++ b/drivers/net/ethernet/ti/icssg/icssg_qos.c
> @@ -0,0 +1,223 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Texas Instruments ICSSG PRUETH QoS submodule
> + * Copyright (C) 2023 Texas Instruments Incorporated - http://www.ti.com/
> + */
> +
> +#include "icssg_prueth.h"
> +#include "icssg_switch_map.h"
> +
> +static void icssg_iet_set_preempt_mask(struct prueth_emac *emac, u8 preemptible_tcs)
> +{
> +	void __iomem *config = emac->dram.va + ICSSG_CONFIG_OFFSET;
> +	struct prueth_qos_mqprio *p_mqprio = &emac->qos.mqprio;
> +	struct tc_mqprio_qopt *qopt = &p_mqprio->mqprio.qopt;
> +	int prempt_mask = 0, i;
> +	u8 tc;
> +
> +	/* Configure the queues based on the preemptible tc map set by the user */
> +	for (tc = 0; tc < p_mqprio->mqprio.qopt.num_tc; tc++) {
> +		/* check if the tc is preemptive or not */
> +		if (preemptible_tcs & BIT(tc)) {
> +			for (i = qopt->offset[tc]; i < qopt->offset[tc] + qopt->count[tc]; i++) {
> +				/* Set all the queues in this tc as preemptive queues */
> +				writeb(BIT(4), config + EXPRESS_PRE_EMPTIVE_Q_MAP + i);
> +				prempt_mask &= ~BIT(i);
> +			}
> +		} else {
> +			/* Set all the queues in this tc as express queues */
> +			for (i = qopt->offset[tc]; i < qopt->offset[tc] + qopt->count[tc]; i++) {
> +				writeb(0, config + EXPRESS_PRE_EMPTIVE_Q_MAP + i);
> +				prempt_mask |= BIT(i);
> +			}
> +		}
> +		netdev_set_tc_queue(emac->ndev, tc, qopt->count[tc], qopt->offset[tc]);
> +	}
> +	writeb(prempt_mask, config + EXPRESS_PRE_EMPTIVE_Q_MASK);
> +}

Shouldn't the preemptible TCs be committed to hardware only if FPE is
active? The callers pay absolutely no regard to that.

> +
> +static void icssg_config_ietfpe(struct work_struct *work)
> +{
> +	struct prueth_qos_iet *iet =
> +		container_of(work, struct prueth_qos_iet, fpe_config_task);
> +	void __iomem *config = iet->emac->dram.va + ICSSG_CONFIG_OFFSET;
> +	struct prueth_qos_mqprio *p_mqprio =  &iet->emac->qos.mqprio;
> +	bool enable = !!atomic_read(&iet->enable_fpe_config);
> +	int ret;
> +	u8 val;
> +
> +	if (!netif_running(iet->emac->ndev))
> +		return;
> +
> +	mutex_lock(&iet->fpe_lock);
> +
> +	/* Update FPE Tx enable bit (PRE_EMPTION_ENABLE_TX) if
> +	 * fpe_enabled is set to enable MM in Tx direction
> +	 */
> +	writeb(enable ? 1 : 0, config + PRE_EMPTION_ENABLE_TX);
> +
> +	/* If FPE is to be enabled, first configure MAC Verify state
> +	 * machine in firmware as firmware kicks the Verify process
> +	 * as soon as ICSSG_EMAC_PORT_PREMPT_TX_ENABLE command is
> +	 * received.
> +	 */
> +	if (enable && iet->mac_verify_configure) {
> +		writeb(1, config + PRE_EMPTION_ENABLE_VERIFY);
> +		writew(iet->tx_min_frag_size, config + PRE_EMPTION_ADD_FRAG_SIZE_LOCAL);
> +		writel(iet->verify_time_ms, config + PRE_EMPTION_VERIFY_TIME);
> +	}
> +
> +	/* Send command to enable FPE Tx side. Rx is always enabled */
> +	ret = icssg_set_port_state(iet->emac,
> +				   enable ? ICSSG_EMAC_PORT_PREMPT_TX_ENABLE :
> +					    ICSSG_EMAC_PORT_PREMPT_TX_DISABLE);
> +	if (ret) {
> +		netdev_err(iet->emac->ndev, "TX preempt %s command failed\n",
> +			   str_enable_disable(enable));
> +		writeb(0, config + PRE_EMPTION_ENABLE_VERIFY);
> +		iet->verify_status = ICSSG_IETFPE_STATE_DISABLED;
> +		goto unlock;
> +	}
> +
> +	if (enable && iet->mac_verify_configure) {
> +		ret = readb_poll_timeout(config + PRE_EMPTION_VERIFY_STATUS, iet->verify_status,
> +					 (iet->verify_status == ICSSG_IETFPE_STATE_SUCCEEDED),
> +					 USEC_PER_MSEC, 5 * USEC_PER_SEC);

You are sleeping up to 5 seconds in the system_percpu_wq kernel-wide
workqueue, blocking the kernel from making any sort of progress with
other items in this workqueue. As include/linux/workqueue.h puts it:
"Don't queue works which can run for too long.".

I guess you should allocate a driver-private workqueue on which you can
sleep as much as you want. Or make the icssg_config_ietfpe task smarter,
to be stateful and reschedule itself until the PRE_EMPTION_VERIFY_STATUS
changes, or a timeout expires. But that's more complicated.

Side note: I had this question on my mind - all contexts from which you
call schedule_work(&iet->fpe_config_task) are sleepable, so why not just
invoke icssg_config_ietfpe() via a direct function call instead?
I guess that's why - it takes too long to reasonably wait for it from
call sites like ethtool, phylink etc. I would make sure this design
decision is part of the commit message.

But let's not lie to ourselves. Having a deferred fpe_config_task
creates its own problems which you are not handling well.

Consider:
- iet->tx_min_frag_size
- iet->verify_time_ms

Writer is emac_set_mm(), reader is icssg_config_ietfpe(). But the reader
can run concurrently with the writer. This means the reader can pick up
and send to firmware settings in an inconsistent state (old tx_min_frag_size
with new verify_time_ms). Configuration which was never requested by the user.

You have a mutex &iet->fpe_lock. Does it help? No.
You have an atomic &iet->enable_fpe_config. Does it help? Also no.

Please try to think of a synchronization pattern where the config
writer, emac_set_mm(), stops or otherwise blocks out the deferred reader
while it's making changes.

In addition, schedule_work() while the work is already scheduled will do
nothing. So if the fpe_config_task takes close to 5 seconds and the user
sends multiple ethtool --set-mm requests in that time, they will be
ignored or incorrectly processed.

Also, iet->fpe_active is problematic too. It has emac_get_mm(),
icssg_qos_link_up() and icssg_qos_link_down() as readers, and
icssg_config_ietfpe() as writer. But it's not obvious what the correct
access pattern is. These things rarely work correctly by chance :(

I'm sorry that I don't have more time to untangle everything and see
what would work best. As a result, the comments above are just "some"
observations. Please try to be more deliberate with the synchronization
procedures, explain them and I am more than happy to double-check their
sanity. It's just that not much effort seems to have been put into the
current proposal.