[PATCH phy-next 12/13] phy: lynx-10g: new driver

Fri May 29 05:34:15 PDT 2026

On Thu, May 28, 2026 at 06:20:29PM +0000, sashiko-bot at kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:
> 
> New issues:
> - [High] Lockless read-modify-write operations on the LNaGCR0 register create a data race between asynchronous CDR lock checks and PHY configuration.
> - [Medium] The return value of lynx_10g_lane_enable_pcvt() is discarded when called inside WARN_ON(), masking hardware configuration failures.
> - [Medium] The return value of lynx_pcvt_rmw() for the CR(0) register is completely ignored, masking potential hardware failures.
> - [Medium] lynx_10g_validate() incorrectly permits transitions from LANE_MODE_UNKNOWN, which are later rejected by lynx_10g_set_mode().
> 
> Pre-existing issues:
> - [Critical] The delayed work priv->cdr_check may unconditionally re-arm itself, leading to a Use-After-Free upon device unbind or module removal.
> --
> 
> commit 8617183053a524e1cbc4453e0e8f8e478c4cf8ef
> Author: Vladimir Oltean <vladimir.oltean at nxp.com>
> 
> phy: lynx-10g: new driver
> 
> Introduce a driver for the networking lanes of the 10G Lynx SerDes
> block, present on the majority of Layerscape and QorIQ (Freescale/NXP)
> SoCs. The consumers use the Generic PHY framework to reconfigure the SerDes
> for requested link modes.
> 
> > diff --git a/drivers/phy/freescale/phy-fsl-lynx-10g.c b/drivers/phy/freescale/phy-fsl-lynx-10g.c
> > new file mode 100644
> > index 0000000000000..28b0823574684
> > --- /dev/null
> > +++ b/drivers/phy/freescale/phy-fsl-lynx-10g.c
> > @@ -0,0 +1,1319 @@
> 
> [ ... ]
> 
> > +static void lynx_10g_cdr_lock_check(struct lynx_lane *lane)
> > +{
> > +	u32 tcsr3 = lynx_lane_read(lane, LNaTCSR3);
> > +
> > +	if (tcsr3 & LNaTCSR3_CDR_LCK)
> > +		return;
> > +
> > +	dev_dbg(&lane->phy->dev,
> > +		"Lane %c CDR unlocked, resetting receiver...\n",
> > +		'A' + lane->id);
> > +
> > +	lynx_lane_rmw(lane, LNaGCR0, LNaGCR0_RRST_ON, LNaGCR0_RRST);
> 
> [Severity: High]
> Can concurrent modifications to the LNaGCR0 register corrupt its state?
> 
> Since the priv->cdr_check work runs asynchronously, it could call
> lynx_10g_cdr_lock_check() and modify LNaGCR0 at the same time the PHY
> framework calls phy_power_on(), phy_power_off(), or phy_set_mode(). These
> functions also read-modify-write LNaGCR0 without per-lane locking,
> potentially dropping reset or power-down bits.

This is why lynx_cdr_lock_check() acquires &lane->phy->mutex prior to
calling priv->info->cdr_lock_check() (lynx_10g_cdr_lock_check). The
concurrency is blocked, this is a false report.

> [ ... ]
> 
> > +static const struct lynx_info lynx_info_ls1028a = {
> > +	.get_pccr = ls1028a_get_pccr,
> > +	.get_pcvt_offset = ls1028a_get_pcvt_offset,
> > +	.pll_read_configuration = lynx_10g_pll_read_configuration,
> > +	.lane_read_configuration = lynx_10g_lane_read_configuration,
> > +	.cdr_lock_check = lynx_10g_cdr_lock_check,
> 
> [Severity: Critical]
> This is a pre-existing issue, but can the delayed work priv->cdr_check cause
> a use-after-free upon module removal?
> 
> If lynx_cdr_lock_check() unconditionally re-arms itself with
> queue_delayed_work(), cancel_delayed_work_sync() during remove might wait
> for the current execution to finish, only for it to queue itself again.
> After device memory is freed, the newly armed timer would fire and access
> freed memory.
> 
> [ ... ]

I asked an LLM to look at whether cancel_delayed_work_sync() protects
against attempts from the work to reschedule itself, and it looks like
it does.

  The disable count is incremented at the very beginning of the
  cancellation process and decremented just before the function returns.
  Here is the call chain in kernel/workqueue.c:

   1. cancel_delayed_work_sync() calls __cancel_work_sync(work, WORK_CANCEL_DELAYED).
   2. __cancel_work_sync() immediately calls:

   1     __cancel_work(work, cflags | WORK_CANCEL_DISABLE);
   2     // where cflags is WORK_CANCEL_DELAYED, so this includes the DISABLE flag.
   3. __cancel_work() performs the increment:
       * It calls work_grab_pending() to claim the work's pending bit.
       * It calls work_offqd_unpack() to read the current state from work->data.
       * It calls work_offqd_disable(&offqd), which increments offqd.disable++.
       * It packs this updated state back into work->data using set_work_pool_and_clear_pending().
   4. __cancel_work_sync() then waits:
       * It calls __flush_work(work, true) to wait for any currently executing
         instance of the work to finish.
       * Crucially: While it is waiting, if the running work function tries to
         re-queue itself (like lynx_28g_cdr_lock_check does), the
         queue_work logic checks this disable count and fails the
         enqueue.
   5. __cancel_work_sync() performs the decrement:
       * At the end of the function, it checks if it should re-enable the work:

   1     if (!(cflags & WORK_CANCEL_DISABLE))
   2         enable_work(work);
       * Since cancel_delayed_work_sync passed WORK_CANCEL_DELAYED (which does
         not include the persistent DISABLE flag), enable_work(work) is called.
       * enable_work() calls work_offqd_enable(), which decrements the count back
       to its original value.

The LLM then makes a note that since commit 86898fa6b8cd ("workqueue:
Implement disable/enable for (delayed) work items") from kernel v6.10, a
new disable_delayed_work_sync() API function exists, which as opposed to
cancel_delayed_work_sync(), does not increment the disable count just
for the duration of the function call, but rather, leaves the work
persistently disabled, such that no future attempts to queue it from any
context will succeed. For this case, though, cancel_delayed_work_sync()
vs disable_delayed_work_sync() makes no functional difference, since we
just schedule the delayed work once during probe, and then the only
other reschedule source is itself.

> > +static int lynx_10g_lane_disable_pcvt(struct lynx_lane *lane,
> > +				      enum lynx_lane_mode mode)
> > +{
> > +	struct lynx_priv *priv = lane->priv;
> > +	int err;
> > +
> > +	spin_lock(&priv->pcc_lock);
> > +
> > +	err = lynx_pccr_write(lane, mode, 0);
> > +	if (err)
> > +		goto out;
> > +
> > +	switch (mode) {
> > +	case LANE_MODE_1000BASEX_SGMII:
> > +	case LANE_MODE_2500BASEX:
> > +		err = lynx_pcvt_rmw(lane, mode, CR(1), SGMIIaCR1_SGPCS_DIS,
> > +				    SGMIIaCR1_SGPCS_EN);
> > +		if (err)
> > +			goto out;
> > +
> > +		lynx_pcvt_rmw(lane, mode, CR(0),
> > +			      SGMIIaCR0_RST_SGM_ON | SGMIIaCR0_PD_SGM,
> > +			      SGMIIaCR0_RST_SGM | SGMIIaCR0_PD_SGM);
> 
> [Severity: Medium]
> Could this mask potential hardware failures?
> 
> While the return value of lynx_pcvt_rmw() for the CR(1) register is checked,
> the subsequent call for the CR(0) register discards its return value. If the
> register access fails, the error would be swallowed here and similarly in
> lynx_10g_lane_enable_pcvt().

This is intentional. The reasons for which lynx_pcvt_rmw() can fail are
not time-variable, so a single error check is sufficient. If the LLM
looked at the errors being propagated it would have seen they are
because we were unable to get a protocol converter for the lane mode,
which would be a code bug. The "potential hardware failure" being masked
is a hallucination.

> [ ... ]
> 
> > +static int lynx_10g_validate(struct phy *phy, enum phy_mode mode, int submode,
> > +			     union phy_configure_opts *opts)
> > +{
> > +	struct lynx_lane *lane = phy_get_drvdata(phy);
> > +	enum lynx_lane_mode lane_mode;
> > +
> > +	if (mode != PHY_MODE_ETHERNET)
> > +		return -EINVAL;
> > +
> > +	lane_mode = phy_interface_to_lane_mode(submode);
> > +	if (!lynx_lane_supports_mode(lane, lane_mode))
> > +		return -EINVAL;
> > +
> > +	if (lynx_lane_restrict_fixed_mode_change(lane, lane_mode))
> > +		return -EINVAL;
> > +
> > +	if (lynx_10g_lane_mode_needs_rcw_override(lane, lane_mode))
> > +		return -EINVAL;
> > +
> > +	return 0;
> > +}
> 
> [Severity: Medium]
> Should validate() explicitly reject transitions from LANE_MODE_UNKNOWN?
> 
> When a lane is initialized to a non-networking protocol, it defaults to
> LANE_MODE_UNKNOWN. This function permits the transition because
> lynx_10g_lane_mode_needs_rcw_override() returns false. However, when
> lynx_10g_set_mode() subsequently calls lynx_10g_lane_disable_pcvt(),
> the underlying get_pccr() function returns -EINVAL for LANE_MODE_UNKNOWN,
> causing set_mode() to fail.
> 
> [ ... ]

It seems reasonable to reject transitions from LANE_MODE_UNKNOWN in
phy_validate(). First and foremost, the lynx_10g_proto_conf[] registers
will most likely not be sufficient to transition a lane into a supported
protocol. Only supported <-> supported protocol changes have been tested
and should work. I will add a test both here and in lynx-28g.

> > +static int lynx_10g_set_mode(struct phy *phy, enum phy_mode mode, int submode)
> > +{
> > +	struct lynx_lane *lane = phy_get_drvdata(phy);
> > +	bool powered_up = lane->powered_up;
> > +	enum lynx_lane_mode lane_mode;
> > +	int err;
> > +
> > +	err = lynx_10g_validate(phy, mode, submode, NULL);
> > +	if (err)
> > +		return err;
> > +
> > +	lane_mode = phy_interface_to_lane_mode(submode);
> > +	/* lynx_10g_validate() already made sure the lane_mode is supported */
> > +
> > +	if (lane_mode == lane->mode)
> > +		return 0;
> > +
> > +	/* If the lane is powered up, put the lane into the halt state while
> > +	 * the reconfiguration is being done.
> > +	 */
> > +	if (powered_up)
> > +		lynx_10g_lane_halt(phy);
> > +
> > +	err = lynx_10g_lane_disable_pcvt(lane, lane->mode);
> > +	if (err)
> > +		goto out;
> > +
> > +	lynx_10g_lane_change_proto_conf(lane, lane_mode);
> > +	lynx_10g_lane_remap_pll(lane, lane_mode);
> > +	WARN_ON(lynx_10g_lane_enable_pcvt(lane, lane_mode));
> 
> [Severity: Medium]
> Does WARN_ON() discard the error code returned by lynx_10g_lane_enable_pcvt()?
> 
> Because WARN_ON() evaluates to a boolean, the actual integer error code is
> ignored. If hardware configuration fails, the system might proceed as if
> successful, incorrectly update lane->mode, and leave the driver in an
> inconsistent state.

Intentional. lynx_10g_lane_enable_pcvt() fails for the same reasons as
lynx_10g_lane_disable_pcvt(), reasons which are not time variant.
We already checked lynx_10g_lane_disable_pcvt() above, so there is no
reason to check lynx_10g_lane_enable_pcvt() too, and complicate the
control flow.

> > +
> > +	lane->mode = lane_mode;
> > +
> > +out:
> > +	if (powered_up)
> > +		lynx_10g_lane_reset(phy);
> > +
> > +	return err;
> > +}
> 
> -- 
> Sashiko AI review Â· https://sashiko.dev/#/patchset/20260528172404.733196-1-vladimir.oltean@nxp.com?part=12