[RFC PATCH v5 4/4] ice: implement dpll interface to control cgu

Tue Jan 31 05:00:42 PST 2023

Fri, Jan 27, 2023 at 07:13:20PM CET, arkadiusz.kubalewski at intel.com wrote:
>
>>From: Jiri Pirko <jiri at resnulli.us>
>>Sent: Thursday, January 19, 2023 3:54 PM
>>
>>Tue, Jan 17, 2023 at 07:00:51PM CET, vadfed at meta.com wrote:
>>>From: Arkadiusz Kubalewski <arkadiusz.kubalewski at intel.com>

[...]

>>>+/**
>>>+ * ice_dpll_periodic_work - DPLLs periodic worker
>>>+ * @work: pointer to kthread_work structure
>>>+ *
>>>+ * DPLLs periodic worker is responsible for polling state of dpll.
>>>+ */
>>>+static void ice_dpll_periodic_work(struct kthread_work *work)
>>>+{
>>>+	struct ice_dplls *d = container_of(work, struct ice_dplls,
>>>work.work);
>>>+	struct ice_pf *pf = container_of(d, struct ice_pf, dplls);
>>>+	struct ice_dpll *de = &pf->dplls.eec;
>>>+	struct ice_dpll *dp = &pf->dplls.pps;
>>>+	int ret = 0;
>>>+
>>>+	if (!test_bit(ICE_FLAG_DPLL, pf->flags))
>>
>>Why do you need to check the flag there, this would should not be
>>ever scheduled in case the flag was not set.
>>
>
>It's here rather for stopping the worker, It shall no longer reschedule and
>bail out.

How that can happen?

>
>>
>>>+		return;
>>>+	mutex_lock(&d->lock);
>>>+	ret = ice_dpll_update_state(&pf->hw, de);
>>>+	if (!ret)
>>>+		ret = ice_dpll_update_state(&pf->hw, dp);
>>>+	if (ret) {
>>>+		d->cgu_state_acq_err_num++;
>>>+		/* stop rescheduling this worker */
>>>+		if (d->cgu_state_acq_err_num >
>>>+		    CGU_STATE_ACQ_ERR_THRESHOLD) {
>>>+			dev_err(ice_pf_to_dev(pf),
>>>+				"EEC/PPS DPLLs periodic work disabled\n");
>>>+			return;
>>>+		}
>>>+	}
>>>+	mutex_unlock(&d->lock);
>>>+	ice_dpll_notify_changes(de);
>>>+	ice_dpll_notify_changes(dp);
>>>+
>>>+	/* Run twice a second or reschedule if update failed */
>>>+	kthread_queue_delayed_work(d->kworker, &d->work,
>>>+				   ret ? msecs_to_jiffies(10) :
>>>+				   msecs_to_jiffies(500));
>>>+}

[...]

>>>+/**
>>>+ * ice_dpll_rclk_find_dplls - find the device-wide DPLLs by clock_id
>>>+ * @pf: board private structure
>>>+ *
>>>+ * Return:
>>>+ * * 0 - success
>>>+ * * negative - init failure
>>>+ */
>>>+static int ice_dpll_rclk_find_dplls(struct ice_pf *pf)
>>>+{
>>>+	u64 clock_id = 0;
>>>+
>>>+	ice_generate_clock_id(pf, &clock_id);
>>>+	pf->dplls.eec.dpll = dpll_device_get_by_clock_id(clock_id,
>>
>>I have to say I'm a bit lost in this code. Why exactly do you need this
>>here? Looks like the pointer was set in ice_dpll_init_dpll().
>>
>>Or, is that in case of a different PF instantiating the DPLL instances?
>
>Yes it is, different PF is attaching recovered clock pins with this.
>
>>If yes, I'm pretty sure what it is wrong. What is the PF which did
>>instanticate those unbinds? You have to share the dpll instance,
>>refcount it.
>>
>
>It will break, as in our case only one designated PF controls the dpll.

You need to fix this then.

>
>>Btw, you have a problem during init as well, as the order matters. What
>>if the other function probes only after executing this? You got -EFAULT
>>here and bail out.
>>
>
>We don't expect such use case, altough I see your point, will try to fix it.

What? You have to be kidding me, correct? User obviously should have
free will to use sysfs to bind/unbind the PCI devices in any order he
pleases.

>
>>In mlx5, I also share one dpll instance between 2 PFs. What I do is I
>>create mlx5-dpll instance which is refcounted, created by first probed
>>PF and removed by the last one. In mlx5 case, the PFs are equal, nobody
>>is an owner of the dpll. In your case, I think it is different. So
>>probably better to implement the logic in driver then in the dpll core.
>>
>
>For this NIC only one PF will control the dpll, so there is a designated owner.
>Here owner must not only initialize a dpll but also register its pin,
>so the other PFs could register additional pins. Basically it means
>for ice that we can only rely on some postponed rclk initialization for
>a case of unordered PF initialization. Seems doable.

My point is, you should have one DPLL instance shared for muptiple PFs.
Then, you have pin struct and dpll struct to use in pin_register and you
can avoid this odd description magic which is based obviously on broken
model you have.

>
>>Then you don't need dpll_device_get_by_clock_id at all. If you decide to
>>implement that in dpll core, I believe that there should be some
>>functions like:
>>dpll = dpll_device_get(ops, clock_id, ...)  - to create/get reference
>>dpll_device_put(dpll)                       - to put reference/destroy
>
>Sure, we can rename "dpll_device_get_by_clock_id" to "dpll_device_get" (as
>it is only function currently exported for such behavior), and add
>"dpll_device_put", with ref counts as suggested.
>
>>
>>First caller of dpll_device_get() actually makes dpll to instantiate the
>>device.
>>
>
>Maybe I am missing something.. do you suggest that "dpll_device_get" would
>allocate dpll device and do ref count, and then dpll_device_register(..) call

No need for separate register, is it? just have one dpll_device_get()
function allocate-register/getref for you. Why do you need anything else?

>would assign all the arguments that are available only in the owner instance?
>Or i got it wrong?

[...]