[PATCH v5 3/4] pwm: add microchip soft ip corePWM driver

Mon Jul 11 07:33:30 PDT 2022

Hey Uwe, took another look at this today.
I'll give you some more info on the sync_update hw when I send
the next version.

On 09/07/2022 17:02, Uwe Kleine-König wrote:
> Hello Conor,
> 
> On Fri, Jul 08, 2022 at 03:39:22PM +0100, Conor Dooley wrote:
>> Add a driver that supports the Microchip FPGA "soft" PWM IP core.
>>
>> Signed-off-by: Conor Dooley <conor.dooley at microchip.com>
>> ---

>> +static int mchp_core_pwm_apply_period(struct pwm_chip *chip, const struct pwm_state *state,
>> +				      u8 *prescale, u8 *period_steps)
>> +{
>> +	struct mchp_core_pwm_chip *mchp_core_pwm = to_mchp_core_pwm(chip);
>> +	u64 tmp, clk_rate;
>> +	u16 prescale_val, period_steps_val;
>> +
>> +	/*
>> +	 * Calculate the period cycles and prescale values.
>> +	 * The registers are each 8 bits wide & multiplied to compute the period
>> +	 * using the formula:
>> +	 * (clock_period) * (prescale + 1) * (period_steps + 1)
>> +	 * so the maximum period that can be generated is 0x10000 times the
>> +	 * period of the input clock.
>> +	 * However, due to the design of the "hardware", it is not possible to
>> +	 * attain a 100% duty cycle if the full range of period_steps is used.
>> +	 * Therefore period_steps is restricted to 0xFE and the maximum multiple
>> +	 * of the clock period attainable is 0xFF00.
>> +	 */
>> +	clk_rate = clk_get_rate(mchp_core_pwm->clk);
> 
> +	/* If clk_rate is too big, the following multiplication might overflow */

I expanded this comment slightly to:
/*
* If clk_rate is too big, the following multiplication might overflow.
* However this is implausible, as the fabric of current FPGAs cannot
* provide clocks at a rate high enough.
*/

>> +	if (clk_rate >= NSEC_PER_SEC)
>> +		return -EINVAL;
>> +
>> +	tmp = mul_u64_u64_div_u64(state->period, clk_rate, NSEC_PER_SEC);
>> +
>> +	if (tmp >= MCHPCOREPWM_PERIOD_MAX) {
>> +		*prescale = MCHPCOREPWM_PRESCALE_MAX - 1;
>> +		*period_steps = MCHPCOREPWM_PERIOD_STEPS_MAX - 1;
>> +		goto write_registers;
>> +	}
>> +
>> +	for (prescale_val = 1; prescale_val <= MCHPCOREPWM_PRESCALE_MAX; prescale_val++) {
>> +		period_steps_val = div_u64(tmp, prescale_val);
>> +		if (period_steps_val > MCHPCOREPWM_PERIOD_STEPS_MAX)
>> +			continue;
>> +		*period_steps = period_steps_val - 1;
>> +		*prescale = prescale_val - 1;
>> +		break;
>> +	}
> 
> OK, so you want to find the smallest prescale_val such that
> 
> 	prescale_val * MCHPCOREPWM_PERIOD_STEPS_MAX >= tmp
> 
> . You can calculate that without a loop as:
> 
> 	prescale_val = div_u64(tmp, MCHPCOREPWM_PERIOD_STEPS_MAX);

Hmm, not quite right. For tmp < MCHPCOREPWM_PERIOD_STEPS_MAX this gives
zero. It would have to be:
*prescale = div_u64(tmp, MCHPCOREPWM_PERIOD_STEPS_MAX);

In which case, the loop collapses to:

*prescale = div_u64(tmp, MCHPCOREPWM_PERIOD_STEPS_MAX);
/* PREG_TO_VAL() can produce a value larger than UINT8_MAX */
*period_steps = div_u64(tmp, PREG_TO_VAL((u32) *prescale)) - 1;

and the interim _val variables can just be deleted.

>> +static int mchp_core_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm,
>> +			       const struct pwm_state *state)
>> +{
>> +	struct mchp_core_pwm_chip *mchp_core_pwm = to_mchp_core_pwm(chip);
>> +	struct pwm_state current_state = pwm->state;
>> +	bool period_locked;
>> +	u64 period;
>> +	u16 channel_enabled;
>> +	u8 prescale, period_steps;
>> +	int ret;
>> +
>> +	if (!state->enabled) {
>> +		mchp_core_pwm_enable(chip, pwm, false);
>> +		return 0;
>> +	}
>> +
>> +	/*
>> +	 * If the only thing that has changed is the duty cycle or the polarity,
>> +	 * we can shortcut the calculations and just compute/apply the new duty
>> +	 * cycle pos & neg edges
>> +	 * As all the channels share the same period, do not allow it to be
>> +	 * changed if any other channels are enabled.
>> +	 */
>> +	spin_lock(&mchp_core_pwm->lock);
>> +
>> +	channel_enabled = (((u16)readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_EN(1)) << 8) |
>> +		readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_EN(0)));
>> +	period_locked = channel_enabled & ~(1 << pwm->hwpwm);
>> +
>> +	if ((!current_state.enabled || current_state.period != state->period) && !period_locked) {
>> +		ret = mchp_core_pwm_apply_period(chip, state, &prescale, &period_steps);
>> +		if (ret) {
>> +			spin_unlock(&mchp_core_pwm->lock);
>> +			return ret;
>> +		}
>> +	} else {
>> +		prescale = readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_PRESCALE);
>> +		period_steps = readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_PERIOD);
>> +	}
>> +
>> +	/*
>> +	 * If the period is locked, it may not be possible to use a period less
>> +	 * than that requested.
>> +	 */
>> +	period = PREG_TO_VAL(period_steps) *  PREG_TO_VAL(prescale) * NSEC_PER_SEC;
> 
> s/  / /
> 
>> +	do_div(period, clk_get_rate(mchp_core_pwm->clk));
>> +	if (period > state->period) {
>> +		spin_unlock(&mchp_core_pwm->lock);
>> +		return -EINVAL;
>> +	}
> 
> I would consider it easier to do the following (in pseudo syntax):
> 
> 
> 	prescale, period_steps = calculate_hwperiod(period);
> 
> 	if (period_locked):
> 		hwprescale = readb_relaxed(PRESCALE)
> 		hwperiod_steps = readb_relaxed(PERIOD)
> 
> 		if period_steps * prescale < hwperiod_steps * hwprescale:
> 			return -EINVAL
> 		else
> 			prescale, period_steps = hwprescale,
> 			hwperiod_steps

I think I like this method more than messing around with the clks.
I'll change to something like this for the next version.

> 	duty_steps = calculate_hwduty(duty, prescale)
> 	if (duty_steps > period_steps)
> 		duty_steps = period_steps

> 
>> +	mchp_core_pwm_apply_duty(chip, pwm, state, prescale, period_steps);
>> +
>> +	/*
>> +	 * Notify the block to update the waveform from the shadow registers.
>> +	 * The updated values will not appear on the bus until they have been
>> +	 * applied to the waveform at the beginning of the next period. We must
>> +	 * write these registers and wait for them to be applied before calling
>> +	 * enable().
>> +	 */
>> +	if (mchp_core_pwm->sync_update_mask & (1 << pwm->hwpwm)) {
>> +		writel_relaxed(1U, mchp_core_pwm->base + MCHPCOREPWM_SYNC_UPD);
>> +		usleep_range(state->period, state->period * 2);
>> +	}
>> +
>> +	spin_unlock(&mchp_core_pwm->lock);
>> +
>> +	mchp_core_pwm_enable(chip, pwm, true);
> 
> I already asked in the last round: Do you really need to write the
> SYNC_UPD register twice? I would expect that you don't?!
> 
> Also the locking looks fishy here. It would be simpler (and maybe even
> more robust, didn't think deeply about it) to assume in
> mchp_core_pwm_enable() that the caller holds the lock. Then you only
> grab the lock once during .apply() and nothing strange can happen in the
> gap.

I got it into my head that enable() could be called by the framework.
I'll simplify the locking here.

> I'd take the lock here to be sure to get a consistent return value.
> 
>> +	channel_enabled = (((u16)readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_EN(1)) << 8) |
>> +		readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_EN(0)));
> 
> micro optimisation: You're reading two register values here, but only use
> one. Shadowing the enabled registers in mchp_core_pwm might also be an
> idea.

I'd agree, but more from the perspective of how awful I feel this code
looks.

> 
>> +	if (channel_enabled & 1 << pwm->hwpwm)
> 
> I'm always unsure about the associativity of & and <<, so I would have
> written that as
> 
> 	if (channel_enabled & (1 << pwm->hwpwm))
> 
> I just tested that for the umpteens time and your code is fine, so this
> is only for human readers like me.

I'll change it, I'll prob have forgotten the associativity by the time I
look at the driver next.

> 
>> +		state->enabled = true;
>> +	else
>> +		state->enabled = false;
>> +
>> +	prescale = PREG_TO_VAL(readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_PRESCALE));
>> +
>> +	posedge = readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_POSEDGE(pwm->hwpwm));
>> +	negedge = readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_NEGEDGE(pwm->hwpwm));
>> +
>> +	duty_steps = abs((s16)posedge - (s16)negedge);
> 
> If duty_steps == 0 the returned result is wrong. I suggest to fix that,
> at least mention the problem in a comment.

I think handling it is the way to go.

> 
>> +	state->duty_cycle = duty_steps * prescale * NSEC_PER_SEC;
> 
> Can this overflow?

No, 255 * 256 * 1E9 < 2^64 but ...

>> +	prescale = PREG_TO_VAL(readb_relaxed(mchp_core_pwm->base + MCHPCOREPWM_PRESCALE));

... can.

>>> What is the typical return value of clk_get_rate(mchp_core_pwm->clk)?
>>
>> It's gonna be less than 600M
>
> An exact value would be interesting, then when I spot a rounding problem
> I could give you a test case to double check.

The maximum depends on speed grade, but no more than 200 MHz.

>>>> You need to round up here. Did you test your driver with PWM_DEBUG on?
>>>> During test please do for a few fixed periods:
>>>>
>>>> 	for duty_cycle in [0 .. period]:
>>>> 		pwm_apply(mypwm, {.period = period, .duty_cycle = duty_cycle, .enabled = true, ...})
>>>>
>>>> 	for duty_cycle in [period .. 0]:
>>>> 		pwm_apply(mypwm, {.period = period, .duty_cycle = duty_cycle, .enabled = true, ...})
>>>>
>>>> and check there is no output claiming a miscalculation.
>>>
>>> I ran the stuff you gave me last time, doing something similar w/ a
>>> shell loop. Got no reported miscalculations.
>>
>> I'm surprise, I would have expected that my test recipe would find such
>> an issue. Could you follow my arguing about the rounding direction?
>> There always the possibility that I'm wrong, too.
>
> I'll take another look & get back to you.

I *think* I might've spelt "PWM_DEBUG" as "PWM_DEBUF"...

I'll retest!

Thanks,
Conor.