[PATCH v16 1/2] pwm: add microchip soft ip corePWM driver

Tue Apr 11 06:56:15 PDT 2023

Hey Uwe,

On Tue, Apr 11, 2023 at 12:55:47PM +0200, Uwe Kleine-König wrote:
> On Tue, Apr 11, 2023 at 09:56:34AM +0100, Conor Dooley wrote:
> > Add a driver that supports the Microchip FPGA "soft" PWM IP core.
> > 
> > Signed-off-by: Conor Dooley <conor.dooley at microchip.com>
> > ---
> >  drivers/pwm/Kconfig              |  10 +
> >  drivers/pwm/Makefile             |   1 +
> >  drivers/pwm/pwm-microchip-core.c | 509 +++++++++++++++++++++++++++++++
> >  3 files changed, 520 insertions(+)
> >  create mode 100644 drivers/pwm/pwm-microchip-core.c
> > 
> > diff --git a/drivers/pwm/Kconfig b/drivers/pwm/Kconfig
> > index dae023d783a2..f42756a014ed 100644
> > --- a/drivers/pwm/Kconfig
> > +++ b/drivers/pwm/Kconfig
> > @@ -393,6 +393,16 @@ config PWM_MEDIATEK
> >  	  To compile this driver as a module, choose M here: the module
> >  	  will be called pwm-mediatek.
> >  
> > +config PWM_MICROCHIP_CORE
> > +	tristate "Microchip corePWM PWM support"
> > +	depends on SOC_MICROCHIP_POLARFIRE || COMPILE_TEST
> > +	depends on HAS_IOMEM && OF
> > +	help
> > +	  PWM driver for Microchip FPGA soft IP core.
> > +
> > +	  To compile this driver as a module, choose M here: the module
> > +	  will be called pwm-microchip-core.
> > +
> >  config PWM_MXS
> >  	tristate "Freescale MXS PWM support"
> >  	depends on ARCH_MXS || COMPILE_TEST
> > diff --git a/drivers/pwm/Makefile b/drivers/pwm/Makefile
> > index 7bf1a29f02b8..a65625359ece 100644
> > --- a/drivers/pwm/Makefile
> > +++ b/drivers/pwm/Makefile
> > @@ -34,6 +34,7 @@ obj-$(CONFIG_PWM_LPSS_PCI)	+= pwm-lpss-pci.o
> >  obj-$(CONFIG_PWM_LPSS_PLATFORM)	+= pwm-lpss-platform.o
> >  obj-$(CONFIG_PWM_MESON)		+= pwm-meson.o
> >  obj-$(CONFIG_PWM_MEDIATEK)	+= pwm-mediatek.o
> > +obj-$(CONFIG_PWM_MICROCHIP_CORE)	+= pwm-microchip-core.o
> >  obj-$(CONFIG_PWM_MTK_DISP)	+= pwm-mtk-disp.o
> >  obj-$(CONFIG_PWM_MXS)		+= pwm-mxs.o
> >  obj-$(CONFIG_PWM_NTXEC)		+= pwm-ntxec.o
> > diff --git a/drivers/pwm/pwm-microchip-core.c b/drivers/pwm/pwm-microchip-core.c
> > new file mode 100644
> > index 000000000000..0a69ec376c51
> > --- /dev/null
> > +++ b/drivers/pwm/pwm-microchip-core.c
> > @@ -0,0 +1,509 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * corePWM driver for Microchip "soft" FPGA IP cores.
> > + *
> > + * Copyright (c) 2021-2023 Microchip Corporation. All rights reserved.
> > + * Author: Conor Dooley <conor.dooley at microchip.com>
> > + * Documentation:
> > + * https://www.microsemi.com/document-portal/doc_download/1245275-corepwm-hb
> > + *
> > + * Limitations:
> > + * - If the IP block is configured without "shadow registers", all register
> > + *   writes will take effect immediately, causing glitches on the output.
> > + *   If shadow registers *are* enabled, a write to the "SYNC_UPDATE" register
> > + *   notifies the core that it needs to update the registers defining the
> > + *   waveform from the contents of the "shadow registers".
> 
> You only write once to the sync update register (i.e. during probe). So
> that register specifies that a period should be completed before a new
> setting becomes active, right?

Correct.

> Even with sync update this is still racy, right?

I assume the period ticking over as we are updating the values is your
concern here. I'm not sure that there's all that much we can do about
that, so I guess I shall update the comment.
Perhaps period_steps and prescale should be written out after computing
the new duty cycle, to shrink the window in which that can happen, since
the duty cycle computation may require an expensive division on a 32-bit
arch?

> > + * - The IP block has no concept of a duty cycle, only rising/falling edges of
> > + *   the waveform. Unfortunately, if the rising & falling edges registers have
> > + *   the same value written to them the IP block will do whichever of a rising
> > + *   or a falling edge is possible. I.E. a 50% waveform at twice the requested
> > + *   period. Therefore to get a 0% waveform, the output is set the max high/low
> > + *   time depending on polarity.
> > + *   If the duty cycle is 0%, and the requested period is less than the
> > + *   available period resolution, this will manifest as a ~100% waveform (with
> > + *   some output glitches) rather than 50%.
> 
> The last paragraph refers to negedge = 0, posedge = 0 and period_steps =
> 0?

Yes. Although, I did some poking around with it just now & that actually
only happens if prescale is also 0.
If it is non-zero, you get to see some other "interesting behaviour" where
the period becomes gigantic - for example @ prescale = 0x3, the period
becomes about a quarter of a second w/ a 50% duty cycle. clk_rate is
62.5 MHz. I'd need to dig out the RTL to justify that one!
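
For scale, here is a minimal sketch of what the formula in the driver
comment, period = (prescale + 1) * (period_steps + 1) / clk_rate, predicts
at that clock rate. The helper name corepwm_period_ns is made up for
illustration and is not part of the patch; the types are plain userspace C
rather than kernel ones.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper: period in nanoseconds according to the formula
 * documented in the driver comment. Both register values are "plus one"
 * encoded, so 0 means a factor of 1.
 */
static uint64_t corepwm_period_ns(uint8_t prescale, uint8_t period_steps,
				  uint64_t clk_rate)
{
	uint64_t ticks;

	assert(clk_rate != 0);

	/* At most 256 * 256 ticks, so the multiplication below cannot overflow. */
	ticks = ((uint64_t)prescale + 1) * ((uint64_t)period_steps + 1);

	return ticks * NSEC_PER_SEC / clk_rate;
}
```

With clk_rate = 62500000, corepwm_period_ns(0x3, 0, clk_rate) comes out at
64 ns, and even the largest permitted setting (prescale 0xff, period_steps
0xfe) only gives 1044480 ns, roughly 1 ms. A period of about a quarter of a
second is therefore well outside what the documented formula can produce.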

I've just gone and made apply() return -EINVAL for this, which is what the
subsystem already does for requests with a zero period.

> > + * - The PWM period is set for the whole IP block not per channel. The driver
> > + *   will only change the period if no other PWM output is enabled.
> > + */
> 
> > +static void mchp_core_pwm_enable(struct pwm_chip *chip, struct pwm_device *pwm,
> > +				 bool enable, u64 period)
> > +{
> > +	struct mchp_core_pwm_chip *mchp_core_pwm = to_mchp_core_pwm(chip);
> > +	u8 channel_enable, reg_offset, shift;
> > +
> > +	/*
> > +	 * There are two adjacent 8 bit control regs, the lower reg controls
> > +	 * 0-7 and the upper reg 8-15. Check if the pwm is in the upper reg
> > +	 * and if so, offset by the bus width.
> > +	 */
> > +	reg_offset = MCHPCOREPWM_EN(pwm->hwpwm >> 3);
> > +	shift = pwm->hwpwm & 7;
> > +
> > +	channel_enable = readb_relaxed(mchp_core_pwm->base + reg_offset);
> > +	channel_enable &= ~(1 << shift);
> > +	channel_enable |= (enable << shift);
> > +
> > +	writel_relaxed(channel_enable, mchp_core_pwm->base + reg_offset);
> > +	mchp_core_pwm->channel_enabled &= ~BIT(pwm->hwpwm);
> > +	mchp_core_pwm->channel_enabled |= enable << pwm->hwpwm;
> > +
> > +	/*
> > +	 * Notify the block to update the waveform from the shadow registers.
> > +	 * The updated values will not appear on the bus until they have been
> > +	 * applied to the waveform at the beginning of the next period.
> > +	 * This is a NO-OP if the channel does not have shadow registers.
> > +	 */
> 
> The code doesn't match the comment. I think that is a relict from the
> times when we thought that a trigger was necessary to update the
> operating settings from the shadow registers?

Yeah, I read this back to myself before sending v15 & thought that it
didn't need to be changed. I think the first line should go.

> 
> > +	if (mchp_core_pwm->sync_update_mask & (1 << pwm->hwpwm))
> > +		mchp_core_pwm->update_timestamp = ktime_add_ns(ktime_get(), period);
> > +}
> > +
> > +static void mchp_core_pwm_wait_for_sync_update(struct mchp_core_pwm_chip *mchp_core_pwm,
> > +					       unsigned int channel)
> > +{
> > +	/*
> > +	 * If a shadow register is used for this PWM channel, and iff there is
> > +	 * a pending update to the waveform, we must wait for it to be applied
> > +	 * before attempting to read its state. Reading the registers yields
> > +	 * the currently implemented settings & the new ones are only readable
> > +	 * once the current period has ended.
> > +	 */
> > +
> > +	if (mchp_core_pwm->sync_update_mask & (1 << channel)) {
> > +		ktime_t current_time = ktime_get();
> > +		s64 remaining_ns;
> > +		u32 delay_us;
> > +
> > +		remaining_ns = ktime_to_ns(ktime_sub(mchp_core_pwm->update_timestamp,
> > +						     current_time));
> > +
> > +		/*
> > +		 * If the update has gone through, don't bother waiting for
> > +		 * obvious reasons. Otherwise wait around for an appropriate
> > +		 * amount of time for the update to go through.
> > +		 */
> > +		if (remaining_ns <= 0)
> > +			return;
> > +
> > +		delay_us = DIV_ROUND_UP_ULL(remaining_ns, NSEC_PER_USEC);
> > +		fsleep(delay_us);
> > +	}
> 
> There is no way to query the hardware if there is still an update
> pending, right?

Hah, no. This IP is about as old as I am & appears to have been written
with the aim of keeping the FPGA utilisation % to a minimum. No such
luxuries!

> Maybe that's possible implicitly by memoizing the
> expected read value? For me the current approach is fine enough though.
> This can be addressed in the future if needed.
> 
> > +static u64 mchp_core_pwm_calc_duty(const struct pwm_state *state, u64 clk_rate,
> > +				   u8 prescale, u8 period_steps)
> > +{
> > +	u64 duty_steps, tmp;
> > +
> > +	/*
> > +	 * Calculate the duty cycle in multiples of the prescaled period:
> > +	 * duty_steps = duty_in_ns / step_in_ns
> > +	 * step_in_ns = (prescale * NSEC_PER_SEC) / clk_rate
> > +	 * The code below is rearranged slightly to only divide once.
> > +	 */
> > +	tmp = (prescale + 1) * NSEC_PER_SEC;
> > +	duty_steps = mul_u64_u64_div_u64(state->duty_cycle, clk_rate, tmp);
> > +
> > +	return duty_steps;
> > +}
> > +
> > +static void mchp_core_pwm_apply_duty(struct pwm_chip *chip, struct pwm_device *pwm,
> > +				     const struct pwm_state *state, u64 duty_steps,
> > +				     u16 period_steps)
> > +{
> > +	struct mchp_core_pwm_chip *mchp_core_pwm = to_mchp_core_pwm(chip);
> > +	u8 posedge, negedge;
> > +	u8 first_edge = 0, second_edge = duty_steps;
> > +
> > +	/*
> > +	 * Setting posedge == negedge doesn't yield a constant output,
> > +	 * so that's an unsuitable setting to model duty_steps = 0.
> > +	 * In that case set the unwanted edge to a value that never
> > +	 * triggers.
> > +	 */
> > +	if (duty_steps == 0)
> > +		first_edge = period_steps + 1;
> > +
> > +	if (state->polarity == PWM_POLARITY_INVERSED) {
> > +		negedge = first_edge;
> > +		posedge = second_edge;
> > +	} else {
> > +		posedge = first_edge;
> > +		negedge = second_edge;
> > +	}
> > +
> > +	writel_relaxed(posedge, mchp_core_pwm->base + MCHPCOREPWM_POSEDGE(pwm->hwpwm));
> > +	writel_relaxed(negedge, mchp_core_pwm->base + MCHPCOREPWM_NEGEDGE(pwm->hwpwm));
> 
> Is this racy with sync update implemented in the firmware? A comment
> about how the sync update is implemented would be good.

Unless this is a different fear of racing, see above.

> > +}
> > +
> > +static int mchp_core_pwm_calc_period(const struct pwm_state *state, unsigned long clk_rate,
> > +				     u16 *prescale, u16 *period_steps)
> > +{
> > +	u64 tmp;
> > +	u32 remainder;
> > +
> > +	/*
> > +	 * Calculate the period cycles and prescale values.
> > +	 * The registers are each 8 bits wide & multiplied to compute the period
> > +	 * using the formula:
> > +	 *           (prescale + 1) * (period_steps + 1)
> > +	 * period = -------------------------------------
> > +	 *                      clk_rate
> > +	 * so the maximum period that can be generated is 0x10000 times the
> > +	 * period of the input clock.
> > +	 * However, due to the design of the "hardware", it is not possible to
> > +	 * attain a 100% duty cycle if the full range of period_steps is used.
> > +	 * Therefore period_steps is restricted to 0xfe and the maximum multiple
> > +	 * of the clock period attainable is (0xff + 1) * (0xfe + 1) = 0xff00
> > +	 *
> > +	 * The prescale and period_steps registers operate similarly to
> > +	 * CLK_DIVIDER_ONE_BASED, where the value used by the hardware is that
> > +	 * in the register plus one.
> > +	 * It's therefore not possible to set a period lower than 1/clk_rate, so
> > +	 * if tmp is 0, abort. Without aborting, we will set a period that is
> > +	 * greater than that requested and, more importantly, will trigger the
> > +	 * neg-/pos-edge issue described in the limitations.
> > +	 */
> > +	tmp = mul_u64_u64_div_u64(state->period, clk_rate, NSEC_PER_SEC);
> > +	if (!tmp)
> > +		return -EINVAL;
> > +
> > +	if (tmp >= MCHPCOREPWM_PERIOD_MAX) {
> > +		*prescale = MCHPCOREPWM_PRESCALE_MAX;
> > +		*period_steps = MCHPCOREPWM_PERIOD_STEPS_MAX;
> > +
> > +		return 0;
> > +	}
> > +
> > +	/*
> > +	 * There are multiple strategies that could be used to choose the
> > +	 * prescale & period_steps values.
> > +	 * Here the idea is to pick values so that the selection of duty cycles
> > +	 * is as finegrain as possible.
> > +	 * This "optimal" value for prescale can be calculated using the maximum
> > +	 * permitted value of period_steps, 0xfe.
> > +	 *
> > +	 *                period * clk_rate
> > +	 * prescale = ------------------------- - 1
> > +	 *            NSEC_PER_SEC * (0xfe + 1)
> > +	 *
> > +	 * However, we are purely interested in the integer upper bound of this
> > +	 * calculation, so this division should be rounded up before subtracting
> > +	 * 1
> > +	 *
> > +	 *  period * clk_rate
> > +	 * ------------------- was precomputed as `tmp`
> > +	 *    NSEC_PER_SEC
> > +	 */
> > +	*prescale = DIV64_U64_ROUND_UP(tmp, MCHPCOREPWM_PERIOD_STEPS_MAX + 1) - 1;
> 
> If state->period * clk_rate is 765000000001 you get tmp = 765 and then
> *prescale = 2. However roundup(765000000001 / (1000000000 * 255)) - 1 is
> 3. The problem here is that you're rounding down in the calculation of
> tmp. Of course this is constructed because 765000000001 is prime, but
> I'm sure you get the point :-)

Hold that thought for a moment..

> Also we know that tmp is < 0xff00, so we don't need a 64 bit division
> here.

Neither here nor below, true.

> > +	/*
> > +	 * Because 0xff is not a permitted value some error will seep into the
> > +	 * calculation of prescale as prescale grows. Specifically, this error
> > +	 * occurs where the remainder of the prescale calculation is less than
> > +	 * prescale.
> > +	 * For small values of prescale, only a handful of values will need
> > +	 * correction, but overall this applies to almost half of the valid
> > +	 * values for tmp.
> > +	 *
> > +	 * To keep the algorithm's decision making consistent, this case is
> > +	 * checked for and the simple solution is to, in these cases,
> > +	 * decrement prescale and check that the resulting value of period_steps
> > +	 * is valid.
> > +	 *
> > +	 * period_steps can be computed from prescale:
> > +	 *                      period * clk_rate
> > +	 * period_steps = ----------------------------- - 1
> > +	 *                NSEC_PER_SEC * (prescale + 1)
> > +	 *
> > +	 */
> > +	div_u64_rem(tmp, (MCHPCOREPWM_PERIOD_STEPS_MAX + 1), &remainder);
> > +	if (remainder < *prescale) {
> > +		u16 smaller_prescale = *prescale - 1;
> > +
> > +		*period_steps = div_u64(tmp, smaller_prescale + 1) - 1;
> > +		if (*period_steps < 255) {
> > +			*prescale = smaller_prescale;
> > +
> > +			return 0;
> > +		}
> > +	}

...so in your prime case above, we would initially compute a prescale
value that is too large, and then wind up hitting the test of the
remainder here, thereby realising that the smaller prescale value is a
better fit?
Perhaps that's not an acceptable way to handle the issue though.

> I don't understand that part. It triggers for tmp = 511. So you prefer
> 
> 	prescale = 1
> 	period_steps = 254
> 
> yielding period = 510 / clkrate over
> 
> 	prescale = 2
> 	period_steps = 170
> 
> yielding 513 / clkrate. I wonder why.

Because 513 > 511 & 254 > 170!
Is the aim not to produce a period that is less than or equal to that
requested? The aim of this driver is to pick a prescale/period_steps
combo that satisfies that constraint, while also trying to maximise the
"finegrainness" of the duty cycle.
The latter should be stated in a comment above.

> Also tmp = 511 is the only value
> where this triggers. There is a mistake somewhere (maybe on my side).

It should trigger for any value 255 * n < x < 256 * n, no?
Say for tmp of 767:
*prescale = DIV64_U64_ROUND_UP(767, 254 + 1) - 1 = DIV64_U64_ROUND_UP(3.00784...) - 1 = 3
remainder = 0.00784.. * (254 + 1) = 2
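
To sanity check the arithmetic, here is a rough userspace sketch of the
selection logic as quoted from the patch. The names calc_period and
struct period_cfg, the "corrected" flag and the plain C integer types are
stand-ins for illustration only. The final fall-through computation of
period_steps is not part of the quoted hunk, so it is assumed here to
follow the formula given in the driver comment.

```c
#include <assert.h>
#include <stdint.h>

#define PERIOD_STEPS_MAX 0xfeu	/* MCHPCOREPWM_PERIOD_STEPS_MAX in the patch */

struct period_cfg {
	uint16_t prescale;
	uint16_t period_steps;
	int corrected;	/* non-zero if the smaller prescale was chosen */
};

/* tmp stands for period * clk_rate / NSEC_PER_SEC, with 0 < tmp < 0xff00 */
static struct period_cfg calc_period(uint32_t tmp)
{
	struct period_cfg cfg = { 0, 0, 0 };
	uint32_t div = PERIOD_STEPS_MAX + 1;
	uint32_t remainder = tmp % div;

	assert(tmp > 0 && tmp < 0xff00);

	/* Round the division up, then subtract one (registers are value + 1). */
	cfg.prescale = (tmp + div - 1) / div - 1;

	if (remainder < cfg.prescale) {
		uint16_t smaller_prescale = cfg.prescale - 1;
		uint32_t steps = tmp / (smaller_prescale + 1u) - 1;

		/* Only accept the smaller prescale if period_steps stays valid. */
		if (steps < 255) {
			cfg.prescale = smaller_prescale;
			cfg.period_steps = steps;
			cfg.corrected = 1;
			return cfg;
		}
	}

	/* Assumed fall-through, per the period_steps formula in the comment. */
	cfg.period_steps = tmp / (cfg.prescale + 1u) - 1;

	return cfg;
}
```

In this sketch calc_period(511) gives prescale = 1, period_steps = 254 and
calc_period(767) gives prescale = 2, period_steps = 254, both through the
correction branch, as does 766. For tmp = 765 the smaller prescale would
need period_steps = 381, so it is rejected and the result stays at
prescale = 2, period_steps = 254. Since tmp is below 0xff00, 32-bit
arithmetic is sufficient throughout.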

Am I going nuts? Wouldn't be the first time that I've made a hames of
things here, there are 16 versions for a reason after all.

Cheers,
Conor.