5.11.0-rc1+: "Division by zero in kernel." when writing to spidev

Thu Jan 7 20:10:44 EST 2021

On Thu, 7 Jan 2021 15:35:46 +0000, Mark Brown <broonie at kernel.org> wrote:
> Copying in a bunch of people for that driver.

Thanks.
I added Tudor Ambarus as well as my debugging points in the direction
of:
commit 9326e4f1e5dd1a4410c429638d3c412b6fc17040 (spi/for-5.10)
Author: Tudor Ambarus <tudor.ambarus at microchip.com>
Date:   Wed Dec 9 19:35:14 2020 +0200

    spi: Limit the spi device max speed to controller's max speed

    Make sure the max_speed_hz of spi_device does not override
    the max_speed_hz of controller.

> I'd not expect this to be required with a polled path, it's only needed
> if the transfer function returns a positive value indicating that the
> transfer is still in progress which shouldn't be the case when things
> are polled.

Oh, indeed. And the driver is indeed returning 0 in this case.

> > I'll continue poking around later (especially checking computed timeout values),
> > should I submit patches for s/complete/spi_finalize_current_transfer/ ?  

Reverting 
    spi: Limit the spi device max speed to controller's max speed
fixes the timeout.

Which brings be back to devicetree:
- the controller node does not define the max clock rate
- the device I add in my overlay does define one
So spi->max_speed_hz is initially non-zero, but
spi->controller->max_speed_hz is zero, then:
-       if (!spi->max_speed_hz)
+       if (!spi->max_speed_hz ||
+           spi->max_speed_hz > spi->controller->max_speed_hz)
                spi->max_speed_hz = spi->controller->max_speed_hz;
it becomes zero in 5.11.
Then, because I do not specify a transfer speed in my userland code:
                if (!xfer->speed_hz)
                        xfer->speed_hz = spi->max_speed_hz;
it gets zero instead of the (as of 5.10) 4MHz, which led to my original
division-by-zero issue.

What would be the correct fix ?
- declare controller max clock ?
  If so, I think spi.c should warn about this missing value.
- do not cap transfer speed to controller speed when the latter is
  zero ?
- both ?

My working out, for whatever it's worth:
I ran my userland code under "strace -tt -T" and get:
00:31:21.449837 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4000) = -1 ETIMEDOUT (Connection timed out) <0.885552>
Given the previous divide-by-zero issue, spi.c is now assuming 100kHz,
which for this 4000 bytes buffer means:
  >>> ((8*1000*4000)/100000) * 2 + 200
  844.96
milliseconds, which roughly matches the call duration measured by
strace.
But this device is declared in my DT as supporting transfer rates up to
4MHz (so period of 250ns), which would mean an actual transfer duration
of:
  >>> 4000*8*250e-9
  0.008
assuming bytes being sent back-to-back (which I believe is the case for
SPI, but worst case I do not expect this to multiply the transfer time
by 100...). So there is something wrong with the transfer speed.

> Probably send the completion patch, yes.

Done.

Regards,
-- 
Vincent Pelletier
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 833 bytes
Desc: Signature digitale OpenPGP
URL: <http://lists.infradead.org/pipermail/linux-rpi-kernel/attachments/20210108/19e30701/attachment.sig>