Spurious timeouts in mvmdio

Jason Cooper jason at lakedaemon.net
Tue Dec 3 07:23:46 EST 2013


Nicolas,

Sorry for the delay.  We spoke about this yesterday on IRC, and
apparently each of us thought someone else was going to respond.  Oops
:(

On Mon, Dec 02, 2013 at 04:15:54PM +0100, Nicolas Schichan wrote:
> During 3.13-rc1 testing, I have found out that the mvmdio driver
> would report timeouts on the kernel console:
> 
> [   11.011334] orion-mdio orion-mdio: Timeout: SMI busy for too long
> 
> The hardware is a MV88F6281 Kirkwood CPU. The mvmdio driver is using
> the irq line 46 (ge00_err).
> 
> I am inclined to believe that it is due to the fact that
> wait_event_timeout() is called with a timeout parameter of 1 jiffy
> in orion_mdio_wait_ready(). If the timer interrupt ticks right after
> calling wait_event_timeout(), we may end up spending much less time
> than MVMDIO_SMI_TIMEOUT (1 msec) in wait_event_timeout(), and as a
> result report a timeout as the MDIO access did not complete in such
> a short time.
> 
> As to how to fix this, I see two options (I don't know which one
> would be preferred):
> 
> - Option 1: always pass a timeout of at least 2 jiffies to wait_event_timeout().
> - Option 2: switch to wait_event_hrtimeout().
> 
> I can provide patches for both options.

Based on yesterday's IRC chat, option 1 sounds good.  Here's the log
from yesterday, where Sebastian provided a thorough explanation:

11:29 < shesselba> increasing max timeout to 2 ticks at least sounds reasonable
11:29 < shesselba> 10ms should be enough for every CONFIG_HZ there is

11:30 < kos_tom> why make the timeout tied to the ticks? there are functions/macros to convert real time numbers into ticks.
11:30 < kos_tom> msecs_to_jiffies() or something

11:31 < shesselba> kos_tom: it is already using usecs_to_jiffies()
11:31 < shesselba> the thing is: 1ms is less than a jiffy

11:33 < kos_tom> so it will wait one jiffy or a little bit more, no?

11:38 < shesselba> no, the spurious timeouts he is seeing come from (1) mvmdio gets jiffies close before the next tick, (2) wait_event_timeout is called with jiffies + timeout
11:39 < shesselba> with timeout << 1 jiffy
11:39 < shesselba> then (3) the next timer tick occurs
11:39 < shesselba> it will end up waiting less than a jiffy
11:40 < shesselba> IOW, increase timeout to be at least two jiffies (or 20ms for CONFIG_HZ=100)
11:41 < shesselba> originally, it was 100ms anyway
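
To put rough numbers on the explanation above, here is a small user-space
model of the tick arithmetic.  It is only a sketch, not the mvmdio code:
the function names usecs_to_ticks() and actual_wait_us() are made up for
illustration, and it assumes that a timeout of N jiffies expires on the
N-th timer tick after the wait starts, and that the conversion from
microseconds rounds up to whole ticks.

```c
#include <assert.h>

/*
 * Hypothetical user-space model, not kernel code.
 * Convert microseconds to whole ticks, rounding up. This is a
 * simplified stand-in for what usecs_to_jiffies() does in the kernel.
 */
static unsigned long usecs_to_ticks(unsigned long usecs, unsigned long hz)
{
	assert(hz > 0 && hz <= 1000000UL);
	unsigned long usecs_per_tick = 1000000UL / hz;

	return (usecs + usecs_per_tick - 1) / usecs_per_tick;
}

/*
 * Real time (in microseconds) spent waiting when a timeout of `ticks`
 * ticks is armed `phase_us` microseconds after the most recent tick.
 * Assumption: the timeout expires on the ticks-th tick edge after the
 * start, so the part of the current tick that has already elapsed is
 * lost.
 */
static unsigned long actual_wait_us(unsigned long ticks,
				    unsigned long phase_us,
				    unsigned long hz)
{
	assert(hz > 0 && hz <= 1000000UL);
	unsigned long usecs_per_tick = 1000000UL / hz;

	assert(phase_us < usecs_per_tick);
	if (ticks == 0)
		return 0;

	return ticks * usecs_per_tick - phase_us;
}
```

Under these assumptions, with CONFIG_HZ=100 a 1 msec timeout becomes
usecs_to_ticks(1000, 100) = 1 tick.  If the wait starts 9990 usec into
the current tick, actual_wait_us(1, 9990, 100) gives only 10 usec, far
less than the intended 1 msec.  With 2 ticks the same call gives
10010 usec, so the wait always covers at least one full tick.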

Looking forward to the patch!

thx,

Jason.


