[RFC PATCH v1] net: ethernet: nb8800: Reset HW block in ndo_open

Mon Jul 31 07:08:31 PDT 2017

On 31/07/2017 13:59, Måns Rullgård wrote:

> Mason writes:
> 
>> On 29/07/2017 17:18, Florian Fainelli wrote:
>>
>>> On 07/29/2017 05:02 AM, Mason wrote:
>>>
>>>> I have identified a 100% reproducible flaw.
>>>> I have proposed a work-around that brings this down to 0
>>>> (tested 1000 cycles of link up / ping / link down).
>>>
>>> Can you also try to get help from your HW resources to eventually help
>>> you find out what is going on here?
>>
>> The patch I proposed /is/ based on the feedback from the HW team :-(
>> "Just reset the HW block, and everything will work as expected."
> 
> Nobody is saying a reset won't recover the lockup.  The problem is that
> we don't know what caused it to lock up in the first place.  How do we
> know it can't happen during normal operation?  If we knew the cause, it
> might also be possible to avoid the situation entirely.

How does one prove that something "can't happen during normal operation"?

The "put adapter in loop-back mode so we can send ourselves fake packets"
shenanigans seem completely insane, if you ask me.

Other things make no sense to me. For example, in nb8800_dma_stop()
there is a polling loop:

	do {
		mdelay(100);
		nb8800_writel(priv, NB8800_TX_DESC_ADDR, txb->dma_desc);
		wmb();
		mdelay(100);
		nb8800_writel(priv, NB8800_TXC_CR, txcr | TCR_EN);

		mdelay(5500);

		err = readl_poll_timeout_atomic(priv->base + NB8800_RXC_CR,
						rxcr, !(rxcr & RCR_EN),
						1000, 100000);
		printk("err=%d retry=%d\n", err, retry);
	} while (err && --retry);

(It was me who added the delays.)

*Whatever* delays I insert, it always goes 3 times through the loop.

[   29.654492] ++ETH++ gw32 reg=f002610c val=9ecc8000
[   29.759320] ++ETH++ gw32 reg=f0026100 val=005c0aff
[   35.364705] err=-110 retry=5
[   35.467609] ++ETH++ gw32 reg=f002610c val=9ecc8000
[   35.572436] ++ETH++ gw32 reg=f0026100 val=005c0aff
[   41.177822] err=-110 retry=4
[   41.280726] ++ETH++ gw32 reg=f002610c val=9ecc8000
[   41.385553] ++ETH++ gw32 reg=f0026100 val=005c0aff
[   46.890907] err=0 retry=3

How is that possible?
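
For readers following the log: err=-110 is -ETIMEDOUT on Linux, and retry
is printed before it is decremented, hence 5, 4, 3. Below is a minimal
userspace sketch of the loop's control flow only. The fake register
(struct fake_rxc, fake_read), the RCR_EN bit position, the count-based
clearing and the 100-read budget (roughly 100000 us / 1000 us) are all
assumptions chosen to reproduce the shape of the log; none of it is a
statement about what the hardware actually does.

```c
#include <errno.h>
#include <stdint.h>

#define RCR_EN 0x1u	/* hypothetical bit position, for this model only */

/* Hypothetical stand-in for NB8800_RXC_CR: RCR_EN clears after N reads. */
struct fake_rxc {
	uint32_t value;
	unsigned int reads_until_clear;	/* 0 means "never clears" */
};

static uint32_t fake_read(struct fake_rxc *r)
{
	if (r->reads_until_clear > 0 && --r->reads_until_clear == 0)
		r->value &= ~RCR_EN;
	return r->value;
}

/* Rough model of the poll: give up after max_reads attempts. */
static int poll_rx_disabled(struct fake_rxc *r, unsigned int max_reads)
{
	for (unsigned int i = 0; i < max_reads; i++)
		if (!(fake_read(r) & RCR_EN))
			return 0;
	return -ETIMEDOUT;
}

/*
 * Model of the outer do/while: *last_retry receives the retry value
 * that the final pass would have printed.
 */
static int stop_with_retries(struct fake_rxc *r, unsigned int max_reads,
			     int retry, int *last_retry)
{
	int err;

	do {
		err = poll_rx_disabled(r, max_reads);
		*last_retry = retry;
	} while (err && --retry);

	return err;
}
```

With reads_until_clear = 250, max_reads = 100 and retry = 5, the model
times out twice and succeeds on the third pass with retry=3, the same
pattern as the log above. A register that never clears exhausts all five
passes and returns -ETIMEDOUT.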

I've tried using spinlocks and delays to reduce parallel execution
to a minimum, and I get the same logs on both boards.

Regards.