[RFC PATCH v1] net: ethernet: nb8800: Reset HW block in ndo_open
Mason
slash.tmp at free.fr
Mon Jul 31 07:08:31 PDT 2017
On 31/07/2017 13:59, Måns Rullgård wrote:
> Mason writes:
>
>> On 29/07/2017 17:18, Florian Fainelli wrote:
>>
>>> On 07/29/2017 05:02 AM, Mason wrote:
>>>
>>>> I have identified a 100% reproducible flaw.
>>>> I have proposed a work-around that brings this down to 0
>>>> (tested 1000 cycles of link up / ping / link down).
>>>
>>> Can you also try to get help from your HW resources to eventually help
>>> you find out what is going on here?
>>
>> The patch I proposed /is/ based on the feedback from the HW team :-(
>> "Just reset the HW block, and everything will work as expected."
>
> Nobody is saying a reset won't recover the lockup. The problem is that
> we don't know what caused it to lock up in the first place. How do we
> know it can't happen during normal operation? If we knew the cause, it
> might also be possible to avoid the situation entirely.

How does one prove that something "can't happen during normal operation"?

The "put adapter in loop-back mode so we can send ourselves fake packets"
shenanigans seem completely insane, if you ask me.

Other things make no sense to me, for example in nb8800_dma_stop()
there is a polling loop:

	do {
		mdelay(100);
		nb8800_writel(priv, NB8800_TX_DESC_ADDR, txb->dma_desc);
		wmb();
		mdelay(100);
		nb8800_writel(priv, NB8800_TXC_CR, txcr | TCR_EN);
		mdelay(5500);
		err = readl_poll_timeout_atomic(priv->base + NB8800_RXC_CR,
						rxcr, !(rxcr & RCR_EN),
						1000, 100000);
		printk("err=%d retry=%d\n", err, retry);
	} while (err && --retry);

(I added the delays and the printk myself.)

*Whatever* delays I insert, it always goes through the loop 3 times.

[ 29.654492] ++ETH++ gw32 reg=f002610c val=9ecc8000
[ 29.759320] ++ETH++ gw32 reg=f0026100 val=005c0aff
[ 35.364705] err=-110 retry=5
[ 35.467609] ++ETH++ gw32 reg=f002610c val=9ecc8000
[ 35.572436] ++ETH++ gw32 reg=f0026100 val=005c0aff
[ 41.177822] err=-110 retry=4
[ 41.280726] ++ETH++ gw32 reg=f002610c val=9ecc8000
[ 41.385553] ++ETH++ gw32 reg=f0026100 val=005c0aff
[ 46.890907] err=0 retry=3
How is that possible?
I've tried using spinlocks and delays to get parallel execution
down to a minimum, and have the same logs on both boards.
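
For reference, err=-110 in the logs above is -ETIMEDOUT on Linux: the
readl_poll_timeout_atomic() call gave up after its 100000 us budget with
RCR_EN still set. The user-space C sketch below only illustrates that
poll-with-timeout contract. struct fake_rx, fake_read_rxc_cr() and
poll_rx_idle() are hypothetical names, the RCR_EN value is a placeholder,
and the clock is simulated rather than real. It is not the nb8800 driver
code, and it does not explain why the third pass succeeds.

```c
#include <errno.h>
#include <stdint.h>

#define RCR_EN 0x1u	/* placeholder bit, NOT the real nb8800 definition */

/* Hypothetical stand-in for the hardware: RCR_EN clears at idle_at_us. */
struct fake_rx {
	uint64_t now_us;	/* simulated clock, advanced by the poll loop */
	uint64_t idle_at_us;	/* simulated time at which RCR_EN reads 0 */
};

/* Simulated register read: RCR_EN stays set until idle_at_us is reached. */
static uint32_t fake_read_rxc_cr(const struct fake_rx *hw)
{
	return hw->now_us >= hw->idle_at_us ? 0 : RCR_EN;
}

/*
 * Poll until RCR_EN reads clear or timeout_us of simulated time elapses.
 * Returns 0 on success, -ETIMEDOUT if the bit is still set at the deadline.
 */
static int poll_rx_idle(struct fake_rx *hw, uint64_t delay_us,
			uint64_t timeout_us)
{
	uint64_t deadline = hw->now_us + timeout_us;

	for (;;) {
		if (!(fake_read_rxc_cr(hw) & RCR_EN))
			return 0;
		if (hw->now_us >= deadline)
			break;
		hw->now_us += delay_us;	/* stands in for udelay(delay_us) */
	}

	/* One final read at the deadline before reporting a timeout. */
	return (fake_read_rxc_cr(hw) & RCR_EN) ? -ETIMEDOUT : 0;
}
```

With delay_us = 1000 and timeout_us = 100000 (the values in the loop
quoted earlier), a fake_rx whose bit clears at 50000 us makes
poll_rx_idle() return 0, while one that clears at 500000 us makes it
return -ETIMEDOUT.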
Regards.