[LEDE-DEV] imx6: fail to start IBSS link

Tim Harvey tharvey at gateworks.com
Fri Apr 7 08:46:39 PDT 2017


On Fri, Apr 7, 2017 at 8:31 AM, Koen Vandeputte
<koen.vandeputte at ncentric.com> wrote:
>
>
> On 2017-04-03 16:03, Tim Harvey wrote:
>
>
> Hi Tim,
>
> Also apologies for the late reply.
> It's also a very busy period here :)
>
> It also took some time to retest this issue list using today's trunk
> version (4.9.20 kernel).
>
>> When the IMX6 PCIe host controller uses MSI, legacy interrupts stop
>> working, and thus any card/driver using legacy interrupts will not
>> have functioning interrupts. I'm not sure which cards/drivers require
>> legacy interrupts, but I know ath9k is one of them and I just
>> verified it doesn't get any interrupts currently on LEDE master
>> with 4.9.
>>
>>
>> You can do the following to hack out the requirement of MSI for the
>> IMX6 PCIe host controller, then disable CONFIG_PCI_MSI in the kernel
>> config:
>> diff --git a/drivers/pci/dwc/Kconfig b/drivers/pci/dwc/Kconfig
>> index dfb8a69..31cf8ad 100644
>> --- a/drivers/pci/dwc/Kconfig
>> +++ b/drivers/pci/dwc/Kconfig
>> @@ -6,7 +6,6 @@ config PCIE_DW
>>   config PCIE_DW_HOST
>>           bool
>>          depends on PCI
>> -       depends on PCI_MSI_IRQ_DOMAIN
>>           select PCIE_DW
>>
>>   config PCI_DRA7XX
>> @@ -45,7 +44,6 @@ config PCI_IMX6
>>          bool "Freescale i.MX6 PCIe controller"
>>          depends on PCI
>>          depends on SOC_IMX6Q
>> -       depends on PCI_MSI_IRQ_DOMAIN
>>          select PCIEPORTBUS
>>          select PCIE_DW_HOST
>>
> Works perfectly, thanks for investigating!
> ath9k is up & running again.
>
> Note that the paths in the LEDE tree differ from those in the patch above:
> (a/drivers/pci/host/Kconfig)
>
> Would you like me to make a patch specific for LEDE which integrates this?
> If so, please provide me your S-o-B line.
>
>
> Felix,
>
> Would you accept this workaround?
> Without it, using the current LEDE trunk in the imx6 platform will render
> most legacy irq devices useless .. and I think it will take some time for
> the gentlemen upstream to properly fix it.
>

Koen,

Yes, I also just realized that the paths were changed recently in mainline.
The IMX6 PCIe maintainer is working on a more appropriate patch which
will allow MSI to be used in cases where it can. Let's wait a couple
of days to see what he comes up with and then I can look at
backporting it. If not, I can rework the patch that 'hacks' MSI off.
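
For anyone wanting to confirm the symptom on their own board, here is a
minimal shell sketch that sums the interrupt counts for a given name in
/proc/interrupts. The function name irq_count is made up for illustration,
and it assumes the usual layout of that file (an "NNN:" label followed by
one count per CPU). If the total for ath9k stays at 0 while the interface
is up and scanning, the card is not getting interrupts.

```shell
# Sum the per-CPU interrupt counts of every line in an interrupts table
# whose text contains NAME (e.g. "ath9k").
# Usage: irq_count NAME [FILE]   (FILE defaults to /proc/interrupts)
irq_count() {
    name="$1"
    file="${2:-/proc/interrupts}"
    awk -v name="$name" '
        index($0, name) {
            # Field 1 is the "NNN:" label; the purely numeric fields that
            # follow are the per-CPU counts. Stop at the first other field.
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
        }
        END { print total + 0 }
    ' "$file"
}
```

Running "irq_count ath9k" twice a few seconds apart and comparing the two
numbers is usually enough to tell whether interrupts are arriving.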

>>
>>>> Other issues seen so far compared to kernel 4.4:
>>>> - A simple "reboot" doesn't work.  UART output shows "Reboot failed" and
>>>> the board stalls. Powercycle is needed
>>
>> This can occur on older revision boards where the PMIC is not reset on
>> IMX6 watchdog reset and a watchdog reset (which is what is used on
>> soft reboot) occurs when the CPU is above 800MHz. Can you provide the
>> serial number of the board you are seeing this on and verify that if
>> you force the CPU to 800MHz (i.e. userspace cpufreq governor) prior to
>> reset the issue does not occur?
>>
>> The work-around for this is to use the Gateworks System Controller
>> watchdog to restart the board which does a full board power cycle, but
>> I haven't had time to get that driver mainlined yet (and thus have
>> also not submitted it to LEDE/OpenWrt).
>
> I don't think this one is caused by the explanation above, for the
> following reasons:
> - Test conducted on a GW5200 which runs at 800MHz (never above)
> - Using identical LEDE builds (1 with 4.4 kernel  & 1 with 4.9 kernel),  the
> reboot 100% works on 4.4  and 100% fails on 4.9
>
> I did notice a small delta which aids a bit in this case.
> The watchdog eventually reboots the board (both when issuing the "reboot"
> command and after using sysupgrade).
> It's not elegant at all, but it reduces the priority slightly as the board
> doesn't get stuck.

I just reproduced it on a GW5200 with 800MHz IMX6DL as well so
something else is going on here.

I'm investigating.
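
For boards that can run faster than 800MHz, the check described above
could be scripted roughly like this. It is only a sketch: pin_cpu_freq is
a hypothetical helper, it assumes the userspace cpufreq governor is
enabled in the kernel, and it assumes cpu0's cpufreq directory controls
all cores. The value to pass is also an assumption: the nominal 800MHz
operating point is usually listed as 792000 kHz on i.MX6, so check
scaling_available_frequencies on the board first.

```shell
# Switch to the userspace governor and request a fixed CPU frequency.
# Usage: pin_cpu_freq KHZ [CPUFREQ_DIR]
# CPUFREQ_DIR defaults to cpu0's cpufreq directory in sysfs.
pin_cpu_freq() {
    khz="$1"
    dir="${2:-/sys/devices/system/cpu/cpu0/cpufreq}"
    # The governor must be "userspace" before scaling_setspeed is honoured.
    echo userspace > "$dir/scaling_governor" || return 1
    # scaling_setspeed takes the target frequency in kHz.
    echo "$khz" > "$dir/scaling_setspeed" || return 1
}
```

A test run would then be something like "pin_cpu_freq 792000 && reboot".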

<snip>
>>>>
>>>> General issues in kernels 4.4 & 4.9
>>>> - Even using the latest UBI FS sources + using the Sync option in
>>>> bootarg,
>>>> files can get corrupted on a power cut.  If the corrupted file is a boot
>>>> file .. :)
>>
>> Can you point me to documentation on this bootarg? I'm not familiar
>> with it.
>
> Documented here:
> http://www.linux-mtd.infradead.org/doc/ubifs.html#L_sync_semantics
>
>
>
> Enabled using this:
>
> --- a/target/linux/imx6/image/bootscript-ventana
> +++ b/target/linux/imx6/image/bootscript-ventana
> @@ -50,7 +50,7 @@ if itest.s "x${dtype}" == "xnand" ; then
>      mtdparts del rootfs && mtdparts add nand0 - ubi
>      echo "mtdparts:${mtdparts}"
>      setenv fsload ubifsload
> -    setenv root "ubi0:ubi ubi.mtd=2 rootfstype=squashfs,ubifs"
> +    setenv root "ubi0:ubi ubi.mtd=2 rootfstype=squashfs,ubifs rootflags=sync"
>  else
>      echo "Booting from block device ${bootdev}..."
>      setenv fsload "${fs}load ${dtype} ${disk}:1"
>
>
>
> You can check if it's running by looking for the following line in the bootlog:
> [    2.682056] UBIFS: parse sync
>
>
>
> Testing this on dozens of boards for weeks shows that it reduces the
> empty-file issues by ~70%.
> It sacrifices some write speed, as it basically disables write
> caching and flushes all changes to NAND as soon as possible.
>

There was a known IMX GPMI NAND issue that could cause corruption
which was fixed in Linux 4.7. I haven't tested 4.9 extensively yet for
this but I can set up a test. Send me as many details as you can as
far as how easy this is to reproduce and what your test setup is like.
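
As a starting point for comparing setups, a power-cut test could be
sketched as below. This is not the test setup referred to in this thread;
write_batch and count_damaged are hypothetical helpers, and the file names
and payloads are arbitrary. The idea is to write a batch of small files,
cut power, and after the reboot count the files that are missing, empty,
or hold the wrong content.

```shell
# Write COUNT small files into DIR, each holding a payload derived from
# its index so the content can be checked later.
# Usage: write_batch DIR COUNT
write_batch() {
    dir="$1"; count="$2"
    mkdir -p "$dir" || return 1
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "payload $i" > "$dir/file_$i.txt"
        i=$((i + 1))
    done
}

# Print how many of the COUNT expected files in DIR are missing, empty,
# or do not contain the expected payload.
# Usage: count_damaged DIR COUNT
count_damaged() {
    dir="$1"; count="$2"
    bad=0
    i=0
    while [ "$i" -lt "$count" ]; do
        f="$dir/file_$i.txt"
        # "! -s" is true for both missing and zero-length files.
        if [ ! -s "$f" ] || [ "$(cat "$f")" != "payload $i" ]; then
            bad=$((bad + 1))
        fi
        i=$((i + 1))
    done
    echo "$bad"
}
```

Files written just before the cut may legitimately be missing when sync
is not used, so the interesting number is how many files written well
before the cut come back empty or damaged.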

>>>>
>>>>
>>>> Other than this it runs pretty stable :)
>>>>
>>> Tim,
>>>
>>> I found 1 more issue on 4.4 & 4.9 kernels:
>>>
>>> https://lists.debian.org/debian-arm/2016/02/msg00000.html
>>>
>>> I'm also seeing this on 4.4 kernel.
>>> It can take up to a few days before it triggers normally, but I have a
>>> setup
>>> running which reproduces this within a few hours.
>>>
>>> I've made a patch which increases the timeout in the FEC driver just for
>>> testing .. but it still occurs causing the port to be disabled suddenly.
>>>
>> I've seen reports of this as well but usually it takes days of
>> activity if/before it happens. The MDIO timeout in FEC is currently
>> 3ms - what did you increase it to and are you certain it makes these
>> issues go away? Perhaps we need to start a discussion about this on
>> linux-net. I'm not clear if an MDIO read timeout should cause an
>> interface to go down (or if some layer should retry). I'm also not
>> clear why an MDIO read would not complete in 3ms.
>
> I did not state that it was solved/better.
>
> For testing, I increased the timeout to 9ms but the issue still pops up.
>
> I fully agree with you that:
> - Each read should finish within these 3 ms 99.99999999% of the time.
> - If for whatever reason it doesn't succeed, the iface should not halt
> operating.
>
> Let's go fishing upstream for this one :)

Agreed - already posted a question to linux-netdev this morning.
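
To put a number on how often this triggers, the kernel log can be searched
for the FEC driver's timeout message. A minimal sketch, assuming the
driver logs the string "MDIO read timeout"; count_mdio_timeouts is a
hypothetical helper name.

```shell
# Print the number of "MDIO read timeout" lines in a kernel log.
# Usage: count_mdio_timeouts [LOGFILE]
# With no argument the live kernel log is read via dmesg.
count_mdio_timeouts() {
    if [ -n "$1" ]; then
        grep -c "MDIO read timeout" "$1"
    else
        dmesg | grep -c "MDIO read timeout"
    fi
}
```

Note that grep -c prints 0 but returns a non-zero exit status when nothing
matches, which matters if this is called from a script running with
"set -e".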

Tim


