[LEDE-DEV] imx6: fail to start IBSS link

Fri Apr 7 08:31:12 PDT 2017

On 2017-04-03 16:03, Tim Harvey wrote:


Hi Tim,

Also apologies for the late reply.
It's also a very busy period here :)

It also took some time to retest this list of issues using today's
trunk version (4.9.20 kernel).

> When the IMX6 PCIe host controller uses MSI, legacy interrupts stop
> working and thus any card/driver using legacy will not have
> functioning interrupts. I'm not sure what that list of card/drivers is
> that require legacy interrupts but I know ath9k is one of them and
> just verified it doesn't get any interrupts currently on LEDE master
> with 4.9.
>
>
> You can do the following to hack out the requirement of MSI for the
> IMX6 PCIe host controller, then disable CONFIG_PCI_MSI in the kernel
> config:
> diff --git a/drivers/pci/dwc/Kconfig b/drivers/pci/dwc/Kconfig
> index dfb8a69..31cf8ad 100644
> --- a/drivers/pci/dwc/Kconfig
> +++ b/drivers/pci/dwc/Kconfig
> @@ -6,7 +6,6 @@ config PCIE_DW
>   config PCIE_DW_HOST
>           bool
>          depends on PCI
> -       depends on PCI_MSI_IRQ_DOMAIN
>           select PCIE_DW
>
>   config PCI_DRA7XX
> @@ -45,7 +44,6 @@ config PCI_IMX6
>          bool "Freescale i.MX6 PCIe controller"
>          depends on PCI
>          depends on SOC_IMX6Q
> -       depends on PCI_MSI_IRQ_DOMAIN
>          select PCIEPORTBUS
>          select PCIE_DW_HOST
>
Works perfectly, thanks for investigating!
ath9k is up & running again.

The paths differ from those in the patch above, though: in this tree the
file is a/drivers/pci/host/Kconfig.

Would you like me to make a patch specific for LEDE which integrates this?
If so, please provide me your S-o-B line.


Felix,

Would you accept this workaround?
Without it, the current LEDE trunk on the imx6 platform renders most
legacy IRQ devices useless, and I think it will take some time for the
gentlemen upstream to fix it properly.

>
>>> Other issues seen so far compared to kernel 4.4:
>>> - A simple "reboot" doesn't work.  UART output shows "Reboot failed" and
>>> the board stalls. Powercycle is needed
> This can occur on older revision boards where the PMIC is not reset on
> IMX6 watchdog reset and a watchdog reset (which is what is used on
> soft reboot) occurs when the CPU is above 800Mhz. Can you provide the
> serial number of the board you are seeing this on and verify that if
> you force the cpu to 800mhz (ie userspace cpufreq governor) prior to
> reset the issue does not occur?
>
> The work-around for this is to use the Gateworks System Controller
> watchdog to restart the board which does a full board power cycle, but
> I haven't had time to get that driver mainlined yet (and thus have
> also not submitted it to LEDE/OpenWrt).
I don't think this one is caused by the explanation above, for the
following reasons:
- The test was conducted on a GW5200, which runs at 800MHz (never above).
- Using identical LEDE builds (one with the 4.4 kernel and one with the
4.9 kernel), the reboot works 100% of the time on 4.4 and fails 100% of
the time on 4.9.

I did notice a small difference which helps a bit in this case.
The watchdog eventually reboots the board (both after issuing the
"reboot" command and after using sysupgrade).
It's not elegant at all, but it lowers the priority slightly as the
board doesn't get stuck.
>>> - UART DMA disabled is required to avoid some boot errors (I've made a
>>> custom backport from your upstream patch fixing this, but not submitted here
>>> yet)
> which boot error specifically? I don't know that I've seen it, but I
> can confirm that UART DMA needs to be disabled for RS485 to work
> (which is a more obscure case) which is why I've done it on our
> kernels. AFAIK there are still some issues upstream with IMX UART
> flow-control and mctrl_gpio.
Retested on 4.9.20 and the prints are gone without any custom patch.
Considered solved.

I'm still in favor of backporting your UART DMA patch to fix the RS485
use case, though.
>>> General issues in kernels 4.4 & 4.9
>>> - Even using the latest UBI FS sources + using the Sync option in bootarg,
>>> files can get corrupted on a power cut.  If the corrupted file is a boot
>>> file .. :)
> can you point me to documentation on this bootarg, i'm not familiar with it?
Documented here: 
http://www.linux-mtd.infradead.org/doc/ubifs.html#L_sync_semantics



Enabled using this:

--- a/target/linux/imx6/image/bootscript-ventana
+++ b/target/linux/imx6/image/bootscript-ventana
@@ -50,7 +50,7 @@ if itest.s "x${dtype}" == "xnand" ; then
      mtdparts del rootfs && mtdparts add nand0 - ubi
      echo "mtdparts:${mtdparts}"
      setenv fsload ubifsload
-    setenv root "ubi0:ubi ubi.mtd=2 rootfstype=squashfs,ubifs"
+    setenv root "ubi0:ubi ubi.mtd=2 rootfstype=squashfs,ubifs rootflags=sync"
  else
      echo "Booting from block device ${bootdev}..."
      setenv fsload "${fs}load ${dtype} ${disk}:1"



You can check whether it's active by looking for the following line in the boot log:
[    2.682056] UBIFS: parse sync
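
To automate that check across several boards, something along these
lines could work. It is only a sketch: the helper name
ubifs_sync_enabled is made up for illustration, and it assumes the
kernel prints the message in the form quoted above, which may differ
between kernel versions.

```python
def ubifs_sync_enabled(boot_log: str) -> bool:
    """Return True if any boot log line shows UBIFS parsing the sync option.

    Assumption: the message contains both "UBIFS" and "parse sync",
    as in the line quoted from the 4.9 boot log.
    """
    return any(
        "UBIFS" in line and "parse sync" in line
        for line in boot_log.splitlines()
    )


# Example with a captured log; on a board the text could come from dmesg.
sample_log = (
    "[    2.670000] UBI: attached mtd2\n"
    "[    2.682056] UBIFS: parse sync\n"
)
print("sync option seen:", ubifs_sync_enabled(sample_log))
```

Feeding it the saved output of dmesg from each board gives a quick
yes/no per board.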



Testing this on a dozen boards for weeks shows that it reduces the
empty-file issues by ~70%.
It sacrifices some write speed, as it basically disables write
caching and flushes all changes to NAND as soon as possible.
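
For files that must never end up empty (boot files, configs), the
application can also protect itself instead of relying only on the mount
option. As I understand the UBIFS documentation linked above, the usual
pattern is to write a temporary file, fsync it, and rename it over the
target. Below is a minimal sketch of that pattern. It is not what was
tested on the boards, and the name write_file_durably is hypothetical.

```python
import os
import tempfile


def write_file_durably(path: str, data: bytes) -> None:
    """Replace `path` with `data` so a power cut leaves the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename
    # stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the file contents to flash
        os.replace(tmp_path, path)  # atomic rename over the target
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Sync the directory so the rename itself is persisted (Linux).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

This only costs a sync for the files that matter, rather than for every
write on the filesystem.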

>>>
>>>
>>> Other than this it runs pretty stable :)
>>>
>> Tim,
>>
>> I found 1 more issue on 4.4 & 4.9 kernels:
>>
>> https://lists.debian.org/debian-arm/2016/02/msg00000.html
>>
>> I'm also seeing this on 4.4 kernel.
>> It can take up to a few days before it triggers normally, but I have a setup
>> running which reproduces this within a few hours.
>>
>> I've made a patch which increases the timeout in the FEC driver just for
>> testing .. but it still occurs causing the port to be disabled suddenly.
>>
> I've seen reports of this as well but usually it takes days of
> activity if/before it happens. The MDIO timeout in FEC is currently
> 3ms - what did you increase it to and are you certain it makes these
> issues go away? Perhaps we need to start a discussion about this on
> linux-net. I'm not clear if an MDIO read timeout should cause an
> interface to go down (or if some layer should retry). I'm also not
> clear why an MDIO read would not complete in 3ms.
I did not state that it was solved or better.

For testing, I increased the timeout to 9ms, but the issue still pops up.

I fully agree with you that:
- Each read should finish within these 3 ms 99.99999999% of the time.
- If for whatever reason it doesn't succeed, the interface should not
stop operating.
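
To make the second point concrete, a bounded retry around a single read
could look roughly like the sketch below. This is purely illustrative
Python, not the FEC driver code: MdioTimeout, read_with_retry, and the
read_once callback are all hypothetical names.

```python
from typing import Callable, Optional


class MdioTimeout(Exception):
    """Raised when a single (hypothetical) MDIO read does not complete in time."""


def read_with_retry(read_once: Callable[[], int], attempts: int = 3) -> int:
    """Call read_once up to `attempts` times and return the first successful value.

    Only after every attempt has timed out is the error passed on to the
    caller, which can then decide whether the link really has to go down.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[MdioTimeout] = None
    for _ in range(attempts):
        try:
            return read_once()
        except MdioTimeout as err:
            last_error = err  # remember the failure and try again
    raise MdioTimeout(f"read failed after {attempts} attempts") from last_error
```

With this shape, one slow read is absorbed by the retry, and only a
persistent failure reaches the layer that manages the interface state.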

Let's go fishing upstream for this one :)

> Tim

-- 
Koen Vandeputte - Software Developer
koen.vandeputte at ncentric.com | +32499736158