sdhci timeout on imx8mq

Lucas Stach l.stach at pengutronix.de
Tue Jan 5 10:06:49 EST 2021


Hi all,

On Wednesday, 08.07.2020 at 01:32 +0000, BOUGH CHEN wrote:
> > -----Original Message-----
> > From: Fabio Estevam [mailto:festevam at gmail.com]
> > Sent: July 7, 2020 20:45
> > To: Angus Ainslie <angus at akkea.ca>
> > Cc: BOUGH CHEN <haibo.chen at nxp.com>; Ulf Hansson
> > <ulf.hansson at linaro.org>; Guido Günther <agx at sigxcpu.org>;
> > linux-mmc <linux-mmc at vger.kernel.org>; Adrian Hunter
> > <adrian.hunter at intel.com>; dl-linux-imx <linux-imx at nxp.com>;
> > Sascha Hauer <kernel at pengutronix.de>;
> > moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE
> > <linux-arm-kernel at lists.infradead.org>
> > Subject: Re: sdhci timeout on imx8mq
> > 
> > Hi Angus,
> > 
> > On Tue, Jun 30, 2020 at 4:39 PM Angus Ainslie <angus at akkea.ca> wrote:
> > 
> > > Has there been any progress with this? I'm getting this on about 50% of
> > 
> > Not from my side, sorry.
> > 
> > Bough,
> > 
> > Do you know why this problem affects the imx8mq-evk versions that are
> > populated with the Micron eMMC and not the ones with SanDisk eMMC?
> 
> Hi Angus,
> 
> Can you show me the full failure log? I do not see this issue on my
> side. Also, which U-Boot version do you use?

I was finally able to bisect this issue, which wasn't much fun because
the issue doesn't reproduce 100% of the time. :/ It turns out that the
issue is even more interesting than I thought and likely has nothing to
do with SDHCI or the bootloader version in use. Here's my current
debugging state:

I've bisected the issue down to b04383b6a558 (clk: imx8mq: Define gates
for pll1/2 fixed dividers). The change itself looks fine to me, but I've
CC'ed Leonard for good measure.

In my testing the following partial revert fixes the issue:

--- a/drivers/clk/imx/clk-imx8mq.c
+++ b/drivers/clk/imx/clk-imx8mq.c
@@ -365,7 +365,7 @@ static int imx8mq_clocks_probe(struct platform_device *pdev)
        hws[IMX8MQ_SYS1_PLL_133M_CG] = imx_clk_hw_gate("sys1_pll_133m_cg", "sys1_pll_out", base + 0x30, 15);
        hws[IMX8MQ_SYS1_PLL_160M_CG] = imx_clk_hw_gate("sys1_pll_160m_cg", "sys1_pll_out", base + 0x30, 17);
        hws[IMX8MQ_SYS1_PLL_200M_CG] = imx_clk_hw_gate("sys1_pll_200m_cg", "sys1_pll_out", base + 0x30, 19);
-       hws[IMX8MQ_SYS1_PLL_266M_CG] = imx_clk_hw_gate("sys1_pll_266m_cg", "sys1_pll_out", base + 0x30, 21);
        hws[IMX8MQ_SYS1_PLL_400M_CG] = imx_clk_hw_gate("sys1_pll_400m_cg", "sys1_pll_out", base + 0x30, 23);
        hws[IMX8MQ_SYS1_PLL_800M_CG] = imx_clk_hw_gate("sys1_pll_800m_cg", "sys1_pll_out", base + 0x30, 25);
 
@@ -375,7 +375,7 @@ static int imx8mq_clocks_probe(struct platform_device *pdev)
        hws[IMX8MQ_SYS1_PLL_133M] = imx_clk_hw_fixed_factor("sys1_pll_133m", "sys1_pll_133m_cg", 1, 6);
        hws[IMX8MQ_SYS1_PLL_160M] = imx_clk_hw_fixed_factor("sys1_pll_160m", "sys1_pll_160m_cg", 1, 5);
        hws[IMX8MQ_SYS1_PLL_200M] = imx_clk_hw_fixed_factor("sys1_pll_200m", "sys1_pll_200m_cg", 1, 4);
-       hws[IMX8MQ_SYS1_PLL_266M] = imx_clk_hw_fixed_factor("sys1_pll_266m", "sys1_pll_266m_cg", 1, 3);
+       hws[IMX8MQ_SYS1_PLL_266M] = imx_clk_hw_fixed_factor("sys1_pll_266m", "sys1_pll_out", 1, 3);
        hws[IMX8MQ_SYS1_PLL_400M] = imx_clk_hw_fixed_factor("sys1_pll_400m", "sys1_pll_400m_cg", 1, 2);
        hws[IMX8MQ_SYS1_PLL_800M] = imx_clk_hw_fixed_factor("sys1_pll_800m", "sys1_pll_800m_cg", 1, 1);

sys1_pll_266m is the parent of nand_usdhc_bus. I've validated that
the SDHCI driver properly enables this bus clock across the problematic
card access. So what I think is happening here is that both
nand_usdhc_bus and sys1_pll_266m are initially enabled. Sometime during
boot sys1_pll_266m gets disabled due to runtime PM on the enet_axi
clock, which is a direct child of sys1_pll_266m. At this point
nand_usdhc_bus is still enabled in hardware, but no consumer has claimed
the clock yet, so the parent clock gets disabled while this branch of
the clock tree is still active.
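
The suspected sequence can be replayed in a small userspace model. This
is only a sketch under assumptions: struct toy_clk and the toy_clk_*
helpers are hypothetical names invented for illustration, not the
kernel's clk framework, and the model only tracks a software enable
count next to the hardware gate state. It assumes the bootloader left
all three gates open and that enet_axi is the first consumer to enable
and then release its clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a gated clock (hypothetical, not struct clk_core). */
struct toy_clk {
	const char *name;
	struct toy_clk *parent;
	unsigned int enable_count;	/* consumers holding the clock */
	bool hw_on;			/* gate state in hardware */
};

static void toy_clk_enable(struct toy_clk *clk)
{
	if (!clk)
		return;
	if (clk->enable_count++ == 0) {
		toy_clk_enable(clk->parent);	/* parent first */
		clk->hw_on = true;
	}
}

static void toy_clk_disable(struct toy_clk *clk)
{
	if (!clk)
		return;
	assert(clk->enable_count > 0);
	if (--clk->enable_count == 0) {
		clk->hw_on = false;	/* last user gone: close the gate */
		toy_clk_disable(clk->parent);
	}
}

/* True if a clock runs in hardware while its parent is gated off. */
static bool toy_clk_orphaned(const struct toy_clk *clk)
{
	return clk->hw_on && clk->parent && !clk->parent->hw_on;
}

/* Replays the suspected boot sequence; returns true if nand_usdhc_bus
 * ends up running from a gated parent. */
static bool toy_replay_boot(void)
{
	/* Gates left open by the bootloader, no consumer counted yet. */
	struct toy_clk pll_266m = { "sys1_pll_266m_cg", NULL, 0, true };
	struct toy_clk usdhc_bus = { "nand_usdhc_bus", &pll_266m, 0, true };
	struct toy_clk enet_axi = { "enet_axi", &pll_266m, 0, true };

	toy_clk_enable(&enet_axi);	/* ethernet driver probes */
	toy_clk_disable(&enet_axi);	/* runtime PM suspends it */

	return toy_clk_orphaned(&usdhc_bus);
}
```

In this model the enable count of the 266M gate drops back to zero as
soon as enet_axi is released, so the gate is closed although
nand_usdhc_bus is still running in hardware with no consumer counted.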

The reference manual says this about the situation: "For any clock, its
source must be left on when it is kept on. Behavior is undefined if
this rule is violated."

And it seems this is exactly what's happening here: some kind of glitch
is introduced in the nand_usdhc_bus clock, which prevents the SDHCI
controller from working, even though the clock branch is properly
enabled later on. On my system the SDHCI timeout and the subsequent
runtime suspend/resume cycle on the nand_usdhc_bus clock seem to get it
back into a working state.

So I think we need some solution at the clock driver/framework level to
prevent shutting down parent clocks that have active branches, even if
those branches aren't claimed by a consumer (yet).
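
One possible shape of such a rule, sketched as a userspace model rather
than as a patch: before closing a gate, check whether any child is still
running in hardware and leave the gate open if so. Everything here is an
assumption made for illustration. struct gclk and the gclk_* helpers are
hypothetical and do not correspond to the kernel's clk framework, which
may need a different mechanism entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define GCLK_MAX_CHILDREN 4

/* Toy clock that also knows its children (hypothetical model). */
struct gclk {
	const char *name;
	struct gclk *parent;
	struct gclk *children[GCLK_MAX_CHILDREN];
	size_t num_children;
	unsigned int enable_count;	/* consumers holding the clock */
	bool hw_on;			/* gate state in hardware */
};

static void gclk_add_child(struct gclk *parent, struct gclk *child)
{
	assert(parent->num_children < GCLK_MAX_CHILDREN);
	parent->children[parent->num_children++] = child;
	child->parent = parent;
}

static bool gclk_has_running_child(const struct gclk *clk)
{
	for (size_t i = 0; i < clk->num_children; i++)
		if (clk->children[i]->hw_on)
			return true;
	return false;
}

static void gclk_enable(struct gclk *clk)
{
	if (!clk)
		return;
	if (clk->enable_count++ == 0) {
		gclk_enable(clk->parent);
		clk->hw_on = true;
	}
}

static void gclk_disable(struct gclk *clk)
{
	if (!clk)
		return;
	assert(clk->enable_count > 0);
	if (--clk->enable_count > 0)
		return;
	/* Guard: keep the gate open while any child still runs in
	 * hardware, even if no consumer has claimed that child. */
	if (!gclk_has_running_child(clk))
		clk->hw_on = false;
	gclk_disable(clk->parent);
}

/* Replays the boot sequence; returns the final state of the 266M gate. */
static bool gclk_replay_guarded(void)
{
	struct gclk pll_266m = { .name = "sys1_pll_266m_cg", .hw_on = true };
	struct gclk usdhc_bus = { .name = "nand_usdhc_bus", .hw_on = true };
	struct gclk enet_axi = { .name = "enet_axi", .hw_on = true };

	gclk_add_child(&pll_266m, &usdhc_bus);
	gclk_add_child(&pll_266m, &enet_axi);

	gclk_enable(&enet_axi);		/* ethernet driver probes */
	gclk_disable(&enet_axi);	/* runtime PM suspends it */

	return pll_266m.hw_on;
}
```

With the guard in place the 266M gate stays open after enet_axi is
released, because nand_usdhc_bus is still running below it. The model
leaves open how and when such a kept-on parent would eventually be
gated once the unclaimed child is known to be unused.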

Regards,
Lucas



