3.18-rc2 boot failure on Radxa Rock in dw_mmc

Doug Anderson dianders at chromium.org
Sat Nov 1 13:50:16 PDT 2014


Max,

On Sat, Nov 1, 2014 at 7:03 AM, Max Schwarz <max.schwarz at online.de> wrote:
> Hi all,
>
> I'm getting an Oops during boot of 3.18-rc2 on my Radxa Rock. It only happens
> when a microSD card is inserted.
>
> Full boot log: https://gist.github.com/xqms/ff6fc22407e705c9986d
> My .config: https://gist.github.com/xqms/9f80f9f9601f85d399cf
>
> The issue seems to be in the dw-mmc driver. It seems that an IRQ is triggered
> before the host->slot[i] pointer is initialized, causing the fault in line
> 2066 of drivers/mmc/host/dw_mmc.c:
>
> static void dw_mci_work_routine_card(struct work_struct *work)
> {
>         struct dw_mci *host = container_of(work, struct dw_mci, card_work);
>         int i;
>
>         for (i = 0; i < host->num_slots; i++) {
>                 struct dw_mci_slot *slot = host->slot[i];
>                 struct mmc_host *mmc = slot->mmc; <-- FAULT
>                 struct mmc_request *mrq;

Yup.  Sounds familiar.  See
<https://patchwork.kernel.org/patch/4829341/>.  I later decided that I
hadn't thought through everything enough so I ended up dropping the
patch.  There is definitely a race here and it's existed for a long
time, I just wasn't sure my patch was the right fix.  Amazingly the
problem disappeared for me (not that I liked that as a fix).


> A bisect ends at:
>
> commit 51da2240906cb94e8f6ba55e403b6206df6fb2dd
> Author: Yuvaraj CD <yuvaraj.cd at gmail.com>
> Date:   Fri Aug 22 19:17:50 2014 +0530
>
>     mmc: dw_mmc: use mmc_regulator_get_supply to handle regulators
>
>     This patch makes use of mmc_regulator_get_supply() to handle
>     the vmmc and vqmmc regulators.Also it moves the code handling
>     the these regulators to dw_mci_set_ios().It turned on the vmmc
>     and vqmmc during MMC_POWER_UP and MMC_POWER_ON,and turned off
>     during MMC_POWER_OFF.
>
>     Signed-off-by: Yuvaraj Kumar C D <yuvaraj.cd at samsung.com>
>     Signed-off-by: Ulf Hansson <ulf.hansson at linaro.org>
>
> That seems a bit unrelated to me, but maybe it causes a different
> initialization order?
>
> I'm playing around with the initialization order myself, but I haven't managed
> to find a working configuration yet. Can somebody with MMC knowledge find the
> problem or point me into the right direction?

For me the problem randomly came and then randomly disappeared.
Others working on the same kernel would see it, then they wouldn't.

...if you want a fix, try grabbing (5998ad0 mmc: dw_mmc: Remove old
card detect infrastructure) from linux-next.  That kills the work
routine.  I also threw an "if (!slot)" check in there, just because I
was worried about the problem coming back and I figured that it was
better not to crash.
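
Roughly, the kind of "if (!slot)" guard mentioned above looks like the
user-space sketch below.  This is not the actual dw_mmc code or the
actual patch: the mock_* types, MAX_SLOTS and card_work() are
hypothetical stand-ins that only mirror the shape of the loop quoted
earlier, so the NULL check can be seen in isolation.

```c
#include <stddef.h>

#define MAX_SLOTS 2 /* hypothetical limit, for this sketch only */

/* Hypothetical stand-ins for struct mmc_host / dw_mci_slot / dw_mci. */
struct mock_mmc_host {
	int present;
};

struct mock_slot {
	struct mock_mmc_host *mmc;
};

struct mock_host {
	int num_slots;
	struct mock_slot *slot[MAX_SLOTS];
};

/*
 * Mirrors the shape of the card work loop: walk every slot, but skip
 * any slot pointer that has not been initialized yet instead of
 * dereferencing it.  Returns the number of slots actually handled.
 */
static int card_work(struct mock_host *host)
{
	int handled = 0;
	int i;

	for (i = 0; i < host->num_slots; i++) {
		struct mock_slot *slot = host->slot[i];

		if (!slot)
			continue; /* work ran before this slot was set up */

		slot->mmc->present = 1;
		handled++;
	}

	return handled;
}
```

A check like this only avoids the NULL dereference; it does not remove
the underlying ordering problem between the interrupt and the slot
setup.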

Eventually someone needs to fully remove the "slot" support from
DW_MMC (it's already removed from the device tree parsing).  When that
happens hopefully this race will be sorted out fully.

-Doug



More information about the Linux-rockchip mailing list