[BUG] dmaengine: pxa_dma: + mmc: pxamci: race condition with DMA error on tx channel
Petr Cvek
petr.cvek at tul.cz
Tue Mar 7 22:57:08 PST 2017
Hello,
PXA27x DMA changes between v4.7 (d52bd54db8be8999df6df5a776f38c4f8b5e9cea) and v4.10-rc5 (a4685d2f58e2230d4e27fb2ee581d7ea35e5d046) seem to expose a race condition when using the PXA MMC driver on a PXA27x (magician.c machine).
The failure produces a single line in the kernel log, after which the filesystem on the SD card becomes inaccessible (and so does the machine):
mmc0: DMA error on tx channel
I wasn't able to track the problem down to a single patch, because it occurs at a random time (anywhere from boot to about half an hour later) and may depend on the battery charge level (perhaps because of kernel log writes of charging messages).
Most failures seem to happen during writes to the SD card, and using an SDHC card shortens the time to failure. After the failure the OS is unusable, as the rootfs is on the card.
From my poking around in the kernel source, it seems possible that pxamci_irq() is sometimes invoked late, so that its subsequent call to pxamci_data_done() does not get to [1]
host->data = NULL;
before the DMA completion callback runs.
From the DMA side, the DMA-done interrupt is handled as
pxad_chan_handler() -> vchan_cookie_complete()
which schedules a tasklet running vchan_complete(). That tasklet finally calls the callback pxamci_dma_irq() with interrupts enabled (can pxamci_irq() be called at this point?).
From my tests, at this point [2] host->data always seems to be NULL, so the rest of the callback never runs. The callback sees a non-NULL host->data only once, just before the failure.
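To make the suspected ordering concrete, here is a minimal userspace model of the sequence described above. It assumes, as the analysis suggests, that the callback's behaviour depends only on whether host->data has already been cleared by the time the deferred DMA callback runs. All names in it (struct model_host, model_dma_done_irq(), model_mmc_irq(), model_run_tasklet(), simulate()) are hypothetical; it is not the pxamci or pxa_dma code, and it replays the two orderings deterministically rather than reproducing the real race.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: NOT the real pxamci or pxa_dma structures. */
struct model_data {
	int error; /* payload is irrelevant to the model */
};

struct model_host {
	struct model_data *data;       /* in-flight transfer, NULL once done */
	bool callback_pending;         /* models the scheduled vchan tasklet */
	bool callback_went_past_guard; /* did the callback see live data? */
};

/* Models the DMA-done hard IRQ (pxad_chan_handler() ->
 * vchan_cookie_complete()): it only schedules deferred work; the
 * client callback is not called here. */
static void model_dma_done_irq(struct model_host *host)
{
	host->callback_pending = true;
}

/* Models pxamci_irq() -> pxamci_data_done(): finishing the transfer
 * clears host->data. */
static void model_mmc_irq(struct model_host *host)
{
	host->data = NULL;
}

/* Models the tasklet running the client callback (pxamci_dma_irq()):
 * the callback returns early when host->data is NULL, otherwise the
 * rest of it runs. */
static void model_run_tasklet(struct model_host *host)
{
	if (!host->callback_pending)
		return;
	host->callback_pending = false;
	host->callback_went_past_guard = (host->data != NULL);
}

/* Replays one transfer with a fixed ordering of the MMC interrupt and
 * the deferred DMA callback. Returns true when the callback got past
 * the host->data == NULL check. */
static bool simulate(bool mmc_irq_before_tasklet)
{
	struct model_data data = { .error = 0 };
	struct model_host host = { .data = &data };

	model_dma_done_irq(&host);
	if (mmc_irq_before_tasklet) {
		model_mmc_irq(&host);     /* usual ordering */
		model_run_tasklet(&host);
	} else {
		model_run_tasklet(&host); /* MMC IRQ delayed past the tasklet */
		model_mmc_irq(&host);
	}
	return host.callback_went_past_guard;
}
```

In this model simulate(true) returns false (the callback bails out, matching the usual observation) and simulate(false) returns true (the callback proceeds with a live host->data, matching the single such call just before the failure). Why the DMA status is then reported as an error is not modelled.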
During testing I added udelay(100) at the start of pxamci_dma_irq(), and the failure then occurred only after about 2 hours (the first time I tapped the touchscreen, which means higher CPU usage and more interrupts).
[1] http://elixir.free-electrons.com/source/drivers/mmc/host/pxamci.c?v=4.10#L385
[2] http://elixir.free-electrons.com/source/drivers/mmc/host/pxamci.c?v=4.10#L561
Best regards,
Petr
More information about the linux-arm-kernel mailing list