SDHCI long sleep with interrupts off

Thu Dec 17 03:03:10 PST 2015

Am Donnerstag, den 17.12.2015, 11:28 +0100 schrieb David Jander:
> Hi all, 
> 
> I was investigating the source of abnormal irq-latency spikes on an i.MX6
> (ARM) board, and discovered this:
> 
> # tracer: preemptirqsoff
> #
> # preemptirqsoff latency trace v1.1.5 on 4.4.0-rc4+
> # --------------------------------------------------------------------
> # latency: 2068 us, #4/4, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:1)
> #    -----------------
> #    | task: mmcqd/0-92 (uid:0 nice:0 policy:0 rt_prio:0)
> #    -----------------
> #  => started at: _raw_spin_lock_irqsave
> #  => ended at:   _raw_spin_unlock_irqrestore
> #
> #
> #                  _------=> CPU#            
> #                 / _-----=> irqs-off        
> #                | / _----=> need-resched    
> #                || / _---=> hardirq/softirq 
> #                ||| / _--=> preempt-depth   
> #                |||| /     delay            
> #  cmd     pid   ||||| time  |   caller      
> #     \   /      |||||  \    |   /         
>  mmcqd/0-92      0d...    1us#: _raw_spin_lock_irqsave
>  mmcqd/0-92      0.n.1 2066us : _raw_spin_unlock_irqrestore
>  mmcqd/0-92      0.n.1 2070us+: trace_preempt_on <-_raw_spin_unlock_irqrestore
>  mmcqd/0-92      0.n.1 2107us : <stack trace>
>  => sdhci_runtime_resume_host
>  => __rpm_callback
>  => rpm_callback
>  => rpm_resume
>  => __pm_runtime_resume
>  => __mmc_claim_host
>  => mmc_blk_issue_rq
>  => mmc_queue_thread
>  => kthread
>  => ret_from_fork
> 
> 2 ms with interrupts disabled!!! To much dismay, I later discovered that this
> isn't even the worst case scenario. I also discovered that this has been in
> the kernel for a long time without a fix (I have tested from 3.17 to 4.4-rc4).
> There has been an attempt by someone to address this back in 2010, but
> apparently it never hit mainline.
> Going through the code in sdhci.c, I found this troublesome code-path:
> 
> sdhci_do_set_ios() {
> 	spin_lock_irqsave(&host->lock, flags);
> 	...
> 	sdhci_reinit() --> sdhci_init() --> sdhci_do_reset() -->
> 		host->ops->reset() --> sdhci_reset()	
> 	...
> 	spin_unlock_irqrestore(&host->lock, flags);
> }
> 
> And in sdhci_reset(), which may be called with held spinlock:
> 
> 	...
> 	/* Wait max 100 ms */
> 	timeout = 100;
> 
> 	/* hw clears the bit when it's done */
> 	while (sdhci_readb(host, SDHCI_SOFTWARE_RESET) & mask) {
> 		if (timeout == 0) {
> 			pr_err("%s: Reset 0x%x never completed.\n",
> 				mmc_hostname(host->mmc), (int)mask);
> 			sdhci_dumpregs(host);
> 			return;
> 		}
> 		timeout--;
> 		mdelay(1);
> 	}
> 
> I am wondering: There either must be a reason this hasn't been fixed in such a
> long time, or I am not understanding this correctly, so please enlighten me.
> Before trying a cowboy attempt at "fixing" this, I'd really like to know why am
> I seeing this?
> I mean... how can such a problem get unnoticed and unfixed for so long?
> Will an attempt at fixing this issue even be accepted?
> 
I would like to see the sdhci spinlock killed and replaced by a mutex
for exactly this reason.

That said, your problem is card polling, when no card is present in the
slot. This is most probably caused by CD gpios having the wrong
polarity.

Regards,
Lucas

-- 
Pengutronix e.K.             | Lucas Stach                 |
Industrial Linux Solutions   | http://www.pengutronix.de/  |