[PATCH] mmc: failure of block read wait for long time

Tue Sep 14 01:15:22 EDT 2010

Adrian,

[..snip..]
> >>> [Ghorai] Adrian,
> >>> Yes, this works and reduces the retries to 1/4 of the original code
> >>> (from 2048 to 512 for a 1MB data read).
> >>> Initially it was retrying for each page (512 bytes) after a multi-block
> >>> read failure; this solution retries for each segment (2048 bytes).
> >>> 1. Now say the block layer is reading 1MB and fails on the 1st segment.
> >>> It will still retry 1MB / 2048 bytes, i.e. 512 times.
> >>> 2. So do you think there is any good reason to retry again and again?
> >> If you have 1MB that is not readable, it sounds like the card is broken.
> >> Why are so many reads failing?  Has the card been removed?
> >>
> >> You might very rarely see ECC errors in a small number of sectors,
> >> but more than that sounds like something else is broken.
> >
> > [Ghorai] Yes, one example is removing the card while data is being read,
> 
> Well, that is a different case.  Once the card has gone, the block driver
> can (and will once the remove method is called) error out all I/O
> requests without sending them to MMC.  That doesn't happen until there
> is a card detect interrupt and a resulting rescan.

[Ghorai] Here we are discussing two problems:
1. If I/O fails, how to stop retrying when the failure is caused by:
	a. an internal card error
	b. an issue in the filesystem, driver, or host controller
	c. card removal

2. How to sync block-layer I/O if the card is removed:
	a. case 1: the platform supports a card-removal interrupt
	b. case 2: the platform does not support a card-removal interrupt

> 
> A possible solution is to put a flag on mmc_card to indicate card_gone
> that gets set as soon as the driver's card detect interrupt shows there
> is no card (hoping that we are not getting any bouncing on card detect)
> and then have mmc_wait_for_req() simply return -ENODEV immediately if
> the card_gone flag is set.  Finally, if the mmc block driver sees
> a -ENODEV error, it should also check the card_gone flag (via a new
> core function) and if the card is gone, do not retry - and perhaps
> even error out the rest of the I/O request queue as well.
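
For illustration, the proposal quoted above could look roughly like the following. This is a stand-alone user-space sketch, not kernel code: struct sketch_card, the media_bad field and every sketch_* function are hypothetical stand-ins, and only the card_gone flag, mmc_wait_for_req() and -ENODEV come from the quoted text. It assumes the block driver walks a failed read segment by segment.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mmc_card; only what the sketch needs. */
struct sketch_card {
	bool card_gone;	/* set from the card-detect interrupt path */
	bool media_bad;	/* simulates a card that is present but unreadable */
};

/* Called when the (simulated) card-detect interrupt shows there is no card. */
static void sketch_card_detect_removed(struct sketch_card *card)
{
	card->card_gone = true;
}

/* Stand-in for the "new core function" the block driver would call. */
static bool sketch_card_is_gone(const struct sketch_card *card)
{
	return card->card_gone;
}

/* Stand-in for mmc_wait_for_req(): fail at once if the card is gone. */
static int sketch_wait_for_req(const struct sketch_card *card)
{
	if (card->card_gone)
		return -ENODEV;
	return card->media_bad ? -EIO : 0;
}

/*
 * Stand-in for the block driver issuing one request per segment.
 * Stores the number of requests issued in *issued and returns the
 * status of the last request.
 */
static int sketch_read_segments(const struct sketch_card *card,
				size_t nsegs, size_t *issued)
{
	size_t i;
	int err = 0;

	assert(issued != NULL);
	*issued = 0;
	for (i = 0; i < nsegs; i++) {
		err = sketch_wait_for_req(card);
		(*issued)++;
		if (err == -ENODEV && sketch_card_is_gone(card))
			break;	/* card removed: stop, do not retry */
		/* any other error: carry on with the next segment */
	}
	return err;
}
```

In this model a removed card stops the read after a single request, but a card that is still present and returns -EIO is still tried for every segment.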

[Ghorai] Your idea addresses case 2.a, but not 2.b, 1.a or 1.b.

The solution I am proposing is to return the status of the I/O failure to the layer above as soon as possible, and to handle the card-removal interrupt, or any other issue in h/w or s/w or a combination of both, separately. Also consider the case where the platform has no card-removal interrupt.

So my patch addresses the 1st part.
For the 2nd part we can submit a patch at any time.
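
The retry counts mentioned earlier in this thread (2048 attempts at 512 bytes each, 512 attempts at 2048 bytes each, for a 1MB read) can be checked with a small stand-alone model, together with the effect of reporting the failure at once. This is only a sketch and not the patch itself: the sketch_* names are hypothetical, the read callback always fails, and it assumes that "as soon as possible" means giving up after the first failed chunk.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical read callback type: returns 0 on success, -errno on failure. */
typedef int (*sketch_read_fn)(size_t offset, size_t len);

/* Simulates a card where every read fails. */
static int sketch_always_fail(size_t offset, size_t len)
{
	(void)offset;
	(void)len;
	return -EIO;
}

/*
 * Counts how many chunk reads are attempted for a request of
 * total_bytes, split into chunk_bytes pieces. With stop_on_first_error
 * set, the first failure ends the request immediately.
 */
static size_t sketch_count_attempts(size_t total_bytes, size_t chunk_bytes,
				    bool stop_on_first_error,
				    sketch_read_fn read_chunk)
{
	size_t off, attempts = 0;

	assert(chunk_bytes > 0);
	for (off = 0; off < total_bytes; off += chunk_bytes) {
		attempts++;
		if (read_chunk(off, chunk_bytes) != 0 && stop_on_first_error)
			break;
	}
	return attempts;
}
```

With a 1MB request (1 << 20 bytes) and sketch_always_fail, a 512-byte chunk gives 2048 attempts, a 2048-byte chunk gives 512, and stop_on_first_error gives 1.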

> 
> I can suggest a patch if you want but I am on vacation next week so
> it will have to wait a couple of weeks.
> 
> > Moreover, we should not give interleaved data to apps, as we have no
> > way to tell the application which data is valid.
> >
[..snip..]
http://comments.gmane.org/gmane.linux.kernel.mmc/2714
