Numonyx NOR and chip->mutex bug?

Mon Feb 7 14:04:06 EST 2011

On Feb 7, 2011, at 12:08 PM, Stefan Bigler wrote:

> Hi Mike
> 
> I attached a patch for you adding some printk, with those already the creation of the first ubivolume goes wrong.
> Is this also the case in your setup?

Greetings and thanks,

Unfortunately, as I've seen before, adding printks tends to make my problem go away, particularly right after the 0xd0 erase resume command. That's the same place that I also found calls like udelay(20) and (void) map_read() of status also hide the failure. (On a lark, I just tried doing that print just before the resume command but it made no difference.)

To avoid masking the failure I think I need to collect information into a static buffer and printk the buffered data only on the error path. That way, the timing of the code hopefully won't change enough to hide the problem. That's more complicated to do, of course.

More as I find it.

-Mike

> Subject: [PATCH 2/2] MTD: cfi_cmdset_0001 driver add tracing
> 
> This tracing force in my case a race condition. When I create
> a ubivolume.
> 
> Signed-off-by: Stefan Bigler <stefan.bigler at keymile.com>
> ---
> drivers/mtd/chips/cfi_cmdset_0001.c |   15 +++++++++++++++
> 1 files changed, 15 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/mtd/chips/cfi_cmdset_0001.c b/drivers/mtd/chips/cfi_cmdset_0001.c
> index d3b2cd3..ea59726 100644
> --- a/drivers/mtd/chips/cfi_cmdset_0001.c
> +++ b/drivers/mtd/chips/cfi_cmdset_0001.c
> @@ -37,6 +37,8 @@
> #include <linux/mtd/compatmac.h>
> #include <linux/mtd/cfi.h>
> 
> +#include <linux/syscalls.h>
> +
> /* #define CMDSET0001_DISABLE_ERASE_SUSPEND_ON_WRITE */
> /* #define CMDSET0001_DISABLE_WRITE_SUSPEND */
> 
> @@ -799,6 +801,8 @@ static int chip_ready (struct map_info *map, struct flchip *chip, unsigned long
> 
>         /* Erase suspend */
>         map_write(map, CMD(0xB0), adr);
> +        printk("[%10u][%04ld] erase suspend 1         adr=0x%08lx\n",
> +               jiffies_to_usecs(jiffies)/1000, sys_gettid(), adr);
> 
>         /* If the flash has finished erasing, then 'erase suspend'
>          * appears to make some (28F320) flash devices switch to
> @@ -1012,6 +1016,10 @@ static void put_chip(struct map_info *map, struct flchip *chip, unsigned long ad
>            do. */
>         map_write(map, CMD(0xd0), adr);
>         map_write(map, CMD(0x70), adr);
> +
> +        printk("[%10u][%04ld] erase resumed 2b        adr=0x%08lx\n",
> +               jiffies_to_usecs(jiffies)/1000, sys_gettid(), adr);
> +
>         chip->oldstate = FL_READY;
>         chip->state = FL_ERASING;
>         break;
> @@ -1884,6 +1892,10 @@ static int __xipram do_erase_oneblock(struct map_info *map, struct flchip *chip,
>  retry:
>     mutex_lock(&chip->mutex);
>     ret = get_chip(map, chip, adr, FL_ERASING);
> +
> +    printk("[%10u][%04ld] do_erase_oneblock start adr=0x%08lx len=0x%x\n",
> +           jiffies_to_usecs(jiffies)/1000, sys_gettid(), adr, len);
> +
>     if (ret) {
>         mutex_unlock(&chip->mutex);
>         return ret;
> @@ -1953,6 +1965,9 @@ static int __xipram do_erase_oneblock(struct map_info *map, struct flchip *chip,
> 
>     xip_enable(map, chip, adr);
>  out:    put_chip(map, chip, adr);
> +    printk("[%10u][%04ld] do_erase_oneblock end   adr=0x%08lx len=0x%x \n",
> +        jiffies_to_usecs(jiffies)/1000, sys_gettid(), adr, len);
> +
>     mutex_unlock(&chip->mutex);
>     return ret;
> }
> -- 
> 1.7.0.5
> 
> Am 02/07/2011 05:46 PM, schrieb Michael Cashwell:
>> On Feb 7, 2011, at 11:22 AM, Joakim Tjernlund wrote:
>> 
>>> Michael Cashwell<mboards at prograde.net>  wrote on 2011/02/07 16:46:22:
>>> 
>>>> My current workaround from my problem is to do a throw-away status read "(void) map_read(map, addr);" after that 0x50, 0xd0, 0x70 sequence. Since no one else is seeing my problem I expect it's some issue with my specific batch of chips. Ugh.
>>> 
>>> Possibly, or an accident waiting to happen to the rest of us.
>> 
>> That does worry me too.
>> 
>>> The map_read will probably force some HW completion. Perhaps some sync() op. will do the same? Just to nail it down.
>> 
>> Once your patch is handled I will continue to try to fully explain my issue. I'm not giving up yet.
>> 
>>> BTW, do you have CONFIG_MTD_COMPLEX_MAPPINGS=y ? I do
>> 
>> Interesting. I have used that on a different platform where some of the high-order address lines were GPIOs.
>> 
>> But no, I'm not using that in this case. Just a simple linear mapping of the whole part.
>> 
>> -Mike
>> 
> ______________________________________________________
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/