[PATCH v2 3/3] mtd: rawnand: Support for sequential cache reads

Måns Rullgård mans at mansr.com
Thu Jul 20 04:42:03 PDT 2023


Miquel Raynal <miquel.raynal at bootlin.com> writes:

> Hi Måns,
>
> mans at mansr.com wrote on Wed, 19 Jul 2023 14:15:48 +0100:
>
>> Miquel Raynal <miquel.raynal at bootlin.com> writes:
>> 
>> > Hi Måns,
>> >
>> > mans at mansr.com wrote on Wed, 19 Jul 2023 10:26:09 +0100:
>> >  
>> >> Miquel Raynal <miquel.raynal at bootlin.com> writes:
>> >>   
>> >> > Hi Måns,
>> >> >
>> >> > mans at mansr.com wrote on Tue, 18 Jul 2023 15:03:14 +0100:
>> >> >    
>> >> >> Miquel Raynal <miquel.raynal at bootlin.com> writes:
>> >> >>     
>> >> >> > Hi Måns,
>> >> >> >
>> >> >> > mans at mansr.com wrote on Mon, 17 Jul 2023 14:11:31 +0100:
>> >> >> >      
>> >> >> >> Miquel Raynal <miquel.raynal at bootlin.com> writes:
>> >> >> >>       
>> >> >> >> > So, I should have done that earlier but, could you please slow the
>> >> >> >> > whole operation down, just to see if there is something wrong with the
>> >> >> >> > timings or if we should look in another direction.
>> >> >> >> >
>> >> >> >> > Maybe you could add a boolean to flag whether the last CMD was a
>> >> >> >> > READCACHESEQ, READCACHESTART or READCACHEEND, and if the flag is
>> >> >> >> > true, please get the jiffies before and after each waitrdy and
>> >> >> >> > delay_ns. Finally, please print the expected delay and the actual
>> >> >> >> > one, and compare them to see whether anything was faster than we
>> >> >> >> > expected.
>> >> >> >> 
>> >> >> >> Between which points exactly should the delay be measured?  Also, there
>> >> >> >> is no command called READCACHESTART.  Did you mean READSTART or
>> >> >> >> something else?      
>> >> >> >
>> >> >> > Yeah, whatever command is specific to sequential cache reads:
>> >> >> > https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/raw/nand_base.c#L1218
>> >> >> > https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/raw/nand_base.c#L1228      
>> >> >> 
>> >> >> I'm still not sure what exactly you want me to measure.  The waitrdy
>> >> >> and ndelay combined, each separately, or something else?
>> >> >>     
>> >> >
>> >> > I would like to know how much time we spend waiting in both cases.
>> >> 
>> >> Which "both" cases?  
>> >
>> > ndelay and, more importantly, waitrdy:
>> 
>> [...]
>> 
>> >> > Is there something wrong with the "wait ready"? Since we cannot
>> >> > observe the timings with a scope, because we are using a "soft"
>> >> > controller implementation, we can easily measure how much time we
>> >> > spend in each operation by taking timestamps before and after.
>> >> >
>> >> > This information is only useful when we are doing operations related
>> >> > to sequential reads.
>> >> 
>> >> I have hooked up some spare GPIOs to a scope, which should be more
>> >> accurate (nanosecond) than software timestamps.  All I need to know is
>> >> what to measure and what to look for in those measurements.  
>> >
>> > Great. The only issue with the scope is the fact that we might actually
>> > look at something that is not a faulty sequential read op.  
>> 
>> How exactly do I know which ones are faulty?
>
> Right now I expect all sequential ops to be faulty. As mentioned above,
> I don't think we are interested in all the commands that are sent
> through the NAND bus, but just the READSTART/READCACHESEQ/READCACHEEND
> sequences; these two ops are what we want to capture:
>
>> >> >> > https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/raw/nand_base.c#L1218
>> >> >> > https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/raw/nand_base.c#L1228   
>
> That's why capturing these timings with a regular scope is not as easy
> as it sounds.

I have it set up so it raises one of three GPIOs at the start of
omap_nand_exec_instr() when any of those commands are issued, then a
fourth during the following waitrdy.  After the ndelay(), the pin for
the command is lowered again.  This makes it easy to measure the
duration of the waitrdy as well as any additional delay associated with
each of the commands.

The actual NAND chip signals are unfortunately impossible to access.
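
Roughly, the hack looks like this (a sketch rather than the exact patch;
the dbg_* names are mine, the GPIO descriptors are requested elsewhere,
and the elided parts are the driver's existing code):

static struct gpio_desc *dbg_gpio[4];	/* 3 command pins + 1 waitrdy pin */
static int dbg_cmd = -1;		/* which command pin is currently up */
static bool dbg_wait;			/* next waitrdy follows one of the cmds */

static int omap_nand_exec_instr(struct nand_chip *chip,
				const struct nand_op_instr *instr)
{
	switch (instr->type) {
	case NAND_OP_CMD_INSTR:
		switch (instr->ctx.cmd.opcode) {
		case NAND_CMD_READSTART:	dbg_cmd = 0; break;
		case NAND_CMD_READCACHESEQ:	dbg_cmd = 1; break;
		case NAND_CMD_READCACHEEND:	dbg_cmd = 2; break;
		default:			dbg_cmd = -1; break;
		}
		if (dbg_cmd >= 0) {
			gpiod_set_value(dbg_gpio[dbg_cmd], 1);
			dbg_wait = true;
		}
		/* ... existing command write ... */
		break;

	case NAND_OP_WAITRDY_INSTR:
		if (dbg_wait)
			gpiod_set_value(dbg_gpio[3], 1);
		/* ... existing wait for ready ... */
		if (dbg_wait) {
			gpiod_set_value(dbg_gpio[3], 0);
			dbg_wait = false;
		}
		break;

	default:
		/* ... other instruction types unchanged ... */
		break;
	}

	if (instr->delay_ns)
		ndelay(instr->delay_ns);

	/* the command pin goes down only after the trailing ndelay() */
	if (dbg_cmd >= 0 && instr->type == NAND_OP_CMD_INSTR) {
		gpiod_set_value(dbg_gpio[dbg_cmd], 0);
		dbg_cmd = -1;
	}

	return 0;
}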

>> > Unless you hack into the core to perform these in a loop (with a
>> > brutal "while (1)"). But I don't think we need great precision here;
>> > at least as a first step, looking at software timestamps as hinted
>> > above is enough to identify the different delays and compare them
>> > with nand_timings.c.
>> >
>> > Please use whatever method is easier for you.  
>> 
>> Which values should be compared?
>
> The specification declares minimum and maximum times (see
> nand_timings.c). I want to see whether the timings requested by the
> core (links above) are correctly observed or not. The particularly
> critical ones are those around READSTART/READCACHESEQ/READCACHEEND,
> because they differ from what the other ops use. Everything else is
> already in use, so it is unlikely to be problematic; these are new.
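
For completeness, the software-timestamp variant you suggested would
amount to something like this (also a sketch: timed_waitrdy() and
last_cmd_seq are placeholder names, and nand_soft_waitrdy() merely
stands in for whatever wait the driver actually performs):

/* last_cmd_seq would be set by the CMD handler when the opcode was
 * READSTART/READCACHESEQ/READCACHEEND */
static bool last_cmd_seq;

static int timed_waitrdy(struct nand_chip *chip,
			 const struct nand_op_instr *instr)
{
	ktime_t start = ktime_get();
	int ret = nand_soft_waitrdy(chip, instr->ctx.waitrdy.timeout_ms);

	if (last_cmd_seq)
		pr_info("waitrdy took %lld ns (timeout %u ms), delay_ns %u\n",
			ktime_to_ns(ktime_sub(ktime_get(), start)),
			instr->ctx.waitrdy.timeout_ms, instr->delay_ns);

	return ret;
}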

I don't think it's quite as simple as these commands being somehow
broken.  The system works for the most part, and these commands are
definitely being used.  The only breakage I notice is that the
MEMGETBADBLOCK ioctl wrongly reports blocks as being bad under some
unclear conditions.  There appears to be some weird interaction between
this ioctl and read() calls.  Whatever the pattern is, it is entirely
deterministic: issuing the same sequence of ioctl and read calls always
gives the same error pattern.  Using pread() instead of read() changes
the pattern.

-- 
Måns Rullgård


