[PATCH v2 1/1] mtd: rawnand: Cache read sequential
Miquel Raynal
miquel.raynal at bootlin.com
Mon Nov 7 06:41:01 PST 2022
miquel.raynal at bootlin.com wrote on Mon, 7 Nov 2022 14:56:10 +0100:
> Hi Jaime,
>
> Sorry for the long wait.
>
> jaimeliao.tw at gmail.com wrote on Mon, 29 Aug 2022 15:24:52 +0800:
>
> > For reducing instruction in command operation and time of read sequence
> > which address are continuous.
> > Adding cache read sequential feature support in raw nand existing architecture.
>
> What about :
>
> "
> Add support for sequential cache reads.
>
> Sequential reads may reduce the overhead when accessing physically
> continuous data by loading in cache the next page while the previous
> page gets sent out on the NAND bus.
>
> The ONFI specification provides the following additional commands to
> handle sequential cached reads:
>
> * 0x31 - READ CACHE SEQUENTIAL:
> Requires the NAND chip to load the next page into cache while keeping
> the current cache available for host reads.
> * 0x3F - READ CACHE END:
> Tells the NAND chip this is the end of the sequential cache read, the
> current cache shall remain accessible for the host but no more
> internal cache loading operation is required.
>
> On the bus, a multi page read operation is currently handled like this:
>
> 00 -- ADDR1 -- 30 -- WAIT_RDY (tR+tRR) -- DATA1_IN
> 00 -- ADDR2 -- 30 -- WAIT_RDY (tR+tRR) -- DATA2_IN
> 00 -- ADDR3 -- 30 -- WAIT_RDY (tR+tRR) -- DATA3_IN
>
> Sequential cached reads may instead be achieved with:
>
> 00 -- ADDR1 -- 30 -- WAIT_RDY (tR) -- \
> 31 -- WAIT_RDY (tRCBSY+tRR) -- DATA1_IN \
> 31 -- WAIT_RDY (tRCBSY+tRR) -- DATA2_IN \
> 3F -- WAIT_RDY (tRCBSY+tRR) -- DATA3_IN
> "
>
> > Creating nand_exec_continue_read_page_op function for read command extend.
> > Adding command select in nand_lp_exec_read_page_op and taking readlen to
> > judge the last page for read.
> > Command of read cache sequential can be separate 3 parts.
> >
> > First part is 00-ADDR-30-31-DATA_IN, it means read cache sequential enabled and next
> > page data can be prepare while cache data reading.
> >
> > Second part is 31-DATA_IN, means flash will prepare next page data for reducing
> > time cost.
> >
> > Last part is 3F-DATA_IN, means the last cache read and flash don't need to prepare
> > next page data.
> >
> > Example for 2 page read operation, as below is the NAND operation instructions:
> > Normal read mode:
> > 00 -- ADDR1 -- 30 -- WAIT_RDY(tR+tRR) -- DATA1_IN
> > 00 -- ADDR2 -- 30 -- WAIT_RDY(tR+tRR) -- DATA2_IN
> >
> > Cache read sequential mode:
> > 00 -- ADDR1 -- 30 -- WAIT_RDY(tR) -- 31 -- WAIT_RDY(tRCBSY+tRR) -- DATA1_IN
> > 3F -- WAIT_RDY(tRCBSY+tRR) -- DATA2_IN
> >
> > tR is about 20 microsecond
> > tRCBSY is about 5 microsecond
> > tRR is about 20 nanosecond
> >
> > As below is mtd_speedtest result between normale read and cache read
> > sequential on NXP i.MX6 VAR-SOM-SOLO mapping mode.
>
> Below are the read speed test results with regular reads and
> sequential cached reads, on NXP i.MX6 VAR-SOM-SOLO in mapping mode with
> a NAND chip characterized with the following timings:
> * tR is about 20µs
> * tRCBSY is about 5µs
> * tRR is about 20ns
>
> >
> > ===============Normal read 33MHz=================
> > mtd_speedtest: MTD device: 0
> > mtd_speedtest: MTD device size 2097152, eraseblock size 131072, page size 2048,
> > count of eraseblocks 16, pages per eraseblock 64, OOB size 64
> > mtd_test: scanning for bad eraseblocks
> > mtd_test: scanned 16 eraseblocks, 0 are bad
> > mtd_speedtest: testing eraseblock write speed
> > mtd_speedtest: eraseblock write speed is 3864 KiB/s
> > mtd_speedtest: testing eraseblock read speed
> > mtd_speedtest: eraseblock read speed is 15633 KiB/s
>
> Maybe in both cases you can just keep the interesting lines with the
> various read speeds.
>
> > mtd_speedtest: testing page write speed
> > mtd_speedtest: page write speed is 3561 KiB/s
> > mtd_speedtest: testing page read speed
> > mtd_speedtest: page read speed is 15515 KiB/s
> > mtd_speedtest: testing 2 page write speed
> > mtd_speedtest: 2 page write speed is 3690 KiB/s
> > mtd_speedtest: testing 2 page read speed
> > mtd_speedtest: 2 page read speed is 15398 KiB/s
> > mtd_speedtest: Testing erase speed
> > mtd_speedtest: erase speed is 227555 KiB/s
> > mtd_speedtest: Testing 2x multi-block erase speed
> > mtd_speedtest: 2x multi-block erase speed is 204800 KiB/s
> > mtd_speedtest: Testing 4x multi-block erase speed
> > mtd_speedtest: 4x multi-block erase speed is 204800 KiB/s
> > mtd_speedtest: Testing 8x multi-block erase speed
> > mtd_speedtest: 8x multi-block erase speed is 204800 KiB/s
> > mtd_speedtest: Testing 16x multi-block erase speed
> > mtd_speedtest: 16x multi-block erase speed is 204800 KiB/s
> > mtd_speedtest: Testing 32x multi-block erase speed
> > mtd_speedtest: 32x multi-block erase speed is 204800 KiB/s
> > mtd_speedtest: Testing 64x multi-block erase speed
> > mtd_speedtest: 64x multi-block erase speed is 227555 KiB/s
> > mtd_speedtest: finished
> > =================================================
> >
> > ============Cache read sequential 33MHz==========
> > mtd_speedtest: MTD device: 0
> > mtd_speedtest: MTD device size 2097152, eraseblock size 131072, page size 2048,
> > count of eraseblocks 16, pages per eraseblock 64, OOB size 64
> > mtd_test: scanning for bad eraseblocks
> > mtd_test: scanned 16 eraseblocks, 0 are bad
> > mtd_speedtest: testing eraseblock write speed
> > mtd_speedtest: eraseblock write speed is 4120 KiB/s
> > mtd_speedtest: testing eraseblock read speed
> > mtd_speedtest: eraseblock read speed is 18285 KiB/s
> > mtd_speedtest: testing page write speed
> > mtd_speedtest: page write speed is 4031 KiB/s
> > mtd_speedtest: testing page read speed
> > mtd_speedtest: page read speed is 15875 KiB/s
> > mtd_speedtest: testing 2 page write speed
> > mtd_speedtest: 2 page write speed is 3946 KiB/s
> > mtd_speedtest: testing 2 page read speed
> > mtd_speedtest: 2 page read speed is 16253 KiB/s
> > mtd_speedtest: Testing erase speed
> > mtd_speedtest: erase speed is 227555 KiB/s
> > mtd_speedtest: Testing 2x multi-block erase speed
> > mtd_speedtest: 2x multi-block erase speed is 227555 KiB/s
> > mtd_speedtest: Testing 4x multi-block erase speed
> > mtd_speedtest: 4x multi-block erase speed is 204800 KiB/s
> > mtd_speedtest: Testing 8x multi-block erase speed
> > mtd_speedtest: 8x multi-block erase speed is 204800 KiB/s
> > mtd_speedtest: Testing 16x multi-block erase speed
> > mtd_speedtest: 16x multi-block erase speed is 227555 KiB/s
> > mtd_speedtest: Testing 32x multi-block erase speed
> > mtd_speedtest: 32x multi-block erase speed is 204800 KiB/s
> > mtd_speedtest: Testing 64x multi-block erase speed
> > mtd_speedtest: 64x multi-block erase speed is 204800 KiB/s
> > mtd_speedtest: finished
> > =================================================
> >
> > The result could have about 5% improvement between 2 modes when 2 pages read.
>
> We observe an overall improvement of x% when reading 2 pages, y% when
> reading a block.
>
> > Signed-off-by: JaimeLiao <jaimeliao.tw at gmail.com>
> > ---
> > drivers/mtd/nand/raw/nand_base.c | 96 +++++++++++++++++++++++++++--
> > drivers/mtd/nand/raw/nand_timings.c | 12 ++++
> > include/linux/mtd/rawnand.h | 7 +++
> > 3 files changed, 110 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c
> > index 3d6c6e880520..7400816f0627 100644
> > --- a/drivers/mtd/nand/raw/nand_base.c
> > +++ b/drivers/mtd/nand/raw/nand_base.c
> > @@ -1176,6 +1176,39 @@ static int nand_lp_exec_read_page_op(struct nand_chip *chip, unsigned int page,
> > const struct nand_interface_config *conf =
> > nand_get_interface_config(chip);
> > u8 addrs[5];
> > +
> > + /* Command sequence select to 00-30-31 when read cache sequential is enabled */
> > + if (chip->ops.conti_read_require) {
> > + struct nand_op_instr instrs[] = {
> > + NAND_OP_CMD(NAND_CMD_READ0, 0),
> > + NAND_OP_ADDR(4, addrs, 0),
> > + NAND_OP_CMD(NAND_CMD_READSTART, NAND_COMMON_TIMING_NS(conf, tWB_max)),
> > + NAND_OP_WAIT_RDY(NAND_COMMON_TIMING_MS(conf, tR_max), 0),
> > + NAND_OP_CMD(NAND_CMD_READCONTI, NAND_COMMON_TIMING_NS(conf, tWB_max)),
> > + NAND_OP_WAIT_RDY(NAND_COMMON_TIMING_MS(conf, tRCBSY_min),
> > + NAND_COMMON_TIMING_NS(conf, tRR_min)),
> > + NAND_OP_DATA_IN(len, buf, 0),
> > + };
> > + struct nand_operation op = NAND_OPERATION(chip->cur_cs, instrs);
> > + int ret;
> > +
> > + if (!len)
> > + op.ninstrs--;
> > +
> > + ret = nand_fill_column_cycles(chip, addrs, offset_in_page);
> > + if (ret < 0)
> > + return ret;
> > +
> > + addrs[2] = page;
> > + addrs[3] = page >> 8;
> > +
> > + if (chip->options & NAND_ROW_ADDR_3) {
> > + addrs[4] = page >> 16;
> > + instrs[1].ctx.addr.naddrs++;
> > + }
> > +
> > + return nand_exec_op(chip, &op);
> > + }
>
> Please create a dedicated function for continuous reads, let's avoid
> messing with the regular path.
>
> I would greatly appreciate additional parameters
>
> > struct nand_op_instr instrs[] = {
> > NAND_OP_CMD(NAND_CMD_READ0, 0),
> > NAND_OP_ADDR(4, addrs, 0),
> > @@ -1206,6 +1239,32 @@ static int nand_lp_exec_read_page_op(struct nand_chip *chip, unsigned int page,
> > return nand_exec_op(chip, &op);
> > }
> >
> > +static int nand_exec_continue_read_page_op(struct nand_chip *chip,
> > + void *buf, unsigned int len)
> > +{
> > + struct mtd_info *mtd = nand_to_mtd(chip);
> > + bool conti_read_last = false;
> > + const struct nand_interface_config *conf =
> > + nand_get_interface_config(chip);
> > + /* Set conti_read_last true before the last aligned page reading */
> > + if (chip->ops.conti_readlen >= mtd->writesize && chip->ops.conti_readlen < 2*mtd->writesize)
> > + conti_read_last = true;
> > + struct nand_op_instr instrs[] = {
> > + NAND_OP_CMD(conti_read_last?NAND_CMD_READLAST:NAND_CMD_READCONTI,
> > + NAND_COMMON_TIMING_NS(conf, tWB_max)),
>
> Please run checkpatch.pl and fix the style.
>
> > + NAND_OP_WAIT_RDY(NAND_COMMON_TIMING_MS(conf, tRCBSY_min),
> > + NAND_COMMON_TIMING_NS(conf, tRR_min)),
> > + NAND_OP_DATA_IN(len, buf, 0),
> > + };
> > + struct nand_operation op = NAND_OPERATION(chip->cur_cs, instrs);
> > +
> > + /* Drop the DATA_IN instruction if len is set to 0. */
> > + if (!len)
> > + op.ninstrs--;
> > +
> > + return nand_exec_op(chip, &op);
> > +}
> > +
> > /**
> > * nand_read_page_op - Do a READ PAGE operation
> > * @chip: The NAND chip
> > @@ -1231,10 +1290,15 @@ int nand_read_page_op(struct nand_chip *chip, unsigned int page,
> > return -EINVAL;
> >
> > if (nand_has_exec_op(chip)) {
> > - if (mtd->writesize > 512)
> > - return nand_lp_exec_read_page_op(chip, page,
> > - offset_in_page, buf,
> > - len);
> > + if (mtd->writesize > 512) {
> > + /* Selecting read command sequence */
> > + if (!chip->ops.conti_read_first_page && chip->ops.conti_read_require)
> > + return nand_exec_continue_read_page_op(chip, buf, len);
> > + else
> > + return nand_lp_exec_read_page_op(chip, page,
> > + offset_in_page, buf,
> > + len);
> > + }
>
> EZ NAND do not support sequential cache reads, please add the necessary
> flags to avoid trying continuous reads on them.
Please also run exec_op with check_only set to verify the controller
supports it before enabling the flag.
> Also, as said above, I would create an entirely new helper like this:
>
> nand_lp_exec_cont_read_page_op(chip, page, buf, len)
> {
> if (chip->cont_read_first_page == page)
> start_continuous_read();
> else if (chip->cont_read_last_page == page)
> end_continuous_read();
> else
> continuous_read();
> }
>
> >
> > return nand_sp_exec_read_page_op(chip, page, offset_in_page,
> > buf, len);
> > @@ -3362,7 +3426,6 @@ static int nand_do_read_ops(struct nand_chip *chip, loff_t from,
> >
> > bytes = min(mtd->writesize - col, readlen);
> > aligned = (bytes == mtd->writesize);
> > -
>
> Unrelated change, please drop
>
> > if (!aligned)
> > use_bounce_buf = 1;
> > else if (chip->options & NAND_USES_DMA)
> > @@ -3381,6 +3444,25 @@ static int nand_do_read_ops(struct nand_chip *chip, loff_t from,
> > __func__, buf);
> >
> > read_retry:
>
> I believe read retries cannot be used with sequential cache reads,
> because the whole point of retries is to read from the NAND array into
> cache again, which is exactly what we are trying to avoid. Am I right?
> If yes, then we should disable continuous reads when a retry is
> required.
>
> > + /*
> > + * Read cache sequential command can be separate 3 parts.
> > + * For distinguishing that flash driver set value to
> > + * conti_read_first_page for checking whether the first page
> > + * and set conti_read_require to select command sequence.
> > + * And take readlen to conti_readlen for checking whether the
> > + * last page or not.
>
> Maybe you could use the reworded commit log I proposed to explain the
> different cases.
>
> > + */
> > + if (!col && readlen >= (2*mtd->writesize)) {
> > + if (chip->ops.conti_read_require == 0)
> > + chip->ops.conti_read_first_page = 1;
> > + else
> > + chip->ops.conti_read_first_page = 0;
> > + chip->ops.conti_read_require = 1;
> > + chip->ops.conti_readlen = readlen;
> > + } else if (chip->ops.conti_read_require) {
> > + chip->ops.conti_read_first_page = 0;
> > + chip->ops.conti_readlen = readlen;
> > + }
>
> I would create a helper for that.
>
> It should look at the entire operation and just set a number of
> variables stored into struct nand_chip (even though I'm not a big fan
> of the idea, I don't see how to do it differently). The feature should
> be disabled if they cannot be fully used, let's support the simplest
> cases only. I would like to see a single parameter which tells the core:
> use or do not use continuous read on this operation. Then, I would
> prefer storing eg. the address of the first and last pages to read so
> that a dumb read helper (like above) can just look at the page address
> and perform the right call.
>
> > /*
> > * Now read the page into the buffer. Absent an error,
> > * the read methods return max bitflips per ecc step.
> > @@ -3468,6 +3550,10 @@ static int nand_do_read_ops(struct nand_chip *chip, loff_t from,
> > retry_mode = 0;
> > }
> >
> > + /* set conti_read_require 0 when finish the last aligned page read */
> > + if (readlen < mtd->writesize)
> > + chip->ops.conti_read_require = 0;
> > +
> > if (!readlen)
> > break;
> >
> > diff --git a/drivers/mtd/nand/raw/nand_timings.c b/drivers/mtd/nand/raw/nand_timings.c
> > index 7b41afc372d2..cec5d9e526fe 100644
> > --- a/drivers/mtd/nand/raw/nand_timings.c
> > +++ b/drivers/mtd/nand/raw/nand_timings.c
>
> Maybe the timing additions can be done in a separated patch.
>
> > @@ -27,6 +27,7 @@ static const struct nand_interface_config onfi_sdr_timings[] = {
> > .timings.sdr = {
> > .tCCS_min = 500000,
> > .tR_max = 200000000,
> > + .tRCBSY_min = 40000000,
> > .tPROG_max = 1000000ULL * ONFI_DYN_TIMING_MAX,
> > .tBERS_max = 1000000ULL * ONFI_DYN_TIMING_MAX,
> > .tADL_min = 400000,
> > @@ -72,6 +73,7 @@ static const struct nand_interface_config onfi_sdr_timings[] = {
> > .timings.sdr = {
> > .tCCS_min = 500000,
> > .tR_max = 200000000,
> > + .tRCBSY_min = 40000000,
> > .tPROG_max = 1000000ULL * ONFI_DYN_TIMING_MAX,
> > .tBERS_max = 1000000ULL * ONFI_DYN_TIMING_MAX,
> > .tADL_min = 400000,
> > @@ -117,6 +119,7 @@ static const struct nand_interface_config onfi_sdr_timings[] = {
> > .timings.sdr = {
> > .tCCS_min = 500000,
> > .tR_max = 200000000,
> > + .tRCBSY_min = 40000000,
> > .tPROG_max = 1000000ULL * ONFI_DYN_TIMING_MAX,
> > .tBERS_max = 1000000ULL * ONFI_DYN_TIMING_MAX,
> > .tADL_min = 400000,
> > @@ -162,6 +165,7 @@ static const struct nand_interface_config onfi_sdr_timings[] = {
> > .timings.sdr = {
> > .tCCS_min = 500000,
> > .tR_max = 200000000,
> > + .tRCBSY_min = 40000000,
> > .tPROG_max = 1000000ULL * ONFI_DYN_TIMING_MAX,
> > .tBERS_max = 1000000ULL * ONFI_DYN_TIMING_MAX,
> > .tADL_min = 400000,
> > @@ -207,6 +211,7 @@ static const struct nand_interface_config onfi_sdr_timings[] = {
> > .timings.sdr = {
> > .tCCS_min = 500000,
> > .tR_max = 200000000,
> > + .tRCBSY_min = 40000000,
> > .tPROG_max = 1000000ULL * ONFI_DYN_TIMING_MAX,
> > .tBERS_max = 1000000ULL * ONFI_DYN_TIMING_MAX,
> > .tADL_min = 400000,
> > @@ -252,6 +257,7 @@ static const struct nand_interface_config onfi_sdr_timings[] = {
> > .timings.sdr = {
> > .tCCS_min = 500000,
> > .tR_max = 200000000,
> > + .tRCBSY_min = 40000000,
> > .tPROG_max = 1000000ULL * ONFI_DYN_TIMING_MAX,
> > .tBERS_max = 1000000ULL * ONFI_DYN_TIMING_MAX,
> > .tADL_min = 400000,
> > @@ -300,6 +306,7 @@ static const struct nand_interface_config onfi_nvddr_timings[] = {
> > .timings.nvddr = {
> > .tCCS_min = 500000,
> > .tR_max = 200000000,
> > + .tRCBSY_min = 40000000,
> > .tPROG_max = 1000000ULL * ONFI_DYN_TIMING_MAX,
> > .tBERS_max = 1000000ULL * ONFI_DYN_TIMING_MAX,
> > .tAC_min = 3000,
> > @@ -342,6 +349,7 @@ static const struct nand_interface_config onfi_nvddr_timings[] = {
> > .timings.nvddr = {
> > .tCCS_min = 500000,
> > .tR_max = 200000000,
> > + .tRCBSY_min = 40000000,
> > .tPROG_max = 1000000ULL * ONFI_DYN_TIMING_MAX,
> > .tBERS_max = 1000000ULL * ONFI_DYN_TIMING_MAX,
> > .tAC_min = 3000,
> > @@ -384,6 +392,7 @@ static const struct nand_interface_config onfi_nvddr_timings[] = {
> > .timings.nvddr = {
> > .tCCS_min = 500000,
> > .tR_max = 200000000,
> > + .tRCBSY_min = 40000000,
> > .tPROG_max = 1000000ULL * ONFI_DYN_TIMING_MAX,
> > .tBERS_max = 1000000ULL * ONFI_DYN_TIMING_MAX,
> > .tAC_min = 3000,
> > @@ -426,6 +435,7 @@ static const struct nand_interface_config onfi_nvddr_timings[] = {
> > .timings.nvddr = {
> > .tCCS_min = 500000,
> > .tR_max = 200000000,
> > + .tRCBSY_min = 40000000,
> > .tPROG_max = 1000000ULL * ONFI_DYN_TIMING_MAX,
> > .tBERS_max = 1000000ULL * ONFI_DYN_TIMING_MAX,
> > .tAC_min = 3000,
> > @@ -468,6 +478,7 @@ static const struct nand_interface_config onfi_nvddr_timings[] = {
> > .timings.nvddr = {
> > .tCCS_min = 500000,
> > .tR_max = 200000000,
> > + .tRCBSY_min = 40000000,
> > .tPROG_max = 1000000ULL * ONFI_DYN_TIMING_MAX,
> > .tBERS_max = 1000000ULL * ONFI_DYN_TIMING_MAX,
> > .tAC_min = 3000,
> > @@ -510,6 +521,7 @@ static const struct nand_interface_config onfi_nvddr_timings[] = {
> > .timings.nvddr = {
> > .tCCS_min = 500000,
> > .tR_max = 200000000,
> > + .tRCBSY_min = 40000000,
> > .tPROG_max = 1000000ULL * ONFI_DYN_TIMING_MAX,
> > .tBERS_max = 1000000ULL * ONFI_DYN_TIMING_MAX,
> > .tAC_min = 3000,
> > diff --git a/include/linux/mtd/rawnand.h b/include/linux/mtd/rawnand.h
> > index b2f9dd3cbd69..49f1249ba852 100644
> > --- a/include/linux/mtd/rawnand.h
> > +++ b/include/linux/mtd/rawnand.h
> > @@ -67,6 +67,8 @@ struct gpio_desc;
> >
> > /* Extended commands for large page devices */
> > #define NAND_CMD_READSTART 0x30
> > +#define NAND_CMD_READCONTI 0x31
> > +#define NAND_CMD_READLAST 0x3f
> > #define NAND_CMD_RNDOUTSTART 0xE0
> > #define NAND_CMD_CACHEDPROG 0x15
> >
> > @@ -436,6 +438,7 @@ struct nand_sdr_timings {
> > u32 tCCS_min;
> > u64 tPROG_max;
> > u64 tR_max;
> > + u64 tRCBSY_min;
> > u32 tALH_min;
> > u32 tADL_min;
> > u32 tALS_min;
> > @@ -525,6 +528,7 @@ struct nand_nvddr_timings {
> > u32 tCCS_min;
> > u64 tPROG_max;
> > u64 tR_max;
> > + u64 tRCBSY_min;
> > u32 tAC_min;
> > u32 tAC_max;
> > u32 tADL_min;
> > @@ -1173,6 +1177,9 @@ struct nand_chip_ops {
> > int (*setup_read_retry)(struct nand_chip *chip, int retry_mode);
> > int (*choose_interface_config)(struct nand_chip *chip,
> > struct nand_interface_config *iface);
> > + bool conti_read_require;
> > + bool conti_read_first_page;
> > + int conti_readlen;
>
> I would rather prefer the cont_ prefix rather than conti_.
>
> And why do we have these information under "nand_chip_ops" rather than
> nand_chip?
>
> Thanks,
> Miquèl
Thanks,
Miquèl