[PATCH v3 0/9] mtd: spi-nor: read while write support

Miquel Raynal miquel.raynal at bootlin.com
Thu Dec 15 00:12:32 PST 2022

Hello folks,

Here is the follow-up to the RFC that tries to bring a little bit of
parallelism to SPI-NOR by supporting the Read While Write (RWW) feature
on parts that implement it and feature several banks.

I have received some hardware to make it work, so since the RFC, the
series has been updated to fix my mistakes, but the overall idea is the
same.

There is nothing Macronix-specific in the implementation; the operations
and opcodes are exactly the same as before. The only difference is that
we may consider the chip usable when it is in the busy state during a
write or an erase. Any chip with an internal split that allows parallel
operations might possibly leverage the benefits of this implementation.

The first patches are just refactoring and preparation work with
almost no functional change. They prepare the introduction of the new
locking mechanism and hopefully provide the cleanest and simplest diff
possible for this new feature. The actual change is all contained in
"mtd: spi-nor: Enhance locking to support reads while writes". The
logic is described in the commit log and copy/pasted here for clarity:

    On devices featuring several banks, the Read While Write (RWW) feature
    is here to improve the overall performance when performing parallel
    reads and writes at different locations (different banks). The
    following constraints have to be taken into account:
    1#: A single operation can be performed in a given bank.
    2#: Only a single program or erase operation can happen on the entire
        chip (common hardware limitation to limit costs)
    3#: Reads must remain serialized even though reads on different banks
        might occur at the same time.
    4#: The I/O bus is unique and thus is the most constrained resource, all
        spi-nor operations requiring access to the spi bus (through the spi
        controller) must be serialized until the bus exchanges are over. So
        we must ensure a single operation can be "sent" at a time.
    5#: Any other operation that would not be either a read or a write or an
        erase is considered requiring access to the full chip and cannot be
        parallelized, we then need to ensure the full chip is in the idle
        state when this occurs.
    All these constraints can easily be managed with a proper locking model:
    1#: Is enforced by a bitfield of the in-use banks, so that only a single
        operation can happen in a specific bank at any time.
    2#: Is handled by the ongoing_pe boolean which is set before any write
        or erase, and is released only at the very end of the
        operation. This way, no other destructive operation on the chip can
        start during this time frame.
    3#: An ongoing_rd boolean tracks the ongoing reads, so that
        only one can be performed at a time.
    4#: An ongoing_io boolean is introduced in order to capture and
        serialize bus accesses. This is the one being released "sooner"
        than before, because we only need to protect the chip against
        other SPI accesses during the I/O phase, which for the
        destructive operations is the beginning of the operation (when
        we send the command cycles and possibly the data), while the
        second part of the operation (the erase delay or the
        programming delay) is when we can do something else in another
        bank.
    5#: Is handled by the three booleans presented above, if any of them is
        set, the chip is not yet ready for the operation and must wait.
    All these internal variables are protected by the existing lock, so that
    changes in this structure are atomic. The serialization is handled with
    a wait queue.
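
To make the rules above easier to follow, here is a minimal userspace
sketch of the state tracking. It is not the code from the series: only
the ongoing_io, ongoing_rd and ongoing_pe names come from the commit
log, while struct rww_state, used_banks and the rww_*() helpers are
hypothetical names picked for illustration. The sketch assumes the
caller already holds the lock and would sleep in the wait queue
whenever a start helper returns false; both are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the RWW state (not the structure from the series). */
struct rww_state {
	uint32_t used_banks;	/* bit n set: bank n is in use (rule 1#) */
	bool ongoing_pe;	/* a program/erase is in flight (rule 2#) */
	bool ongoing_rd;	/* a read is in flight (rule 3#) */
	bool ongoing_io;	/* the SPI bus is in use (rule 4#) */
};

/* Short bus access, e.g. sending command cycles or polling the status. */
static bool rww_start_io(struct rww_state *s)
{
	if (s->ongoing_io)
		return false;
	s->ongoing_io = true;
	return true;
}

static void rww_end_io(struct rww_state *s)
{
	assert(s->ongoing_io);
	s->ongoing_io = false;
}

/* Program/erase: one per chip, and the target banks must be free. */
static bool rww_start_pe(struct rww_state *s, uint32_t banks)
{
	if (s->ongoing_pe || (s->used_banks & banks))
		return false;
	s->used_banks |= banks;
	s->ongoing_pe = true;
	return true;
}

static void rww_end_pe(struct rww_state *s, uint32_t banks)
{
	assert(s->ongoing_pe && (s->used_banks & banks) == banks);
	s->used_banks &= ~banks;
	s->ongoing_pe = false;
}

/* Read: holds the bus for its whole duration, one read at a time. */
static bool rww_start_rd(struct rww_state *s, uint32_t banks)
{
	if (s->ongoing_io || s->ongoing_rd || (s->used_banks & banks))
		return false;
	s->used_banks |= banks;
	s->ongoing_io = true;
	s->ongoing_rd = true;
	return true;
}

static void rww_end_rd(struct rww_state *s, uint32_t banks)
{
	assert(s->ongoing_rd && s->ongoing_io);
	assert((s->used_banks & banks) == banks);
	s->used_banks &= ~banks;
	s->ongoing_io = false;
	s->ongoing_rd = false;
}

/* Any other operation: needs the full chip idle (rule 5#). */
static bool rww_start_exclusive(struct rww_state *s)
{
	if (s->ongoing_io || s->ongoing_rd || s->ongoing_pe)
		return false;
	s->ongoing_io = s->ongoing_rd = s->ongoing_pe = true;
	return true;
}

static void rww_end_exclusive(struct rww_state *s)
{
	assert(s->ongoing_io && s->ongoing_rd && s->ongoing_pe);
	s->ongoing_io = s->ongoing_rd = s->ongoing_pe = false;
}
```

In this model, an erase in bank 0 takes ongoing_pe and the bank 0 bit,
grabs ongoing_io only while its command cycles are sent, and a read in
bank 1 can then start while the erase delay elapses. A read in bank 0
or a second program/erase anywhere would have to wait.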

Here is now a benchmark with a Macronix MX25UW51245G with 4 banks and RWW
support:

     // Testing the two accesses in the same bank
     $ flash_speed -b0 -k0 -c10 -d /dev/mtd0
     testing read while write latency
     read while write took 51ms, read ended after 51ms

     // Testing the two accesses within different banks
     $ flash_speed -b0 -k4096 -c10 -d /dev/mtd0
     testing read while write latency
     read while write took 51ms, read ended after 20ms

Parallel accesses have been validated with io_paral. A slight increase
in the time spent on this test has however been noticed: with my
configuration, over a limited number of blocks, the overall operation
went from 22 min without any RWW changes to 27 min with these changes
(maybe due to the number of additional scheduling situations involved).

Here is a branch with the mtd-utils patch bringing support for this
additional "-k" parameter in flash_speed (for the second block to use
during RWW testing), used to get the above results:


Changes in v3:
* Fix the bank offset calculations by providing the same values when
  locking and when unlocking (they might be changed by the functions
  themselves without us noticing).
* I completely changed the way the locking works because there was a new
  constraint: reads cannot be interrupted and status reads cannot happen
  during a read. Hence, as the multi-locks design was starting to be too
  messy, I changed the implementation to use a bunch of variables to
  track the read while write state, protected by the main spi-nor
  lock. If the internal state does not allow the operation, a sleep
  starts in a queue, until the threads are woken up after a state
  update. I know it is very verbose, I am open to suggestions.
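
As an illustration of the first item above, here is a small sketch of
how the set of banks touched by an access can be computed once and
reused for both locking and unlocking. This is not the code from the
series: bank_mask() and write_keeps_mask() are hypothetical helpers,
and equally sized banks are an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bitmask of the banks covered by [start, start + len). */
static uint32_t bank_mask(uint64_t start, uint64_t len, uint64_t bank_size)
{
	uint64_t first, last, b;
	uint32_t mask = 0;

	assert(bank_size > 0 && len > 0);
	first = start / bank_size;
	last = (start + len - 1) / bank_size;
	assert(last < 32);

	for (b = first; b <= last; b++)
		mask |= UINT32_C(1) << b;

	return mask;
}

/*
 * Hypothetical write path: the mask is snapshotted before the loop
 * modifies 'to' and 'len', so the unlock releases exactly what the
 * lock took. Returns 1 when no bank is left marked as used.
 */
static int write_keeps_mask(uint64_t to, uint64_t len, uint64_t bank_size)
{
	const uint32_t mask = bank_mask(to, len, bank_size);
	uint32_t used_banks = 0;

	used_banks |= mask;		/* "lock" */

	while (len) {			/* the operation moves its own cursor */
		uint64_t chunk = len < 256 ? len : 256;

		to += chunk;
		len -= chunk;
	}

	used_banks &= ~mask;		/* "unlock" with the same value */

	return used_banks == 0;
}
```

Recomputing the mask from 'to' and 'len' after the loop would give a
different (or empty) range, which is the kind of mismatch this item
refers to. With an assumed bank size of 16 MiB, an access crossing the
first bank boundary yields a mask covering banks 0 and 1.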

Miquel Raynal (9):
  mtd: spi-nor: Create macros to define chip IDs and geometries
  mtd: spi-nor: Introduce the concept of bank
  mtd: spi-nor: Add a macro to define more banks
  mtd: spi-nor: Reorder the preparation vs locking steps
  mtd: spi-nor: Separate preparation and locking
  mtd: spi-nor: Prepare the introduction of a new locking mechanism
  mtd: spi-nor: Add a RWW flag
  mtd: spi-nor: Enhance locking to support reads while writes
  mtd: spi-nor: macronix: Add support for mx25uw51245g with RWW

 drivers/mtd/spi-nor/core.c     | 396 +++++++++++++++++++++++++++++++--
 drivers/mtd/spi-nor/core.h     |  61 ++---
 drivers/mtd/spi-nor/macronix.c |   3 +
 include/linux/mtd/spi-nor.h    |  13 ++
 4 files changed, 424 insertions(+), 49 deletions(-)


