[PATCH v4 0/8] mtd: spi-nor: read while write support

Mon Mar 27 02:34:15 PDT 2023

On 3/24/23 13:51, Miquel Raynal wrote:
> Hi Tudor,

Hi!

> 
> tudor.ambarus at linaro.org wrote on Fri, 17 Mar 2023 04:13:27 +0000:
> 
>> On 2/1/23 11:35, Miquel Raynal wrote:
>>> Hello folks,
>>>
>>> Here is the follow-up of the RFC trying to bring a little bit of
>>> parallelism to support the SPI-NOR Read While Write feature on parts
>>> supporting it and featuring several banks.
>>>
>>> I have received some hardware to make it work, so since the RFC, the
>>> series has been updated to fix my mistakes, but the overall idea is the
>>> same.
>>>
>>> There is nothing Macronix-specific in the implementation, the operations
>>> and opcodes are exactly the same as before. The only difference is that
>>> we may consider the chip usable when it is in the busy state during a
>>> write or an erase. Any chip with an internal split that allows parallel
>>> operations could benefit from this implementation.
>>>
>>> The first patches are just refactoring and preparation work, there is
>>> almost no functional change, it's just a way to prepare the introduction
>>> of the new locking mechanism and hopefully provide the cleanest and
>>> simplest diff possible for this new feature. The actual change is all
>>> contained in "mtd: spi-nor: Enhance locking to support reads while
>>> writes". The logic is described in the commit log and copy/pasted here
>>> for clarity:
>>>
>>> "
>>>     On devices featuring several banks, the Read While Write (RWW) feature
>>>     is here to improve the overall performance when performing parallel
>>>     reads and writes at different locations (different banks). The
>>>     following constraints have to be taken into account:
>>>     1#: A single operation can be performed in a given bank.
>>>     2#: Only a single program or erase operation can happen on the entire
>>>         chip (common hardware limitation to limit costs)
>>>     3#: Reads must remain serialized even though reads on different banks
>>>         might occur at the same time.
>>>     4#: The I/O bus is unique and thus is the most constrained resource,
>>>         all spi-nor operations requiring access to the spi bus (through
>>>         the spi controller) must be serialized until the bus exchanges
>>>         are over. So we must ensure a single operation can be "sent" at
>>>         a time.
>>>     5#: Any operation other than a read, a write or an erase is
>>>         considered to require access to the full chip and cannot be
>>>         parallelized, so we need to ensure the full chip is in the idle
>>>         state when this occurs.
>>>     
>>>     All these constraints can easily be managed with a proper locking model:
>>>     1#: Is enforced by a bitfield of the in-use banks, so that only a single
>>>         operation can happen in a specific bank at any time.
>>>     2#: Is handled by the ongoing_pe boolean which is set before any write
>>>         or erase, and is released only at the very end of the
>>>         operation. This way, no other destructive operation on the chip can
>>>         start during this time frame.
>>>     3#: An ongoing_rd boolean tracks the ongoing reads, so that
>>>         only one can be performed at a time.
>>>     4#: An ongoing_io boolean is introduced in order to capture and
>>>         serialize bus accesses. This is the one being released "sooner"
>>>         than before, because we only need to protect the chip against
>>>         other SPI accesses during the I/O phase, which for the
>>>         destructive operations is the beginning of the operation (when
>>>         we send the command cycles and possibly the data), while the
>>>         second part of the operation (the erase delay or the
>>>         programming delay) is when we can do something else in another
>>>         bank.
>>>     5#: Is handled by the three booleans presented above, if any of them is
>>>         set, the chip is not yet ready for the operation and must wait.
>>>     
>>>     All these internal variables are protected by the existing lock, so that
>>>     changes in this structure are atomic. The serialization is handled with
>>>     a wait queue."
>>>
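
To make the locking rules above easier to follow, here is a minimal
userspace sketch of the state machine they describe. It is not the code
from the series: the struct and function names (rww_state, rww_try_begin_*,
rww_pe_io_done, ...) are hypothetical, and the helpers return false instead
of sleeping on a wait queue. Only ongoing_io, ongoing_rd, ongoing_pe and the
bank bitfield come from the commit log. It also assumes a single bank per
operation and ignores the status polling a real program/erase needs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the state described in the commit log. */
struct rww_state {
	uint32_t used_banks;	/* bit n set: bank n is in use (rule 1) */
	bool ongoing_io;	/* the SPI bus is in use (rule 4) */
	bool ongoing_rd;	/* a read is in flight (rule 3) */
	bool ongoing_pe;	/* a program/erase is in flight (rule 2) */
};

/* A read needs the bus, no other read, and an idle target bank. */
static inline bool rww_try_begin_read(struct rww_state *s, unsigned int bank)
{
	if (s->ongoing_io || s->ongoing_rd || (s->used_banks & (1u << bank)))
		return false;
	s->used_banks |= 1u << bank;
	s->ongoing_io = true;
	s->ongoing_rd = true;
	return true;
}

/* A read holds the bus for its whole duration, so release everything. */
static inline void rww_end_read(struct rww_state *s, unsigned int bank)
{
	s->used_banks &= ~(1u << bank);
	s->ongoing_io = false;
	s->ongoing_rd = false;
}

/* A program/erase needs the bus, no other program/erase, and an idle bank. */
static inline bool rww_try_begin_pe(struct rww_state *s, unsigned int bank)
{
	if (s->ongoing_io || s->ongoing_pe || (s->used_banks & (1u << bank)))
		return false;
	s->used_banks |= 1u << bank;
	s->ongoing_io = true;
	s->ongoing_pe = true;
	return true;
}

/* Command and data cycles are sent: free the bus early (rule 4). */
static inline void rww_pe_io_done(struct rww_state *s)
{
	s->ongoing_io = false;
}

/* The erase or programming delay is over: free the bank and the chip. */
static inline void rww_end_pe(struct rww_state *s, unsigned int bank)
{
	s->used_banks &= ~(1u << bank);
	s->ongoing_pe = false;
}

/* Any other operation requires the whole chip to be idle (rule 5). */
static inline bool rww_try_begin_generic(struct rww_state *s)
{
	if (s->ongoing_io || s->ongoing_rd || s->ongoing_pe)
		return false;
	s->ongoing_io = s->ongoing_rd = s->ongoing_pe = true;
	return true;
}

static inline void rww_end_generic(struct rww_state *s)
{
	s->ongoing_io = s->ongoing_rd = s->ongoing_pe = false;
}
```

In this model, once rww_pe_io_done() has run for an erase in bank 0, a
read in bank 1 can start while a read in bank 0 is still refused. In the
driver, all of these fields would be updated under the existing lock, and
a refused caller would wait instead of returning.
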
>>> Here is now a benchmark with a Macronix MX25UW51245G with 4 banks and RWW
>>> support:
>>>
>>>      // Testing the two accesses in the same bank
>>>      $ flash_speed -b0 -k0 -c10 -d /dev/mtd0
>>>      [...]
>>>      testing read while write latency
>>>      read while write took 51ms, read ended after 51ms
>>>
>>>      // Testing the two accesses within different banks
>>>      $ flash_speed -b0 -k4096 -c10 -d /dev/mtd0
>>>      [...]
>>>      testing read while write latency
>>>      read while write took 51ms, read ended after 20ms
>>>
>>> Parallel accesses have been validated with io_paral. A slight increase
>>> of the time spent on this test has however been noticed. With my  
>>
>> How do the other tests look? Is there any change in performance for
>> flashes that do not support RWW?
> 
> The current implementation takes care of not changing anything with the
> existing flashes, when I resend I'll provide all the logs you asked

Yes, I saw. There are some ifs here and there, nothing scary, so I don't
expect any change in performance for the flashes without RWW support,
but it's always good to have proof.
> for, plus another quick test without the RWW feature bit set.
> 

Cool, thanks! Cheers,
ta