[PATCH v4 0/8] mtd: spi-nor: read while write support
Tudor Ambarus
tudor.ambarus at linaro.org
Mon Mar 27 02:34:15 PDT 2023
On 3/24/23 13:51, Miquel Raynal wrote:
> Hi Tudor,
Hi!
>
> tudor.ambarus at linaro.org wrote on Fri, 17 Mar 2023 04:13:27 +0000:
>
>> On 2/1/23 11:35, Miquel Raynal wrote:
>>> Hello folks,
>>>
>>> Here is the follow-up of the RFC trying to bring a little bit of
>>> parallelism to support SPI-NOR Read While Write feature on parts
>>> supporting it and featuring several banks.
>>>
>>> I have received some hardware to make it work, so since the RFC, the
>>> series has been updated to fix my mistakes, but the overall idea is the
>>> same.
>>>
>>> There is nothing Macronix specific in the implementation, the operations
>>> and opcodes are exactly the same as before. The only difference being:
>>> we may consider the chip usable when it is in the busy state during a
>>> write or an erase. Any chip with an internal split allowing to perform
>>> parallel operations might possibly leverage the benefits of this
>>> implementation.
>>>
>>> The first patches are just refactoring and preparation work, there is
>>> almost no functional change, it's just a way to prepare the introduction
>>> of the new locking mechanism and hopefully provide the cleanest and
>>> simplest diff possible for this new feature. The actual change is all
>>> contained in "mtd: spi-nor: Enhance locking to support reads while
>>> writes". The logic is described in the commit log and copy/pasted here
>>> for clarity:
>>>
>>> "
>>> On devices featuring several banks, the Read While Write (RWW) feature
>>> is here to improve the overall performance when performing parallel
>>> reads and writes at different locations (different banks). The
>>> following constraints have to be taken into account:
>>> 1#: A single operation can be performed in a given bank.
>>> 2#: Only a single program or erase operation can happen on the entire
>>> chip (common hardware limitation to limit costs)
>>> 3#: Reads must remain serialized even though reads on different banks
>>> might occur at the same time.
>>> 4#: The I/O bus is unique and thus is the most constrained resource,
>>> all spi-nor operations requiring access to the spi bus (through
>>> the spi controller) must be serialized until the bus exchanges
>>> are over. So we must ensure a single operation can be "sent" at
>>> a time.
>>> 5#: Any other operation that would not be either a read or a write or an
>>> erase is considered requiring access to the full chip and cannot be
>>> parallelized, we then need to ensure the full chip is in the idle
>>> state when this occurs.
>>>
>>> All these constraints can easily be managed with a proper locking model:
>>> 1#: Is enforced by a bitfield of the in-use banks, so that only a single
>>> operation can happen in a specific bank at any time.
>>> 2#: Is handled by the ongoing_pe boolean which is set before any write
>>> or erase, and is released only at the very end of the
>>> operation. This way, no other destructive operation on the chip can
>>> start during this time frame.
>>> 3#: An ongoing_rd boolean tracks the ongoing reads, so that
>>> only one can be performed at a time.
>>> 4#: An ongoing_io boolean is introduced in order to capture and
>>> serialize bus accesses. This is the one being released "sooner"
>>> than before, because we only need to protect the chip against
>>> other SPI accesses during the I/O phase, which for the
>>> destructive operations is the beginning of the operation (when
>>> we send the command cycles and possibly the data), while the
>>> second part of the operation (the erase delay or the
>>> programming delay) is when we can do something else in another
>>> bank.
>>> 5#: Is handled by the three booleans presented above, if any of them is
>>> set, the chip is not yet ready for the operation and must wait.
>>>
>>> All these internal variables are protected by the existing lock, so that
>>> changes in this structure are atomic. The serialization is handled with
>>> a wait queue."
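
For readers following along, the locking model quoted above can be sketched roughly as below. This is only an illustration of the five rules as described in the commit log, not the code from the series: every name here (rww_state, rww_can_start, rww_begin, rww_pe_io_done, rww_end) is hypothetical, the wait queue is left out, and the caller is assumed to hold the existing lock around each call.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical operation classes; the real driver may split them differently. */
enum rww_op { RWW_READ, RWW_PE, RWW_OTHER };

/* Hypothetical state, mirroring the variables named in the commit log. */
struct rww_state {
	uint32_t used_banks; /* rule 1: bitfield of the in-use banks */
	bool ongoing_pe;     /* rule 2: one program/erase on the whole chip */
	bool ongoing_rd;     /* rule 3: reads remain serialized */
	bool ongoing_io;     /* rule 4: one bus exchange at a time */
};

/* May this operation start now? Caller holds the lock protecting *s. */
static bool rww_can_start(const struct rww_state *s, enum rww_op op,
			  uint32_t banks)
{
	if (s->ongoing_io)
		return false;            /* rule 4: bus is busy */
	if (op == RWW_OTHER)             /* rule 5: full chip must be idle */
		return !s->ongoing_pe && !s->ongoing_rd;
	if (s->used_banks & banks)
		return false;            /* rule 1: bank already in use */
	return op == RWW_READ ? !s->ongoing_rd : !s->ongoing_pe;
}

/* Claim the resources for an operation; returns false if it must wait. */
static bool rww_begin(struct rww_state *s, enum rww_op op, uint32_t banks)
{
	if (!rww_can_start(s, op, banks))
		return false;
	s->ongoing_io = true;
	if (op == RWW_READ)
		s->ongoing_rd = true;
	else if (op == RWW_PE)
		s->ongoing_pe = true;
	if (op != RWW_OTHER)
		s->used_banks |= banks;
	return true;
}

/* Program/erase only: command and data were sent, release the bus early. */
static void rww_pe_io_done(struct rww_state *s)
{
	s->ongoing_io = false;
}

/*
 * Release what the operation still holds. For RWW_PE the bus is assumed
 * to have been released already through rww_pe_io_done().
 */
static void rww_end(struct rww_state *s, enum rww_op op, uint32_t banks)
{
	if (op == RWW_READ) {
		s->ongoing_rd = false;
		s->ongoing_io = false;
		s->used_banks &= ~banks;
	} else if (op == RWW_PE) {
		s->ongoing_pe = false;
		s->used_banks &= ~banks;
	} else {
		s->ongoing_io = false;
	}
}
```

With this sketch, an erase that has finished its I/O phase on bank 0 still blocks a read on bank 0, a second program/erase anywhere, and any "other" operation, but lets a read on bank 1 start. In a real driver a caller that gets false from rww_begin() would sleep on the wait queue and retry when woken.
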
>>>
>>> Here is now a benchmark with a Macronix MX25UW51245G with 4 banks and RWW
>>> support:
>>>
>>> // Testing the two accesses in the same bank
>>> $ flash_speed -b0 -k0 -c10 -d /dev/mtd0
>>> [...]
>>> testing read while write latency
>>> read while write took 51ms, read ended after 51ms
>>>
>>> // Testing the two accesses within different banks
>>> $ flash_speed -b0 -k4096 -c10 -d /dev/mtd0
>>> [...]
>>> testing read while write latency
>>> read while write took 51ms, read ended after 20ms
>>>
>>> Parallel accesses have been validated with io_paral. A slight increase
>>> of the time spent on this test has however been noticed. With my
>>
>> how do the other tests look? Is there any change in performance for
>> flashes that do not support RWW?
>
> The current implementation takes care of not changing anything with the
> existing flashes, when I resend I'll provide all the logs you asked
yes, I saw. There are some ifs here and there, nothing scary, so I don't
expect any change in performance for the flashes without RWW support,
but it's always good to have proof.
> for, plus another quick test without the RWW feature bit set.
>
Cool, thanks! Cheers,
ta