[PATCH 7/9] mtd: m25p80: add support of dual and quad spi protocols to all commands

Tue Jun 28 00:57:19 PDT 2016

Hello,

> -----Original Message-----
> From: Cyrille Pitchen [mailto:cyrille.pitchen at atmel.com]
> Sent: Monday, June 27, 2016 2:38 PM
> To: Michal Suchanek <hramrach at gmail.com>
> Cc: Brian Norris <computersforpeace at gmail.com>; MTD Maling List <linux-
> mtd at lists.infradead.org>; Marek Vašut <marex at denx.de>; Boris Brezillon
> <boris.brezillon at free-electrons.com>; nicolas.ferre at atmel.com; Linux
> Kernel Mailing List <linux-kernel at vger.kernel.org>; Krzeminski, Marcin
> (Nokia - PL/Wroclaw) <marcin.krzeminski at nokia.com>
> Subject: Re: [PATCH 7/9] mtd: m25p80: add support of dual and quad spi
> protocols to all commands
> 
> Hi all,
> 
> + Marcin,
> 
> According to Marcin's series of patches
> "mtd: spi-nor: DUAL/QUAD I/O read modes implementation"
> and the Spansion/Cypress S25FS512S datasheet, new Cypress QSPI
> memories no longer supports Fast Read 1-1-2 or Fast Read 1-1-4 but only Fast
> Read x-2-2 and Fast Read x-4-4.
> 
> Hence one more example of the need to extend the spi-nor framework to
> support QSPI memories correctly.
> 
> Also, looking at the datasheets of s25fL512s vs s25fS512s, the default factory
> settings for the number of (mode + wait state) cycles has changed for Fast
> Read 1-4-4 (EBh /ECh).
> 
> default 	mode 	wait state
> L512s: 		2	4
> S512s:		2	8
> 
> Those factory default settings are provided by the Basic Flash Parameter
> Table from the Serial Flash Discoverable Parameters (JESD216). SFDP tables
> seem to be supported by both families.
> 
Hello,

It is also possible to read number of dummy cycles (wait states) from the flash itself.
On this Spansion and Microns(eg. N25Q512) user can change and program
dummy cycles count. If we will use default (or hardcoded values like in spi-nor),
we will fail fast reads in this case. I will probably implement this in spi-nor for Spansion,
since I have use case with low SPI frequency, and dummy cycles count  could be 0.

> Hence this series of patches actually adds support to the new
> Spansion/Cypress QSPI memories.
> 
> We will just have to care about setting a proper value in the mode cycles to
> avoid entering the continuous read mode by mistake.
> Looking at the JESD216 specification the 0xff value is likely to be a common
> value for all manufacturers to tell the QSPI memory not to enter (or leave) its
> continuous read/performance enhancement mode.
> 
> Entering the continuous read mode tells the QSPI memory that the op code
> value will be implicit in the following command, which then will start from the
> address cycles. The continuous mode is mainly (only ?) use for XIP
> application.
> 
> Adding support to the continuous read mode might be interesting but should
> be discussed in a dedicated thread.
> 
> Best regards,
> 
> Cyrille
There is one more think in fs512 and fl512 family. Those devices can have 8x4kB
sectors  at the beginning or end of the flash. Those sectors can not be erased with
erase sector cmd(0xd8), but need to be erased with erase_4k(0x20).
To allow erasing those sector mtd eraseregions are needed. According to fs512s
datasheet conforming to JESD216 allows to read flash map, so it could
be possible to create regions automatically. 

Best regards,
Marcin
> 
> Le 27/06/2016 11:52, Cyrille Pitchen a écrit :
> > Le 23/06/2016 22:35, Michal Suchanek a écrit :
> >> Hello,
> >>
> >> this patch is kind of awesome.
> >>
> >> I have a few practical concerns however.
> >>
> >> On 20 June 2016 at 18:50, Cyrille Pitchen <cyrille.pitchen at atmel.com>
> wrote:
> >>> Before this patch, m25p80_read() supported few SPI protocols:
> >>> - regular SPI 1-1-1
> >>> - SPI Dual Output 1-1-2
> >>> - SPI Quad Output 1-1-4
> >>> On the other hand, all other m25p80_*() hooks only supported SPI 1-1-1.
> >>
> >> Under typical use my estimate is that huge majority of data is
> >> transferred in _read() seconded by _write().
> >>
> >> As I understand it the n-n-n means how many bits you transfer in
> >> parallel when sending command-address-data.
> >>
> >> In _read() the command and data overhead is negligible when you can
> >> read kilobytes at once. So difference between 1-1-4 and 4-4-4 is not
> >> meaningful performance-wise. Are there flash chips that support one
> >> but not the other?
> >>
> >> For _write() the benefits are even harder to assess. You can
> >> presumably write at n-n-4 or n-n-2 if your controller and flash
> >> supports it transferring the page faster. And then spend possibly
> >> large amount of time waiting for the flash to get ready again. If the
> >> programming time is fixed transferring the page faster may or may not
> >> have benefits. It may at least free the bus for other devices to use.
> >>
> >
> > This series of patches is not supposed to improve performances. Lets
> > take a Macronix mx25l25673g as an example with a 512-byte read (3-byte
> address):
> >
> > 1 - Fast Read 1-1-4 (0 mode cycle, 8 wait states)
> > command: 8 cycles
> > address/mode/dummy: (3 * 8) + 0 + 8 = 32 cycles
> > data: (512 * 8) / 4 = 1024 cycles
> > total: 8 + 32 + 1024 = 1064 cycles
> >
> > 2 - Fast Read 1-4-4 (2 mode cycles, 4 wait states)
> > command: 8 cycles
> > address/mode/dummy: (3 * 8) / 4 + 2 + 4 = 12 cycles
> > data: (512 * 8) / 4 = 1024 cycles
> > total: 8 + 12 + 1024 = 1044 cycles
> >
> > theoretical benefit: < 2%
> >
> >
> > The real purpose of the whole series is to provide solutions to some
> > known issues.
> >
> > A - Currently the spi-nor force use the B7h op code to enter the
> > 4-byte address mode of > 16MB memories. This mode is state-full; it
> > changes the internal state of the SPI memory. Once in its 4-byte mode,
> > the regular 3-byte Fast Read, Page Program and Sector Erase op codes
> > are expected to be followed by a 4-byte address. Then if a spurious
> > reset occurs, some bootloaders might be stuck as they still expect to use 3-
> byte addresses.
> >
> > On the other hand, when available, the 4-byte address instruction set
> > is stateless. So we'd rather use it whenever possible. It's the
> > purpose of patch 2. However adding the SPI_NOR_4B_OPCODES on some
> > memory entry is not always possible. Again, let's take the Macronix
> mx25l25673g memory as an example:
> > it shares the very same JEDEC ID with the older mx25l25635e. The 73g
> > supports the 4-byte address instruction set whereas the 35e doesn't.
> > Hence we can't add the SPI_NOR_4B_OPCODES on the 35e entry without
> introducing a bug.
> >
> > My first approach was to patch the spi-nor framework to allow two or
> > more memory entries to share the same JEDEC ID and to select the right
> > entry according to the compatible DT string. This solution was rejected.
> >
> > Then my latest solution is to rely on the SFDP tables: the 73g
> > supports the optional 4-byte Address Instruction Table whereas the 35e
> > doesn't support SFDP table at all. Hence, when this table is
> > successfully parsed on the 73g, we know we can safely use the 4-byte
> > op codes. For the 35e case, we still enter the 4-byte address mode as
> before since there is no other solution...
> > So with this series of patches, no regression with 35e memories but a
> > solution to properly use 73g memories without changing their internal
> > state: the bootloader is now happy.
> >
> >
> > B - Setting the old 'mode' argument of spi_nor_read() to SPI_NOR_QUAD
> > just told the SPI controller supports Fast Read 1-1-4 but nothing
> > about whether it also supports Page Program x-y-4 commands. It is
> > interesting to make the difference between read and write hardware
> > capabilities. Indeed we cannot assume that if a controller can do Fast
> > Read 1-1-4 operations then it can also perform Page Program 1-1-4. I'm
> > pretty sure this statement is false. So let's the SPI controller explicitly
> declares its read and write hardware capabilities.
> >
> > Once again, I'm not targeting performance improvement with Page
> Program.
> > However, using other SPI protocols when available helps to deal with
> > some memory quirks.
> >
> > This time, let's take the case of Micron n25q512*. Those memory are >
> > 16MB so we fall into the same issue as described in A.
> > For those Micron memories, depending on the part number (telling us
> > whether the Reset pin is available...) the 12h op code stands for two
> different operations:
> > - either 3-byte address Page Program x-4-4 (the standard 38h op code for
> this
> >   operation is not available and there is no op code for 4-byte Page Program
> >   1-1-1)
> > - either 4-byte address Page Program 1-1-1 (the standard command
> associated to
> >   the 12h op code)
> >
> > Since they are different part numbers of the same memory family, all
> > those memories once again share the very same JEDEC ID and there no
> > mean to dynamically discover the actual part number (or I didn't find such a
> mean yet).
> > Anyway, even knowing the part number, depending on the result, the
> > 4-byte Page Program 1-1-1 operation is simply not supported.
> >
> > Hopefully, for all part numbers, 4-byte Page Program 1-1-4 is always
> > supported by the 34h op code, which is the standard hope code for this
> operation.
> >
> > Then with SPI controllers which explicitly supports Page Program
> > 1-1-4, we could implement something to fix the Page Program operation
> > on Micron memory above 16MB. Hence we have a solution!
> >
> > There are just examples, I guess we could find others. My point is
> > that currently we can't use some QPSI memories properly and its a blocking
> issue.
> > The whole series should be see as a bug fixes enabler rather than a
> > performance improvement.
> >
> >
> >> The _reg_ stuff is probably negligible altogether,
> >>
> >> Lastly the faster transfers of address bytes seem to be achieved with
> >> increasingly longer command codes given how much the maximum
> command
> >> length increased. So even in a page write where the address is a few
> >> % of the transfer the benefit of these extra modes is dubious.
> > I'm not sure to understand this point but I guess you refer to:
> > -#define	MAX_CMD_SIZE		6
> > +#define	MAX_CMD_SIZE		16
> >
> > If so, the actual command size doesn't change at all, the increase of
> > this macro value is justified by another reason. Indeed, let's have a
> > look into the m25p80_read_reg(): before the patch it was implemented
> > using spi_write_then_read(). This later function uses an intermediate
> > buffer, which might be kmalloc() allocated, to transfer data with
> > spi_sync() and calls
> > memcpy() to copy data from/to this imtermediate buffer to/from buffers
> > provided as function parameters (txbuf and rxbuf).
> > As the comment says, the purpose of this intermediate buffer is to be
> > "DMA-safe".
> >
> > Then, after the patch, spi_write_then_read() is no longer used but we
> > still need a DMA-safe buffer to perform the data transfer with
> > spi_sync(). This can be achieve with the flash->command[] buffer; we
> > just need to increase its size some more since it is now also filled
> > with the read data whereas it was only filled with the
> command/address/dummy data before.
> >
> >>
> >> Overall I wonder how much it is worthwhile to complicate the code to
> >> get all these modes in every single function.
> >>
> >> Thanks
> >>
> >> Michal
> >>
> >
> > Best regards,
> >
> > Cyrille
> >