[PATCH v2 1/3] mtd: introduce the mtd_pairing_scheme concept

Mon Aug 8 15:42:18 PDT 2016

Hi Brian,

On Thu, 4 Aug 2016 12:37:51 +0800
Brian Norris <computersforpeace at gmail.com> wrote:

> Hi Boris,
> 
> On Mon, Jun 20, 2016 at 03:50:16PM +0200, Boris Brezillon wrote:
> > MLC and TLC NAND devices are using NAND cells exposing more than one bit,
> > but instead of attaching all the bits in a given cell to a single NAND
> > page, each bit is usually attached to a different page. This concept is
> > called 'page pairing', and has significant impacts on the flash storage
> > usage.
> > The main problem showed by these devices is that interrupting a page
> > program operation may not only corrupt the page we are programming
> > but also the page it is paired with, hence the need to expose to MTD
> > users the pairing scheme information.
> > 
> > The pairing APIs allows one to query pairing information attached to a
> > given page (here called wunit), or the other way around (the wunit
> > pointed by pairing information).
> > It also provides several helpers to help the conversion between absolute
> > offsets and wunits, and query the number of pairing groups.
> > 
> > Signed-off-by: Boris Brezillon <boris.brezillon at free-electrons.com>  
> 
> Overall, the comments and documentation are a lot better on this one.
> Thanks for doing that! I only have a few more small comments, and with
> those, I think it's ready to land IMO. I'll try to review the NAND
> implementation bits too (look OK for now), but I'm not as worried about
> that, if we agree on the high-level API.
> 
> BTW, I don't know if we're likely to hit any conflicts on the
> mtdcore and mtd.h bits. Perhaps it will make sense for us to apply this
> first patch as a mini-branch to both our trees? Maybe if you just fixup
> any last comments, you can send me a trivial pull request / tag /
> whatever (doesn't need to be formal), with just this patch.

Sure.

> 
> > ---
> >  drivers/mtd/mtdcore.c   |  94 ++++++++++++++++++++++++++++++++++++++++++
> >  drivers/mtd/mtdpart.c   |   1 +
> >  include/linux/mtd/mtd.h | 106 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 201 insertions(+)
> > 
> > diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
> > index e3936b847c6b..decceb9fdf32 100644
> > --- a/drivers/mtd/mtdcore.c
> > +++ b/drivers/mtd/mtdcore.c
> > @@ -376,6 +376,100 @@ static int mtd_reboot_notifier(struct notifier_block *n, unsigned long state,
> >  }
> >  
> >  /**
> > + * mtd_wunit_to_pairing_info - get pairing information of a wunit
> > + * @mtd: pointer to new MTD device info structure
> > + * @wunit: write unit we are interrested in  
> 
> s/interrested/interested/
> 
> > + * @info: pairing information struct  
> 
> Maybe something to indicate this is the return value? e.g., "returned
> pairing information"?

I'll change the description.

> 
> > + *
> > + * Retrieve pairing information associated to the wunit.
> > + * This is mainly useful when dealing with MLC/TLC NANDs where pages can be
> > + * paired together, and where programming a page may influence the page it is
> > + * paired with.
> > + * The notion of page is replaced by the term wunit (write-unit) to stay
> > + * consistent with the ->writesize field.
> > + *
> > + * The @wunit argument can be extracted from an absolute offset using
> > + * mtd_offset_to_wunit(). @info is filled with the pairing information attached
> > + * to @wunit.
> > + *
> > + * From the pairing info the MTD user can find all the wunits paired with
> > + * @wunit using the following loop:
> > + *
> > + * for (i = 0; i < mtd_pairing_groups(mtd); i++) {
> > + *	info.pair = i;
> > + *	mtd_pairing_info_to_wunit(mtd, &info);
> > + *	...
> > + * }
> > + */
> > +void mtd_wunit_to_pairing_info(struct mtd_info *mtd, int wunit,
> > +			       struct mtd_pairing_info *info)
> > +{  
> 
> Do we want to do any range-checking here? i.e., make this return int? Or
> is that too paranoid? We've done similarly on most of the rest of the
> MTD API.

I'm fine changing the prototype to return an int (with -ERANGE if the
wunit parameter is exceeding the number of write-units per eraseblock).
As you say later in your review, we'd better be consistent on the
->get_info()/->get_wunit() semantic.

> 
> Notably, I think we're probably safe keeping the ->pairing->get_info()
> callback as returning void, since the driver can expect this core helper
> to do the range checking for us.
> 
> > +	if (!mtd->pairing || !mtd->pairing->get_info) {
> > +		info->group = 0;
> > +		info->pair = wunit;
> > +	} else {
> > +		mtd->pairing->get_info(mtd, wunit, info);
> > +	}
> > +}
> > +EXPORT_SYMBOL_GPL(mtd_wunit_to_pairing_info);
> > +
> > +/**
> > + * mtd_wunit_to_pairing_info - get wunit from pairing information
> > + * @mtd: pointer to new MTD device info structure
> > + * @info: pairing information struct
> > + *
> > + * Returns a positive number representing the wunit associated to the info
> > + * struct, or a negative error code.
> > + *
> > + * This is the reverse of mtd_wunit_to_pairing_info(), and can help one to
> > + * iterate over all wunits of a given pair (see mtd_wunit_to_pairing_info()
> > + * doc).
> > + *
> > + * It can also be used to only program the first page of each pair (i.e.
> > + * page attached to group 0), which allows one to use an MLC NAND in
> > + * software-emulated SLC mode:
> > + *
> > + * info.group = 0;
> > + * for (info.pair = 0; info < mtd_wunit_per_eb(mtd); info.pair++) {  
> 
> (I know it's just example code, but...) the second clause should have
> 'info.pair < ...', not 'info < ...'.

I'll fix the example.

> 
> > + *	wunit = mtd_pairing_info_to_wunit(mtd, &info);
> > + *	mtd_write(mtd, mtd_wunit_to_offset(mtd, blkoffs, wunit),
> > + *		  mtd->writesize, &retlen, buf + (i * mtd->writesize));
> > + * }
> > + */
> > +int mtd_pairing_info_to_wunit(struct mtd_info *mtd,
> > +			      const struct mtd_pairing_info *info)
> > +{  
> 
> Any range checking on info->group or info->pair? What about
> NULL-checking 'info'?

I'll add those checks.

> 
> > +	if (!mtd->pairing || !mtd->pairing->get_info) {
> > +		if (info->group)
> > +			return -EINVAL;
> > +
> > +		return info->pair;
> > +	}
> > +
> > +	return mtd->pairing->get_wunit(mtd, info);
> > +}
> > +EXPORT_SYMBOL_GPL(mtd_pairing_info_to_wunit);
> > +
> > +/**
> > + * mtd_pairing_groups - get the number of pairing groups
> > + * @mtd: pointer to new MTD device info structure
> > + *
> > + * Returns the number of pairing groups.
> > + *
> > + * This number is usually equal to the number of bits exposed by a single
> > + * cell, and can be used in conjunction with mtd_pairing_info_to_wunit()
> > + * to iterate over all pages of a given pair.
> > + */
> > +int mtd_pairing_groups(struct mtd_info *mtd)
> > +{
> > +	if (!mtd->pairing || !mtd->pairing->ngroups)
> > +		return 1;
> > +
> > +	return mtd->pairing->ngroups;
> > +}
> > +EXPORT_SYMBOL_GPL(mtd_pairing_groups);
> > +
> > +/**
> >   *	add_mtd_device - register an MTD device
> >   *	@mtd: pointer to new MTD device info structure
> >   *
> > diff --git a/drivers/mtd/mtdpart.c b/drivers/mtd/mtdpart.c
> > index 1f13e32556f8..e32a0ac2298f 100644
> > --- a/drivers/mtd/mtdpart.c
> > +++ b/drivers/mtd/mtdpart.c
> > @@ -397,6 +397,7 @@ static struct mtd_part *allocate_partition(struct mtd_info *master,
> >  	slave->mtd.oobsize = master->oobsize;
> >  	slave->mtd.oobavail = master->oobavail;
> >  	slave->mtd.subpage_sft = master->subpage_sft;
> > +	slave->mtd.pairing = master->pairing;
> >  
> >  	slave->mtd.name = name;
> >  	slave->mtd.owner = master->owner;
> > diff --git a/include/linux/mtd/mtd.h b/include/linux/mtd/mtd.h
> > index 29a170612203..00bcacb16176 100644
> > --- a/include/linux/mtd/mtd.h
> > +++ b/include/linux/mtd/mtd.h
> > @@ -127,6 +127,81 @@ struct mtd_ooblayout_ops {
> >  		    struct mtd_oob_region *oobfree);
> >  };
> >  
> > +/**
> > + * struct mtd_pairing_info - page pairing information
> > + *
> > + * @pair: pair id
> > + * @group: group id
> > + *
> > + * The pair word is used here, even though TLC NANDs might group pages by 3  
> 
> Nit: "The pair word is used" is somewhat confusing on first read, IMO. I
> think maybe it's partly the ordering of the words, as well as the use
> "word" which has different technical meaning sometime... Maybe one of
> the following?
> 
>   The word "pair" is used here ...
>   The term "pair" is used here ...
> 
> (Sorry, very nitpicky.)

No problem :), I'll pick one of those.

> 
> > + * (3 bits in a single cell). A pair should regroup all pages that are sharing
> > + * the same cell. Pairs are then indexed in ascending order.
> > + *
> > + * @group is defining the position of a page in a given pair. It can also be
> > + * seen as the bit position in the cell: page attached to bit 0 belongs to
> > + * group 0, page attached to bit 1 belongs to group 1, etc.
> > + *
> > + * Example:
> > + * The H27UCG8T2BTR-BC datasheet describes the following pairing scheme:
> > + *
> > + *		group-0		group-1
> > + *
> > + *  pair-0	page-0		page-4
> > + *  pair-1	page-1		page-5
> > + *  pair-2	page-2		page-8
> > + *  ...
> > + *  pair-127	page-251	page-255
> > + *
> > + *
> > + * Note that the "group" and "pair" terms were extracted from Samsung and
> > + * Hynix datasheets, and might be referenced under other names in other
> > + * datasheets (Micron is describing this concept as "shared pages").  
> 
> Very, very helpful (to me, even though I'm moderately familiar with the
> concepts, but hopefully moreso for others who want to read and
> understand this). Thanks for writing this up.

Actually, the more I think about it, the more I doubt those terms are
appropriate (even if they are widely used in technical documents).

How about using the following names instead:

struct mtd_cell_sharing_scheme {
	...
};

struct mtd_cell_sharing_info {
	/* the bit position in the cell */
	int bitpos;
	/*
	 * What was previously known as 'pair': an id representing a
	 * group of cells forming a 'pair of pages'.
	 * I can't find a good description/word for this concept. Do
	 * you have better ideas?
	 */
	int group;
};

What do you think?

> 
> > + */
> > +struct mtd_pairing_info {
> > +	int pair;
> > +	int group;
> > +};
> > +
> > +/**
> > + * struct mtd_pairing_scheme - page pairing scheme description
> > + *
> > + * @ngroups: number of groups. Should be related to the number of bits
> > + *	     per cell.
> > + * @get_info: converts a write-unit (page number within an erase block) into
> > + *	      mtd_pairing information (pair + group). This function should
> > + *	      fill the info parameter based on the wunit index.
> > + * @get_wunit: converts pairing information into a write-unit (page) number.
> > + *	       This function should return the wunit index pointed by the
> > + *	       pairing information described in the info argument. It should
> > + *	       return -EINVAL, if there's no wunit corresponding to the
> > + *	       passed pairing information.
> > + *
> > + * See mtd_pairing_info documentation for a detailed explanation of the
> > + * pair and group concepts.
> > + *
> > + * The mtd_pairing_scheme structure provides a generic solution to represent
> > + * NAND page pairing scheme. Instead of exposing two big tables to do the
> > + * write-unit <-> (pair + group) conversions, we ask the MTD drivers to
> > + * implement the ->get_info() and ->get_wunit() functions.
> > + *
> > + * MTD users will then be able to query these information by using the
> > + * mtd_pairing_info_to_wunit() and mtd_wunit_to_pairing_info() helpers.
> > + *
> > + * @ngroups is here to help MTD users iterating over all the pages in a
> > + * given pair. This value can be retrieved by MTD users using the
> > + * mtd_pairing_groups() helper.
> > + *
> > + * Examples are given in the mtd_pairing_info_to_wunit() and
> > + * mtd_wunit_to_pairing_info() documentation.
> > + */
> > +struct mtd_pairing_scheme {
> > +	int ngroups;
> > +	void (*get_info)(struct mtd_info *mtd, int wunit,
> > +			 struct mtd_pairing_info *info);
> > +	int (*get_wunit)(struct mtd_info *mtd,
> > +			 const struct mtd_pairing_info *info);  
> 
> Wait, I noted above that get_info() doesn't return errors (and that's
> OK, if we do bounds checking in mtdcore), but why does get_wunit(),
> then? From the looks of it, you don't actually do any bounds checking in
> the implementations in patch 2, right? And couldn't we do any checking
> in the mtdcore.c helper anyway?
> 
> Unless I'm misunderstanding something, I think we should have both
> return errors, or neither.

I agree. ->get-info() could fill mtd_pairing_info to reflect an error,
but that's confusing. Let's switch to functions returning int and patch
the implementation to do bounds checking. 

> 
> > +};
> > +
> >  struct module;	/* only needed for owner field in mtd_info */
> >  
> >  struct mtd_info {
> > @@ -188,6 +263,9 @@ struct mtd_info {
> >  	/* OOB layout description */
> >  	const struct mtd_ooblayout_ops *ooblayout;
> >  
> > +	/* NAND pairing scheme, only provided for MLC/TLC NANDs */
> > +	const struct mtd_pairing_scheme *pairing;
> > +
> >  	/* the ecc step size. */
> >  	unsigned int ecc_step_size;
> >  
> > @@ -296,6 +374,12 @@ static inline void mtd_set_ooblayout(struct mtd_info *mtd,
> >  	mtd->ooblayout = ooblayout;
> >  }
> >  
> > +static inline void mtd_set_pairing_scheme(struct mtd_info *mtd,
> > +				const struct mtd_pairing_scheme *pairing)
> > +{
> > +	mtd->pairing = pairing;
> > +}
> > +
> >  static inline void mtd_set_of_node(struct mtd_info *mtd,
> >  				   struct device_node *np)
> >  {
> > @@ -312,6 +396,11 @@ static inline int mtd_oobavail(struct mtd_info *mtd, struct mtd_oob_ops *ops)
> >  	return ops->mode == MTD_OPS_AUTO_OOB ? mtd->oobavail : mtd->oobsize;
> >  }
> >  
> > +void mtd_wunit_to_pairing_info(struct mtd_info *mtd, int wunit,
> > +			       struct mtd_pairing_info *info);
> > +int mtd_pairing_info_to_wunit(struct mtd_info *mtd,
> > +			      const struct mtd_pairing_info *info);
> > +int mtd_pairing_groups(struct mtd_info *mtd);
> >  int mtd_erase(struct mtd_info *mtd, struct erase_info *instr);
> >  int mtd_point(struct mtd_info *mtd, loff_t from, size_t len, size_t *retlen,
> >  	      void **virt, resource_size_t *phys);
> > @@ -397,6 +486,23 @@ static inline uint32_t mtd_mod_by_ws(uint64_t sz, struct mtd_info *mtd)
> >  	return do_div(sz, mtd->writesize);
> >  }
> >  
> > +static inline int mtd_wunit_per_eb(struct mtd_info *mtd)
> > +{
> > +	return mtd->erasesize / mtd->writesize;
> > +}
> > +
> > +static inline int mtd_offset_to_wunit(struct mtd_info *mtd, loff_t offs)
> > +{
> > +	return mtd_div_by_ws(mtd_mod_by_eb(offs, mtd), mtd);
> > +}
> > +
> > +static inline loff_t mtd_wunit_to_offset(struct mtd_info *mtd, loff_t base,
> > +					 int wunit)
> > +{
> > +	return base + (wunit * mtd->writesize);
> > +}
> > +
> > +
> >  static inline int mtd_has_oob(const struct mtd_info *mtd)
> >  {
> >  	return mtd->_read_oob && mtd->_write_oob;  
> 
> With the above addressed:
> 
> Reviewed-by: Brian Norris <computersforpeace at gmail.com>

Thanks for the review!

Regards,

Boris