Synchronization of per-partition ecc_stats
Brian Norris
computersforpeace at gmail.com
Fri Jul 11 17:43:29 PDT 2014
+ Ezequiel, who's been looking at ecc_stats
Hi Daniel,
On Thu, Jul 10, 2014 at 10:08:16AM -0700, Daniel Ehrenberg wrote:
> On Tue, Jul 8, 2014 at 2:26 PM, Daniel Ehrenberg <dehrenberg at google.com> wrote:
> > I'm working on a patch to add some more counters to MTD for monitoring
> > write and erase errors. I was looking at the way ECC stats are
> > recorded, in particular how each partition gets its ecc_stats.failures
> > counter.
> >
> > Globally, for NAND, this counter is synchronized by
> > nand_get_device/nand_release_device. For individual partitions, it
> > looks like ecc_stats.failures is calculated by looking at how the
> > global stat changes across an operation sent down to the lower layer.
> > Here's the main place it happens:
> >
> > static int part_read(struct mtd_info *mtd, loff_t from, size_t len,
> > size_t *retlen, u_char *buf)
> > {
> > struct mtd_part *part = PART(mtd);
> > struct mtd_ecc_stats stats;
> > int res;
> >
> > stats = part->master->ecc_stats;
> > res = part->master->_read(part->master, from + part->offset, len,
> > retlen, buf);
> > if (unlikely(mtd_is_eccerr(res)))
> > mtd->ecc_stats.failed +=
> > part->master->ecc_stats.failed - stats.failed;
> > else
> > mtd->ecc_stats.corrected +=
> > part->master->ecc_stats.corrected - stats.corrected;
> > return res;
> > }
> >
> > I'm having trouble seeing how the MTD code prevents a race: it seems
> > to me that if a read to either the same or different partition occurs
> > and also changes master->ecc_stats, the stats could be double counted.
> > Is there any synchronization mechanism that I'm missing?
Your analysis looks correct to me. The ecc_stats locking looks pretty
sloppy (practically non-existent).
I think there is also potential for mild races with ioctl(ECCGETSTATS).
Since it is read-only, this simply means that we could have atomicity
issues (for instance, if we have a machine with non-atomic 32-bit memory
accesses). And now we'll have one more, with Ezequiel's patches to
expose ecc_stats in sysfs.
FWIW, I noticed a related issue when dealing entirely with user-space
mtdchar access:
http://lists.infradead.org/pipermail/linux-mtd/2014-February/052062.html
On a related note, I think 'master' vs. 'partition' ecc_stats are broken
in a few cases, where low-level "mark block bad" functions might be
called, updating master->ecc_stats.badblocks, but not
partition->ecc_stats.badblocks.
Patches are welcome!
Regards,
Brian
More information about the linux-mtd
mailing list