[RFC/PATCH 0/1] ARM: Handle user space mapped pages in flush_kernel_dcache_page

Simon Baatz gmbnomis at gmail.com
Sun Jun 17 15:13:05 EDT 2012


Ping. Although the problem affects only VIVT (and probably VIPT
aliasing) architectures and only direct I/O on a small fraction of
device drivers (those using flush_kernel_dcache_page()), it results
in serious data corruption in these cases.  LVM2 is affected by this
and does not work.  I think we should fix that one way or the other.

- Simon

On Mon, May 28, 2012 at 01:11:57AM +0200, Simon Baatz wrote:
> Patch f8b63c1 "ARM: 6382/1" changed flush_kernel_dcache_page() to a
> no-op. flush_kernel_dcache_page was deemed superfluous as newly
> allocated page cache pages are now dirty by default (patch 6379/1).
> 
> This relies on the assumption that a block device driver only reads
> into new pages (otherwise the request is served from the cache).
> 
> However, this assumption is not true for direct IO or SG_IO (see
> e.g. [1] and [2]). This is why vgchange fails with mv_cesa as reported
> by me in [3]. (Btw., flush_kernel_dcache_page is also called from
> "copy_strings" in fs/exec.c when copying env/arg strings between
> processes. Thus, its use is not restricted to device drivers.)
> 
> At the end of this mail is a small program that uses O_DIRECT to
> specifically create two test cases:
> - device driver gets an anonymous page to read into
> - device driver gets a page with user mapping(s) to read into
> 
> The source can also be obtained from [4]. With mv_cesa, both cases
> return garbled data (the scatterlist memory iterator uses
> flush_kernel_dcache_page).
> 
> As explained above, the problem is not specific to mv_cesa. As many
> drivers seem to use flush_dcache_page where they probably could use
> flush_kernel_dcache_page, the problem did not surface up to now. For
> example, one of these uses is in the scatterwalk implementation of
> linux crypto. When changing flush_dcache_page to
> flush_kernel_dcache_page in crypto/scatterwalk.c, I see the same
> coherency problems:
> 
> root at ww1:~# rmmod mv_cesa
> root at ww1:~# cryptsetup luksOpen /dev/sda2 c_sda2
> Enter passphrase for /dev/sda2: 
> root at ww1:~# ./mapping_tests 
> Anonymous page: differs!
> User space mapped page: differs!
> 
> The proposed fix is to let flush_kernel_dcache_page handle the 'kernel
> mapped only' case as before (no-op) but to flush the kernel mapping in
> case of a user space mapped page.
> 
> This means that the original idea to 'fix' non-flushing PIO drivers by
> patch 6379/1 does not work for all cases. The provided patch should
> fix things for drivers that already perform cache flushing, but it
> obviously does not fix direct IO for PIO drivers without (I think
> there have been discussions for specific PIO mapping APIs in the
> past). This and my limited testing on only one architecture (kirkwood,
> VIVT) is why the patch is RFC.
> 
> - Simon
> 
> [1] https://lkml.org/lkml/2008/11/20/20
> [2] http://article.gmane.org/gmane.linux.kernel.cross-arch/5159
> [3] http://www.mail-archive.com/linux-crypto@vger.kernel.org/msg07038.html
> [4] http://ubuntuone.com/0jF9fNsguN8BvI1Z8fsBVI
> 
> 
> mapping_tests.c:
> 
> #define _GNU_SOURCE
> 
> #include <fcntl.h>
> #include <stdio.h>
> #include <unistd.h>
> #include <stdlib.h>
> #include <sys/mman.h>
> #include <string.h>
> #include <assert.h>
> 
> const char* read_filename = "/dev/mapper/c_sda2";
> const char* map_filename = "./map.out";
> 
> /* Init the page and read it to fill some cache lines */
> int wr_page(char *p, int ps) {
> 	int i, sum;
> 
> 	memset(p, 9, ps);
> 	sum = 0;
> 	for(i = 0; i < ps; i++)
> 		sum += p[i];
> 	return sum;
> }
> 
> int main(void) {
> 	void *org_buffer;
> 	void *odirect_buffer; 
> 	char *mbuffer;
> 	int fd, fdm;
> 	int i, c;
> 	unsigned page_size;
> 
> 	page_size = getpagesize();
> 
> 	/* Read the first page into a normal buffer.
> 	   If the page is not in the page cache yet, this will
> 	   create a new page without user mappings in the kernel.
> 	 */
> 	fd = open(read_filename, O_RDONLY);
> 	assert(fd >= 0);
> 	org_buffer = malloc(page_size);
> 	c = read(fd, org_buffer, page_size);
> 	assert(c == page_size);
> 	close(fd);
> 
> 	/* Read the first page into an aligned malloced buffer using
> 	   O_DIRECT.  The block driver will get an anonymous page.
> 	 */
> 	odirect_buffer = NULL;
> 	posix_memalign(&odirect_buffer, page_size, page_size);
> 	assert(odirect_buffer);
>   
> 	fd = open(read_filename, O_RDONLY | O_DIRECT);
> 	assert(fd >= 0);
> 
> 	for (i = 0 ; i < 100 ; i++) {
> 		wr_page((char *)odirect_buffer, page_size);
> 		lseek(fd, 0, SEEK_SET);
> 		c = read(fd, odirect_buffer, page_size);
> 		assert(c == page_size);
> 		if (memcmp(odirect_buffer, org_buffer, page_size) != 0) {
> 			printf("Anonymous page: differs!\n");
> 			break;
> 		}
> 	}
> 	close(fd);
> 	free(odirect_buffer);
> 
> 	/* Read the first page into an mmapped file (using O_DIRECT).
> 	   The block driver will get a file backed page with user
> 	   mappings.
> 	 */
> 	fd = open(read_filename, O_RDONLY | O_DIRECT);
> 	assert(fd >= 0);
> 	fdm = open(map_filename, O_RDWR | O_CREAT | O_TRUNC);
> 	assert(fdm >= 0);
> 	c = write(fdm, org_buffer, page_size);
> 	assert(c == page_size);
> 	c = fsync(fdm);
> 	assert(c == 0);
> 	mbuffer = mmap(NULL, page_size, PROT_WRITE, MAP_SHARED, fdm, 0);
> 	for (i = 0 ; i < 100 ; i++) {
> 		wr_page((char *)mbuffer, page_size);
> 		lseek(fd, 0, SEEK_SET);
> 		c = read(fd, mbuffer, page_size);
> 		assert(c == page_size);
> 		if (memcmp(mbuffer, org_buffer, page_size) != 0) {
> 			printf("User space mapped page: differs!\n");
> 			break;
> 		}
> 	}
> 	munmap(mbuffer, page_size);
> 	close(fdm);
> 	close(fd);
> 	free(org_buffer);
> }
> 
> 
> Simon Baatz (1):
>   ARM: Handle user space mapped pages in flush_kernel_dcache_page
> 
>  arch/arm/include/asm/cacheflush.h |    4 ++++
>  arch/arm/mm/flush.c               |   22 ++++++++++++++++++++++
>  2 files changed, 26 insertions(+)
> 
> -- 
> 1.7.9.5

-- 
Simon Baatz <bmail at schnuecks.de>



More information about the linux-arm-kernel mailing list