[RFC PATCH 6/6 6.6] libfs: fix infinite directory reads for offset dir

Yu Kuai yukuai1 at huaweicloud.com
Fri Nov 15 23:22:09 PST 2024


Hi,

在 2024/11/13 23:17, Chuck Lever 写道:
> On Mon, Nov 11, 2024 at 11:20:17PM +0800, yangerkun wrote:
>>
>>
>> 在 2024/11/11 22:39, Chuck Lever III 写道:
>>>
>>>
>>>> On Nov 10, 2024, at 9:36 PM, Yu Kuai <yukuai1 at huaweicloud.com> wrote:
>>>> I'm in the cc list ,so I assume you saw my set, then I don't know why
>>>> you're ignoring my concerns.
>>>> 1) next_offset is 32-bit and can overflow in a long-time running
>>>> machine.
>>>> 2) Once next_offset overflows, readdir will skip the files that offset
>>>> is bigger.
>>
>> I'm sorry, I'm a little busy these days, so I haven't responded to this
>> series of emails.
>>
>>> In that case, that entry won't be visible via getdents(3)
>>> until the directory is re-opened or the process does an
>>> lseek(fd, 0, SEEK_SET).
>>
>> Yes.
>>
>>>
>>> That is the proper and expected behavior. I suspect you
>>> will see exactly that behavior with ext4 and 32-bit
>>> directory offsets, for example.
>>
>> Emm...
>>
>> For this case like this:
>>
>> 1. mkdir /tmp/dir and touch /tmp/dir/file1 /tmp/dir/file2
>> 2. open /tmp/dir with fd1
>> 3. readdir and get /tmp/dir/file1
>> 4. rm /tmp/dir/file2
>> 5. touch /tmp/dir/file2
>> 4. loop 4~5 for 2^32 times
>> 5. readdir /tmp/dir with fd1
>>
>> For tmpfs now, we may see no /tmp/dir/file2, since the offset has been
>> overflow, for ext4 it is ok... So we think this will be a problem.
> 
> I constructed a simple test program using the above steps:
> 
> /*
>   * 1. mkdir /tmp/dir and touch /tmp/dir/file1 /tmp/dir/file2
>   * 2. open /tmp/dir with fd1
>   * 3. readdir and get /tmp/dir/file1
>   * 4. rm /tmp/dir/file2
>   * 5. touch /tmp/dir/file2
>   * 6. loop 4~5 for 2^32 times
>   * 7. readdir /tmp/dir with fd1
>   */
> 
> #include <sys/types.h>
> #include <sys/stat.h>
> 
> #include <dirent.h>
> #include <errno.h>
> #include <fcntl.h>
> #include <unistd.h>
> #include <stdbool.h>
> #include <stdio.h>
> #include <string.h>
> 
> static void list_directory(DIR *dirp)
> {
> 	struct dirent *de;
> 
> 	errno = 0;
> 	do {
> 		de = readdir(dirp);
> 		if (!de)
> 			break;
> 
> 		printf("d_off:  %lld\n", de->d_off);
> 		printf("d_name: %s\n", de->d_name);
> 	} while (true);
> 
> 	if (errno)
> 		perror("readdir");
> 	else
> 		printf("EOD\n");
> }
> 
> int main(int argc, char **argv)
> {
> 	unsigned long i;
> 	DIR *dirp;
> 	int ret;
> 
> 	/* 1. */
> 	ret = mkdir("/tmp/dir", 0755);
> 	if (ret < 0) {
> 		perror("mkdir");
> 		return 1;
> 	}
> 
> 	ret = creat("/tmp/dir/file1", 0644);
> 	if (ret < 0) {
> 		perror("creat");
> 		return 1;
> 	}
> 	close(ret);
> 
> 	ret = creat("/tmp/dir/file2", 0644);
> 	if (ret < 0) {
> 		perror("creat");
> 		return 1;
> 	}
> 	close(ret);
> 
> 	/* 2. */
> 	errno = 0;
> 	dirp = opendir("/tmp/dir");
> 	if (!dirp) {
> 		if (errno)
> 			perror("opendir");
> 		else
> 			fprintf(stderr, "EOD\n");
> 		closedir(dirp);
> 		return 1;
> 	}
> 
> 	/* 3. */
> 	errno = 0;
> 	do {
> 		struct dirent *de;
> 
> 		de = readdir(dirp);
> 		if (!de) {
> 			if (errno) {
> 				perror("readdir");
> 				closedir(dirp);
> 				return 1;
> 			}
> 			break;
> 		}
> 		if (strcmp(de->d_name, "file1") == 0) {
> 			printf("Found 'file1'\n");
> 			break;
> 		}
> 	} while (true);
> 
> 	/* run the test. */
> 	for (i = 0; i < 10000; i++) {
> 		/* 4. */
> 		ret = unlink("/tmp/dir/file2");
> 		if (ret < 0) {
> 			perror("unlink");
> 			closedir(dirp);
> 			return 1;
> 		}
> 
> 		/* 5. */
> 		ret = creat("/tmp/dir/file2", 0644);
> 		if (ret < 0) {
> 			perror("creat");
> 			fprintf(stderr, "i = %lu\n", i);
> 			closedir(dirp);
> 			return 1;
> 		}
> 		close(ret);
> 	}
> 
> 	/* 7. */
> 	printf("\ndirectory after test:\n");
> 	list_directory(dirp);
> 
> 	/* cel. */
> 	rewinddir(dirp);
> 	printf("\ndirectory after rewind:\n");
> 	list_directory(dirp);
> 
> 	closedir(dirp);
> 	return 0;
> }
> 
> 
>>> Does that not directly address your concern? Or do you
>>> mean that Erkun's patch introduces a new issue?
>>
>> Yes, to be honest, my personal feeling is a problem. But for 64bit, it may
>> never been trigger.
> 
> I ran the test program above on this kernel:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/log/?h=nfsd-testing
> 
> Note that it has a patch to restrict the range of directory offset
> values for tmpfs to 2..4096.
> 
> I did not observe any unexpected behavior after the offset values
> wrapped. At step 7, I can always see file2, and its offset is always
> 4. At step "cel" I can see all expected directory entries.

Then, do you investigate more or not?
> 
> I tested on v6.12-rc7 with the same range restriction but using
> Maple tree and 64-bit offsets. No unexpected behavior there either.
> 
> So either we're still missing something, or there is no problem. My
> only theory is maybe it's an issue with an implicit integer sign
> conversion, and we should restrict the offset range to 2..S32_MAX.
> 
> I can try testing with a range of (U32_MAX - 4096)..(U32_MAX).

You can try the following reproducer, it's much simpler. First, apply
following patch(on latest kernel):

diff --git a/fs/libfs.c b/fs/libfs.c
index a168ece5cc61..7c1a5982a0c8 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -291,7 +291,7 @@ int simple_offset_add(struct offset_ctx *octx, 
struct dentry *dentry)
                 return -EBUSY;

         ret = mtree_alloc_cyclic(&octx->mt, &offset, dentry, 
DIR_OFFSET_MIN,
-                                LONG_MAX, &octx->next_offset, GFP_KERNEL);
+                                256, &octx->next_offset, GFP_KERNEL);
         if (ret < 0)
                 return ret;


Then, create a new tmpfs dir, inside the dir:

[root at fedora test-libfs]# for ((i=0; i<256; ++i)); do touch $i; done
touch: cannot touch '255': Device or resource busy
[root at fedora test-libfs]# ls
0    103  109  114  12   125  130  136  141  147  152  158  163  169 
174  18   185  190  196  200  206  211  217  222  228  233  239  244  25 
   26  31  37  42  48  53  59  64  7   75  80  86  91  97
1    104  11   115  120  126  131  137  142  148  153  159  164  17 
175  180  186  191  197  201  207  212  218  223  229  234  24   245 
250  27  32  38  43  49  54  6   65  70  76  81  87  92  98
10   105  110  116  121  127  132  138  143  149  154  16   165  170 
176  181  187  192  198  202  208  213  219  224  23   235  240  246 
251  28  33  39  44  5   55  60  66  71  77  82  88  93  99
100  106  111  117  122  128  133  139  144  15   155  160  166  171 
177  182  188  193  199  203  209  214  22   225  230  236  241  247 
252  29  34  4   45  50  56  61  67  72  78  83  89  94
101  107  112  118  123  129  134  14   145  150  156  161  167  172 
178  183  189  194  2    204  21   215  220  226  231  237  242  248 
253  3   35  40  46  51  57  62  68  73  79  84  9   95
102  108  113  119  124  13   135  140  146  151  157  162  168  173 
179  184  19   195  20   205  210  216  221  227  232  238  243  249 
254  30  36  41  47  52  58  63  69  74  8   85  90  96
[root at fedora test-libfs]# rm -f 0
[root at fedora test-libfs]# touch 255
[root at fedora test-libfs]# ls
255
[root at fedora test-libfs]#

I don't think I have to explain why the second ls can only show the file
255...

Thanks,
Kuai

> 
> 
>>> If there is a problem here, please construct a reproducer
>>> against this patch set and post it.
> 
> Invitation still stands: if you have a solid reproducer, please post
> it.
> 
> 




More information about the maple-tree mailing list