ubifs: absurdly large directory inode size (possibly race condition / underflow)

Fri Sep 22 20:36:38 PDT 2023

On 2023/9/20 23:09, Roland Ruckerbauer wrote:
> Greetings,
> 
> I have observed some strange behavior in a UBIFS filesystem, and I wanted to ask if this is known, or unexpected.
> For reference, I am using the latest upstream 4.19 kernel on an embedded system, with the filesystem in question being encrypted with fscrypt.
> 
> When I stat the directory in question it shows the following:
> 
> File: ./datastorage
>    Size: 18446744073709550408    Blocks: 0          IO Block: 4096   directory
> Device: 27h/39d Inode: 1168        Links: 2
> Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
> Access: 2023-09-19 13:30:21.000000000
> Modify: 2023-09-20 14:44:09.000000000
> Change: 2023-09-20 14:44:09.000000000
> 
> As you can see, the size of the directory inode is absurdly large. It's actually quite close to 2^64, which makes me think there is some kind of
> race / underflow happening with regard to the stored inode size metadata.
> 
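To make the "close to 2^64" observation concrete, here is a small C sketch that computes how far the reported size sits below the 64-bit wrap point. The helper name distance_below_wrap is made up for illustration. It only does the arithmetic and says nothing about where in UBIFS the subtraction goes wrong.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): how far below 2^64 is v?
 * Unsigned subtraction wraps modulo 2^64, so 0 - v == 2^64 - v.
 */
static uint64_t distance_below_wrap(uint64_t v)
{
    return (uint64_t)0 - v;
}
```

For the value in the stat output above, distance_below_wrap(18446744073709550408ULL) evaluates to 1208. In other words, the reported size is what an unsigned 64-bit field holds after a counter has been decremented 1208 past zero.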
> Apart from this observation, the filesystem in question seems to be healthy, no errors, also no unexpected errors from the applications using it.
> As far as I am aware, the problem only manifests itself as corrupted metadata, when calling e.g. stat.
> When I move the folder around on the same filesystem, the problem persists. When any files are added to / deleted from the directory, it changes the
> directory size, but it's still corrupted and close to the same value. A reboot of the system shows that this corruption is indeed persistent on the
> filesystem.
> 
> This began to happen daily (though it had happened before) after I made a small code change to an application running on the system.
> 
> Here is some pseudocode which I suspect might be related to the problem. Obviously I removed a lot of the error handling etc... to make it more clear.
> In essence I refactored some code to be more atomic with its changes to files.
> 
> --------------------------------------------------------------------------------------------------------
> // Open the file for writing, but make it an anon inode to prevent having incomplete
> // files in the filesystem when e.g. crashing
> int fd = open(path, O_WRONLY | O_CLOEXEC | O_TMPFILE, 0644);
> 
> ...
> 
> write(fd, buffer, buffer_len);
> 
> ...
> 
> // Unfortunately we need to unlink the destination first, because AT_LINK_REPLACE is not available
> // This is not atomic, so in theory someone can re-create the file after it's deleted here.
> // I think it's ok to let the write fail in this case.
> unlink(path);
> 
> linkat(AT_FDCWD, old_path, AT_FDCWD, path.c_str(), AT_SYMLINK_FOLLOW);
> 
> close(fd);
> --------------------------------------------------------------------------------------------------------
> 
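For readers who want to try the sequence described above, a minimal C sketch of the O_TMPFILE + unlink() + linkat() pattern follows. It is not the reporter's actual code. Several details are assumptions: the helper name write_via_tmpfile is hypothetical, the directory is passed separately because O_TMPFILE expects a directory path, and the source path for linkat() is built as /proc/self/fd/<fd>, as the open(2) man page suggests (so /proc must be mounted). The pseudocode does not show how old_path is built.

```c
#define _GNU_SOURCE /* needed for O_TMPFILE */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write buf into an unnamed inode created in dir,
 * then give it the name dest. Returns 0 or a negative errno value.
 */
static int write_via_tmpfile(const char *dir, const char *dest,
                             const void *buf, size_t len)
{
    char proc_path[64];
    const char *p = buf;
    int err = 0;

    /* Unnamed inode: nothing is visible in dir until linkat() succeeds. */
    int fd = open(dir, O_WRONLY | O_CLOEXEC | O_TMPFILE, 0644);
    if (fd < 0)
        return -errno;

    /* Write the whole buffer, retrying on short writes and EINTR. */
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            err = -errno;
            goto out;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Not atomic: dest may be re-created between unlink() and linkat(). */
    if (unlink(dest) < 0 && errno != ENOENT) {
        err = -errno;
        goto out;
    }

    /* Assumption: refer to the anonymous inode through /proc/self/fd. */
    snprintf(proc_path, sizeof(proc_path), "/proc/self/fd/%d", fd);
    if (linkat(AT_FDCWD, proc_path, AT_FDCWD, dest, AT_SYMLINK_FOLLOW) < 0)
        err = -errno;

out:
    close(fd);
    return err;
}
```

A call such as write_via_tmpfile("/some/dir", "/some/dir/file", data, data_len) performs one replace cycle. Running it repeatedly against the same directory and then calling stat on that directory is one way to check whether the directory size drifts, assuming this sequence really is the trigger.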
> 
> Is anyone aware of a problem like this? I did not find anything similar to this particular problem, despite searching for some time. Even for
> other filesystems, not just ubifs. I expected this to work without issues; it's not some rare pattern to use O_TMPFILE like this. After
> all, the linkat() and open() manpages both mention this approach.
> 
> Could it be that the unlink() followed by the linkat() is somehow resulting in a race condition in the kernel?
> 

Hi, see 
https://patchwork.ozlabs.org/project/linux-mtd/patch/20230923032859.3857274-1-chengzhihao1@huawei.com/