[PATCH 3/6] fs: distinguish between user initiated freeze and kernel initiated freeze

Tue Jun 6 10:19:56 PDT 2023

On Thu, May 25, 2023 at 04:14:30PM +0200, Jan Kara wrote:
> On Mon 22-05-23 16:42:00, Darrick J. Wong wrote:
> > How about this as an alternative patch?  Kernel and userspace freeze
> > state are stored in s_writers; each type cannot block the other (though
> > you still can't have nested kernel or userspace freezes); and the freeze
> > is maintained until /both/ freeze types are dropped.
> > 
> > AFAICT this should work for the two other usecases (quiescing pagefaults
> > for fsdax pmem pre-removal; and freezing fses during suspend) besides
> > online fsck for xfs.
> > 
> > --D
> > 
> > From: Darrick J. Wong <djwong at kernel.org>
> > Subject: fs: distinguish between user initiated freeze and kernel initiated freeze
> > 
> > Userspace can freeze a filesystem using the FIFREEZE ioctl or by
> > suspending the block device; this state persists until userspace thaws
> > the filesystem with the FITHAW ioctl or resuming the block device.
> > Since commit 18e9e5104fcd ("Introduce freeze_super and thaw_super for
> > the fsfreeze ioctl") we only allow the first freeze command to succeed.
> > 
> > The kernel may decide that it is necessary to freeze a filesystem for
> > its own internal purposes, such as suspends in progress, filesystem fsck
> > activities, or quiescing a device prior to removal.  Userspace thaw
> > commands must never break a kernel freeze, and kernel thaw commands
> > shouldn't undo userspace's freeze command.
> > 
> > Introduce a couple of freeze holder flags and wire it into the
> > sb_writers state.  One kernel and one userspace freeze are allowed to
> > coexist at the same time; the filesystem will not thaw until both are
> > lifted.
> > 
> > Inspired-by: Luis Chamberlain <mcgrof at kernel.org>
> > Signed-off-by: Darrick J. Wong <djwong at kernel.org>
> 
> Yes, this is exactly how I'd imagine it. Thanks for writing the patch!
> 
> I'd just note that this would need rebasing on top of Luis' patches 1 and
> 2. Also:

I started doing that, but I noticed that after patch 1, freeze_super no
longer leaves s_active elevated if the freeze is successful.  The
callers drop the s_active ref that they themselves obtained, which
means that we've now changed that behavior, right?  ioctl_fsfreeze now
does:

	if (!get_active_super(sb->s_bdev))
		return -ENOTTY;

(Increase ref)

        /* Freeze */
        if (sb->s_op->freeze_super)
		ret = sb->s_op->freeze_super(sb);
	ret = freeze_super(sb);

(Not sure why we can do both here?)

	deactivate_locked_super(sb);

(Decrease ref; net change to s_active is zero)

	return ret;

Luis hasn't responded to my question, so I stopped.

> > +	if (sbw->frozen == SB_FREEZE_COMPLETE) {
> > +		switch (who) {
> > +		case FREEZE_HOLDER_KERNEL:
> > +			if (sbw->freeze_holders & FREEZE_HOLDER_KERNEL) {
> > +				/*
> > +				 * Kernel freeze already in effect; caller can
> > +				 * try again.
> > +				 */
> > +				deactivate_locked_super(sb);
> > +				return -EBUSY;
> > +			}
> > +			if (sbw->freeze_holders & FREEZE_HOLDER_USERSPACE) {
> > +				/*
> > +				 * Share the freeze state with the userspace
> > +				 * freeze already in effect.
> > +				 */
> > +				sbw->freeze_holders |= who;
> > +				deactivate_locked_super(sb);
> > +				return 0;
> > +			}
> > +			break;
> > +		case FREEZE_HOLDER_USERSPACE:
> > +			if (sbw->freeze_holders & FREEZE_HOLDER_USERSPACE) {
> > +				/*
> > +				 * Userspace freeze already in effect; tell
> > +				 * the caller we're busy.
> > +				 */
> > +				deactivate_locked_super(sb);
> > +				return -EBUSY;
> > +			}
> > +			if (sbw->freeze_holders & FREEZE_HOLDER_KERNEL) {
> > +				/*
> > +				 * Share the freeze state with the kernel
> > +				 * freeze already in effect.
> > +				 */
> > +				sbw->freeze_holders |= who;
> > +				deactivate_locked_super(sb);
> > +				return 0;
> > +			}
> > +			break;
> > +		default:
> > +			BUG();
> > +			deactivate_locked_super(sb);
> > +			return -EINVAL;
> > +		}
> > +	}
> 
> Can't this be simplified to:
> 
> 	BUG_ON(who & ~(FREEZE_HOLDER_USERSPACE | FREEZE_HOLDER_KERNEL));
> 	BUG_ON(!(!(who & FREEZE_HOLDER_USERSPACE) ^
> 	       !(who & FREEZE_HOLDER_KERNEL)));
> retry:
> 	if (sb->s_writers.freeze_holders & who)
> 		return -EBUSY;
> 	/* Already frozen by someone else? */
> 	if (sb->s_writers.freeze_holders & ~who) {
> 		sb->s_writers.freeze_holders |= who;
> 		return 0;
> 	}

Yes, it can.

> Now the only remaining issue with the code is that the two different
> holders can be attempting to freeze the filesystem at once and in that case
> one of them has to wait for the other one instead of returning -EBUSY as
> would happen currently. This can happen because we temporarily drop
> s_umount in freeze_super() due to lock ordering issues. I think we could
> do something like:
> 
> 	if (!sb_unfrozen(sb)) {
> 		up_write(&sb->s_umount);
> 		wait_var_event(&sb->s_writers.frozen,
> 			       sb_unfrozen(sb) || sb_frozen(sb));
> 		down_write(&sb->s_umount);
> 		goto retry;
> 	}
> 
> and then sprinkle wake_up_var(&sb->s_writers.frozen) at appropriate places
> in freeze_super().

I think that'd work.  Let me try that.

> BTW, when reading this code, I've spotted attached cleanup opportunity but
> I'll queue that separately so that is JFYI.
> 
> > +#define FREEZE_HOLDER_USERSPACE	(1U << 1)	/* userspace froze fs */
> > +#define FREEZE_HOLDER_KERNEL	(1U << 2)	/* kernel froze fs */
> 
> Why not start from 1U << 0? And bonus points for using BIT() macro :).

I didn't think filesystem code was supposed to be using stuff from
vdso.h...

--D

> 								Honza
> -- 
> Jan Kara <jack at suse.com>
> SUSE Labs, CR

> From 9fce35f21f9a62470e764463c84373fb013108fd Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack at suse.cz>
> Date: Thu, 25 May 2023 15:56:19 +0200
> Subject: [PATCH] fs: Drop wait_unfrozen wait queue
> 
> wait_unfrozen waitqueue is used only in quota code to wait for
> filesystem to become unfrozen. In that place we can just use
> sb_start_write() - sb_end_write() pair to achieve the same. So just
> remove the waitqueue.
> 
> Signed-off-by: Jan Kara <jack at suse.cz>
> ---
>  fs/quota/quota.c   | 5 +++--
>  fs/super.c         | 4 ----
>  include/linux/fs.h | 1 -
>  3 files changed, 3 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/quota/quota.c b/fs/quota/quota.c
> index 052f143e2e0e..0e41fb84060f 100644
> --- a/fs/quota/quota.c
> +++ b/fs/quota/quota.c
> @@ -895,8 +895,9 @@ static struct super_block *quotactl_block(const char __user *special, int cmd)
>  			up_write(&sb->s_umount);
>  		else
>  			up_read(&sb->s_umount);
> -		wait_event(sb->s_writers.wait_unfrozen,
> -			   sb->s_writers.frozen == SB_UNFROZEN);
> +		/* Wait for sb to unfreeze */
> +		sb_start_write(sb);
> +		sb_end_write(sb);
>  		put_super(sb);
>  		goto retry;
>  	}
> diff --git a/fs/super.c b/fs/super.c
> index 34afe411cf2b..6283cea67280 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -236,7 +236,6 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
>  					&type->s_writers_key[i]))
>  			goto fail;
>  	}
> -	init_waitqueue_head(&s->s_writers.wait_unfrozen);
>  	s->s_bdi = &noop_backing_dev_info;
>  	s->s_flags = flags;
>  	if (s->s_user_ns != &init_user_ns)
> @@ -1706,7 +1705,6 @@ int freeze_super(struct super_block *sb)
>  	if (ret) {
>  		sb->s_writers.frozen = SB_UNFROZEN;
>  		sb_freeze_unlock(sb, SB_FREEZE_PAGEFAULT);
> -		wake_up(&sb->s_writers.wait_unfrozen);
>  		deactivate_locked_super(sb);
>  		return ret;
>  	}
> @@ -1722,7 +1720,6 @@ int freeze_super(struct super_block *sb)
>  				"VFS:Filesystem freeze failed\n");
>  			sb->s_writers.frozen = SB_UNFROZEN;
>  			sb_freeze_unlock(sb, SB_FREEZE_FS);
> -			wake_up(&sb->s_writers.wait_unfrozen);
>  			deactivate_locked_super(sb);
>  			return ret;
>  		}
> @@ -1768,7 +1765,6 @@ static int thaw_super_locked(struct super_block *sb)
>  	sb->s_writers.frozen = SB_UNFROZEN;
>  	sb_freeze_unlock(sb, SB_FREEZE_FS);
>  out:
> -	wake_up(&sb->s_writers.wait_unfrozen);
>  	deactivate_locked_super(sb);
>  	return 0;
>  }
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 21a981680856..3b65a6194485 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1146,7 +1146,6 @@ enum {
>  
>  struct sb_writers {
>  	int				frozen;		/* Is sb frozen? */
> -	wait_queue_head_t		wait_unfrozen;	/* wait for thaw */
>  	struct percpu_rw_semaphore	rw_sem[SB_FREEZE_LEVELS];
>  };
>  
> -- 
> 2.35.3
>