[axboe-block:for-next] [block] 1122c0c1cc: aim7.jobs-per-min 22.6% improvement

Oliver Sang oliver.sang at intel.com
Tue Jun 25 19:10:49 PDT 2024


Hi Christoph,

On Tue, Jun 25, 2024 at 01:57:35AM -0700, Christoph Hellwig wrote:
> Hi Oliver,
> 
> can you test the patch below?  It restores the previous behavior if
> the device did not have a volatile write cache.  I think at least
> for raid0 and raid1 without bitmap the new behavior actually is correct
> and better, but it will need fixes for other modes.  If the underlying
> devices did have a volatile write cache I'm a bit lost what the problem
> was and this probably won't fix the issue.

I'm not sure I understand this test request. As the title says, we see a good
improvement in aim7 for 1122c0c1cc, and we didn't observe any other issues with
this commit.

Do you mean that this improvement is not expected, or that it exposes some
problem instead? If so, with the patch below, should the performance return to
the level of the parent of 1122c0c1cc?

Sure! It's our great pleasure to test your patches. I noticed there are two
postings:
[1]
https://lore.kernel.org/all/20240625110603.50885-2-hch@lst.de/
which includes "[PATCH 1/7] md: set md-specific flags for all queue limits"
[2]
https://lore.kernel.org/all/20240625145955.115252-2-hch@lst.de/
which includes "[PATCH 1/8] md: set md-specific flags for all queue limits"

Which one do you suggest we test?
Do we only need to apply the first patch, "md: set md-specific flags for all
queue limits", on top of 1122c0c1cc?
And is the expectation then that the performance returns to the level of the
parent of 1122c0c1cc?

Thanks

> 
> ---
> From 81c816827197f811e14add7a79220ed9eef6af02 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <hch at lst.de>
> Date: Tue, 25 Jun 2024 08:48:18 +0200
> Subject: md: set md-specific flags for all queue limits
> 
> The md driver wants to enforce a number of flags on all devices, even
> when not inheriting them from the underlying devices.  To make sure these
> flags survive the queue_limits_set calls that md uses to update the
> queue limits without deriving them from the previous limits, add a new
> md_init_stacking_limits helper that calls blk_set_stacking_limits and sets
> these flags.
> 
> Fixes: 1122c0c1cc71 ("block: move cache control settings out of queue->flags")
> Signed-off-by: Christoph Hellwig <hch at lst.de>
> ---
>  drivers/md/md.c     | 13 ++++++++-----
>  drivers/md/md.h     |  1 +
>  drivers/md/raid0.c  |  2 +-
>  drivers/md/raid1.c  |  2 +-
>  drivers/md/raid10.c |  2 +-
>  drivers/md/raid5.c  |  2 +-
>  6 files changed, 13 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 69ea54aedd99a1..8368438e58e989 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -5853,6 +5853,13 @@ static void mddev_delayed_delete(struct work_struct *ws)
>  	kobject_put(&mddev->kobj);
>  }
>  
> +void md_init_stacking_limits(struct queue_limits *lim)
> +{
> +	blk_set_stacking_limits(lim);
> +	lim->features = BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA |
> +			BLK_FEAT_IO_STAT | BLK_FEAT_NOWAIT;
> +}
> +
>  struct mddev *md_alloc(dev_t dev, char *name)
>  {
>  	/*
> @@ -5871,10 +5878,6 @@ struct mddev *md_alloc(dev_t dev, char *name)
>  	int shift;
>  	int unit;
>  	int error;
> -	struct queue_limits lim = {
> -		.features		= BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA |
> -					  BLK_FEAT_IO_STAT | BLK_FEAT_NOWAIT,
> -	};
>  
>  	/*
>  	 * Wait for any previous instance of this device to be completely
> @@ -5914,7 +5917,7 @@ struct mddev *md_alloc(dev_t dev, char *name)
>  		 */
>  		mddev->hold_active = UNTIL_STOP;
>  
> -	disk = blk_alloc_disk(&lim, NUMA_NO_NODE);
> +	disk = blk_alloc_disk(NULL, NUMA_NO_NODE);
>  	if (IS_ERR(disk)) {
>  		error = PTR_ERR(disk);
>  		goto out_free_mddev;
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index c4d7ebf9587d07..28cb4b0b6c1740 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -893,6 +893,7 @@ extern int strict_strtoul_scaled(const char *cp, unsigned long *res, int scale);
>  
>  extern int mddev_init(struct mddev *mddev);
>  extern void mddev_destroy(struct mddev *mddev);
> +void md_init_stacking_limits(struct queue_limits *lim);
>  struct mddev *md_alloc(dev_t dev, char *name);
>  void mddev_put(struct mddev *mddev);
>  extern int md_run(struct mddev *mddev);
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index 62634e2a33bd0f..32d58752477847 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -379,7 +379,7 @@ static int raid0_set_limits(struct mddev *mddev)
>  	struct queue_limits lim;
>  	int err;
>  
> -	blk_set_stacking_limits(&lim);
> +	md_init_stacking_limits(&lim);
>  	lim.max_hw_sectors = mddev->chunk_sectors;
>  	lim.max_write_zeroes_sectors = mddev->chunk_sectors;
>  	lim.io_min = mddev->chunk_sectors << 9;
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 1a0eba65b8a92b..04a0c2ca173245 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -3194,7 +3194,7 @@ static int raid1_set_limits(struct mddev *mddev)
>  	struct queue_limits lim;
>  	int err;
>  
> -	blk_set_stacking_limits(&lim);
> +	md_init_stacking_limits(&lim);
>  	lim.max_write_zeroes_sectors = 0;
>  	err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
>  	if (err) {
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 3334aa803c8380..2a9c4ee982e023 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -3974,7 +3974,7 @@ static int raid10_set_queue_limits(struct mddev *mddev)
>  	struct queue_limits lim;
>  	int err;
>  
> -	blk_set_stacking_limits(&lim);
> +	md_init_stacking_limits(&lim);
>  	lim.max_write_zeroes_sectors = 0;
>  	lim.io_min = mddev->chunk_sectors << 9;
>  	lim.io_opt = lim.io_min * raid10_nr_stripes(conf);
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 0192a6323f09ba..10219205160bbf 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -7708,7 +7708,7 @@ static int raid5_set_limits(struct mddev *mddev)
>  	 */
>  	stripe = roundup_pow_of_two(data_disks * (mddev->chunk_sectors << 9));
>  
> -	blk_set_stacking_limits(&lim);
> +	md_init_stacking_limits(&lim);
>  	lim.io_min = mddev->chunk_sectors << 9;
>  	lim.io_opt = lim.io_min * (conf->raid_disks - conf->max_degraded);
>  	lim.features |= BLK_FEAT_RAID_PARTIAL_STRIPES_EXPENSIVE;
> -- 
> 2.43.0
> 
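
For reference, my reading of the commit message above can be modelled in a few
lines of userspace C. This is only a sketch under assumptions, not kernel code:
every sk_* name and every flag value is a hypothetical stand-in, and the model
assumes that the plain stacking-limits initializer leaves no feature bits set
and that setting the queue limits replaces the previous limits wholesale, which
is what "without deriving them from the previous limits" suggests.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the BLK_FEAT_* bits; the values are illustrative. */
#define SK_FEAT_WRITE_CACHE (1u << 0)
#define SK_FEAT_FUA         (1u << 1)
#define SK_FEAT_IO_STAT     (1u << 2)
#define SK_FEAT_NOWAIT      (1u << 3)
#define SK_MD_FEATURES (SK_FEAT_WRITE_CACHE | SK_FEAT_FUA | \
			SK_FEAT_IO_STAT | SK_FEAT_NOWAIT)

/* Simplified model of struct queue_limits and of a request queue. */
struct sk_limits {
	unsigned int features;
	unsigned int max_hw_sectors;
};

struct sk_queue {
	struct sk_limits limits;
};

/* Models blk_set_stacking_limits(): permissive defaults, no feature bits
 * (assumption of this sketch). */
static void sk_set_stacking_limits(struct sk_limits *lim)
{
	memset(lim, 0, sizeof(*lim));
	lim->max_hw_sectors = ~0u;
}

/* Models md_init_stacking_limits() from the patch: defaults plus md's flags. */
static void sk_md_init_stacking_limits(struct sk_limits *lim)
{
	sk_set_stacking_limits(lim);
	lim->features = SK_MD_FEATURES;
}

/* Models queue_limits_set(): the new limits replace the old ones entirely. */
static void sk_queue_limits_set(struct sk_queue *q, const struct sk_limits *lim)
{
	q->limits = *lim;
}

/*
 * Returns the feature bits left on the queue after a personality rebuilds
 * its limits from scratch, with or without the md-specific helper.
 */
static unsigned int sk_features_after_update(int use_md_helper)
{
	/* The queue starts with md's flags, as set at allocation time. */
	struct sk_queue q = { .limits = { .features = SK_MD_FEATURES } };
	struct sk_limits lim;

	if (use_md_helper)
		sk_md_init_stacking_limits(&lim);
	else
		sk_set_stacking_limits(&lim);
	lim.max_hw_sectors = 128;	/* personality-specific tweak */

	sk_queue_limits_set(&q, &lim);
	return q.limits.features;
}
```

In this model sk_features_after_update(0) returns 0, so the flags set at
allocation time are lost on the first update, while
sk_features_after_update(1) returns SK_MD_FEATURES. Whether this matches the
real behaviour on a given array is exactly what the test request is about.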


