[PATCH v2 2/2] mmc: mediatek: extend number of tuning steps
Axe Yang (杨磊)
Axe.Yang at mediatek.com
Mon Dec 4 17:23:21 PST 2023
On Fri, 2023-12-01 at 09:53 +0100, AngeloGioacchino Del Regno wrote:
> Il 29/11/23 04:16, Axe Yang (杨磊) ha scritto:
> > On Tue, 2023-11-28 at 11:20 +0100, AngeloGioacchino Del Regno
> > wrote:
> > > Il 28/11/23 10:38, Axe Yang (杨磊) ha scritto:
> > > > > On Tue, 2023-11-28 at 09:53 +0100, AngeloGioacchino Del Regno
> > > > > wrote:
> > > > > Il 28/11/23 08:01, Axe Yang ha scritto:
> > > > > > Previously, during the MSDC calibration process, a full
> > > > > > clock
> > > > > > cycle
> > > > > > actually not be covered, which in some cases didn't yield
> > > > > > the
> > > > > > best
> > > > > > results and could cause CRC errors. This problem is
> > > > > > particularly
> > > > > > evident when MSDC is used as an SDIO host. In fact, MSDC
> > > > > > support
> > > > > > tuning up to a maximum of 64 steps, but by default, the
> > > > > > step
> > > > > > number
> > > > > > is 32. By increase the tuning step, we are more likely to
> > > > > > cover
> > > > > > more
> > > > > > parts of a clock cycle, and get better calibration result.
> > > > > >
> > > > > > To illustrate, when tuning 32 steps, if the obtained window
> > > > > > has
> > > > > > a
> > > > > > hole
> > > > > > near the middle, like this: 0xffc07ff (hex), then the
> > > > > > selected
> > > > > > delay
> > > > > > will be the 6 (counting from right to left).
> > > > > >
> > > > > > (32 <- 1)
> > > > > > 1111 1111 1100 0000 0000 0111 11(1)1 1111
> > > > > >
> > > > > > However, if we tune 64 steps, the window obtained may look
> > > > > > like
> > > > > > this:
> > > > > > 0xfffffffffffc07ff. The final selected delay will be 44,
> > > > > > which
> > > > > > is
> > > > > > safer as it is further away from the hole:
> > > > > >
> > > > > > (64 <- 1)
> > > > > > 1111 ... (1)111 1111 1111 1111 1111 1100 0000 0000 0111
> > > > > > 1111
> > > > > > 1111
> > > > > >
> > > > > > In this case, delay 6 selected through 32 steps tuning is
> > > > > > obviously
> > > > > > not optimal, and this delay is closer to the hole, using it
> > > > > > would
> > > > > > easily cause CRC problems.
> > > > > >
> > > > > > You will need to configure property "mediatek,tuning-step"
> > > > > > in
> > > > > > MSDC
> > > > > > dts node to 64 to extend the steps.
> > > > > >
> > > > >
> > > > > If we can run 64 tuning steps, why should we run 32?
> > > > >
> > > > > Why isn't it just better to *always* run 64 tuning steps, on
> > > > > SoCs
> > > > > supporting that?
> > > > >
> > > > > Thanks,
> > > > > Angelo
> > > >
> > > > Hi Angelo,
> > > >
> > > > That is a good question. The benefit of preserving 32 steps
> > > > tuning
> > > > is
> > > > that it can save time in certain scenarios.
> > > >
> > > > On some platforms, when the delay selected through 64 steps
> > > > tuning
> > > > is
> > > > very close to that chosen through 32 steps, we can reduce the
> > > > tuning
> > > > step from 64 to 32. This can save time sending the tuning block
> > > > commands.
> > > >
> > > > Thus using 32 steps tuning can save kernel boot up time.
> > > >
> > > > Another case where time can be saved is when accessing the RPMB
> > > > partition of eMMC. Each time switch to RPMB partition, there is
> > > > a
> > > > retune action, causing a certain drop in performance. If we are
> > > > certain
> > > > that the results of 32 steps tuning are usable and we use it,
> > > > this
> > > > can
> > > > in a sense also guarantee performance when accessing the RPMB
> > > > partition.
> > > >
> > >
> > > Thanks for this explanation! Though, I have some more
> > > questions...
> > >
> > > ...regarding boot up time, how much time are we talking about?
> >
> > Luckily, I have a platform at hand that can be used for
> > experiments/
> > Below are the results from testing on this platform:
> >
> > [ 2.431993][T1200180] kworker/2:21: mtk-msdc bootdevice:
> > [name:mtk_sd&]Start tuning
> > [ 2.434950][T1200180] kworker/2:21: mtk-msdc bootdevice:
> > [name:mtk_sd&]Tuning finished
> > [ 2.435957][T1200180] kworker/2:21: mtk-msdc bootdevice:
> > [name:mtk_sd&]phase: [map:00000000ffffffc0] [maxlen:26] [final:19]
> > [ 2.462375][T1200180] kworker/2:21: [name:mmc_core&]mmc0: new
> > HS400
> > MMC card at address 0001
> > ...
> > [ 2.519863][T1300069] kworker/3:1: mtk-msdc 11250000.mmc:
> > [name:mtk_sd&]Start tuning
> > [ 2.526271][T1300069] kworker/3:1: mtk-msdc 11250000.mmc:
> > [name:mtk_sd&]Tuning finished
> > [ 2.527288][T1300069] kworker/3:1: mtk-msdc 11250000.mmc:
> > [name:mtk_sd&]phase: [map:ffffffffffff003f] [maxlen:48] [final:40]
> > [ 2.532269][T1300069] kworker/3:1: [name:mmc_core&]mmc2: new
> > ultra
> > high speed SDR104 SDIO card at address 0001
> >
> > As the kernel log indicates, it took 3 ms for eMMC to tune 32
> > steps,
> > while it took about 7 ms for SDIO to tune 64 steps. I have to
> > admit,
> > when it comes to saving boot up time, the benefits of reducing step
> > form 64 to 32 are quite minimal. Just as you said, especially when
> > async probe is enabled.
> >
>
> That's great to know, and it's *truly* nice information that you can
> put into the commit message, as this completes the analysis of this
> commit.
>
> Can you mention that in the commit message for v4 of this commit?
>
> "As per measurements taken on MT(xxxx), the tuning phase will take:
> eMMC - 32 steps: ~3ms
> 64 steps: xxxx
> SDIO - 32 steps: xxxx
> 64 steps: ~7ms
>
Okay, will update the commit message in v4.
> ...but while this won't prolong boot times by any meaningful amount
> of time, for eMMC, it should still be preferred to use 32 steps
> tuning
> because otherwise we lose performance for RPMB I//O, which requires
> re-tuning for each access.
> "
OK, I will make SD/SDIO default to use 64 steps tuning .
> > >
> > > I'm asking because while now I see - and agree - on using 32-
> > > steps
> > > tuning
> > > on eMMC to guarantee performance during RPMB access, as far as I
> > > know,
> > > there is no RPMB partition on SD/MicroSD cards (and, of course,
> > > SDIO
> > > devices).
> > >
> > > If the boot performance impact isn't big, as in, up to ~100
> > > milliseconds is
> > > not big at all (especially with async probe!), we can definitely
> > > avoid the
> > > addition of a devicetree property for 32-steps tuning, hence use
> > > a
> > > dynamic
> > > selection strategy such that:
> > > - On eMMC devices, always perform 32-steps tuning (hence no
> > > boot
> > > delay)
> > > - On SD cards and SDIO, always perform 64-steps tuning
> >
> > eMMC could also potentially have CRC issue if only tune 32-steps,
> > albeit with a lower likelihood. The precondition for using 32-steps
> > tuning is that it could provide roughly the same valid results as
> > using
> > 64-steps tuning. So taking everything into account, controlling the
> > tuning step as needed through the use of dts property seems to be a
> > more flixible approach.
> >
>
> Yes but since the only performance concern is about eMMC RPMB access,
> we could at least make this 64 steps as *default* for SD/SDIO, and
> 32 steps as default for eMMC.
> Device tree would be an override of those default values.
>
> Can we set 64 as default for SD/SDIO, 32 as default for eMMC, and
> then use
> the device tree to override those defaults?
>
Will update in v4.
> Angelo
More information about the linux-arm-kernel
mailing list