[PATCH v4 2/2] riscv: Set unalignment speed at compile time

Evan Green evan at rivosinc.com
Tue Feb 27 11:44:30 PST 2024


On Tue, Feb 27, 2024 at 11:20 AM Charlie Jenkins <charlie at rivosinc.com> wrote:
>
> On Tue, Feb 27, 2024 at 06:48:54PM +0000, Conor Dooley wrote:
> > On Tue, Feb 27, 2024 at 10:17:21AM -0800, Charlie Jenkins wrote:
> > > On Tue, Feb 27, 2024 at 11:39:25AM +0000, Conor Dooley wrote:
> > > > On Fri, Feb 16, 2024 at 12:33:19PM -0800, Charlie Jenkins wrote:
> >
> > > > > +config RISCV_EMULATED_UNALIGNED_ACCESS
> > > > > +       bool "Assume the CPU expects emulated unaligned memory accesses"
> > > > > +       depends on NONPORTABLE
> > > >
> > > > This is portable too, right?
> > >
> > > I guess so? I think I would prefer to have the probing being the only
> > > portable option.
> >
> > I dunno, I think there could be value to someone in always emulating
> > this in the kernel and I don't think that should relegate them to the
> > naughty step, given it can work everywhere.
>
> Alright, I will remove the NONPORTABLE dependency.
>
> >
> >
> > > > > +config RISCV_SLOW_UNALIGNED_ACCESS
> > > > > +       bool "Assume the CPU supports slow unaligned memory accesses"
> > > > > +       depends on NONPORTABLE
> > > > > +       help
> > > > > +         Assume that the CPU supports slow unaligned memory accesses. When
> > > > > +         enabled, this option improves the performance of the kernel on such
> > > > > +         CPUs.
> > > >
> > > > Does it? Are you sure that generating unaligned accesses on systems
> > > > where they are slow is a performance increase?
> > > > That said, I don't really see this option actually doing anything other
> > > > than setting the value for hwprobe, so I don't actually know what the
> > > > effect of this option actually is on the kernel's performance.
> > > >
> > > > Generally I would like to suggest a change from "CPU" to "system" here,
> > > > since the slow cases that exist are mostly because the unaligned access
> > > > is actually emulated in firmware.
> > >
> > > It would be ideal if "emulated" were used for any case of emulated
> > > accesses (firmware or in the kernel). Doing emulated accesses will be
> > > orders of magnitude slower than a processor that "slowly" handles the
> > > accesses.
> > >
> > > So even if the processor performs a "slow" access, it could still be
> > > beneficial for the kernel to do the misaligned access rather than manually
> > > do the alignment.
> >
> > Right. But, at least from a probing perspective, SLOW is what gets
> > selected when firmware emulates the unaligned access so to userspace
> > seeing slow means that the performance could be horrifically bad:
> >
> > |     rzfive:
> > |         cpu0: Ratio of byte access time to unaligned word access is 1.05, unaligned accesses are fast
> > |
> > |     icicle:
> > |
> > |         cpu1: Ratio of byte access time to unaligned word access is 0.00, unaligned accesses are slow
> > |         cpu2: Ratio of byte access time to unaligned word access is 0.00, unaligned accesses are slow
> > |         cpu3: Ratio of byte access time to unaligned word access is 0.00, unaligned accesses are slow
> > |
> > |         cpu0: Ratio of byte access time to unaligned word access is 0.00, unaligned accesses are slow
> > |
> > |     k210:
> > |
> > |         cpu1: Ratio of byte access time to unaligned word access is 0.02, unaligned accesses are slow
> > |         cpu0: Ratio of byte access time to unaligned word access is 0.02, unaligned accesses are slow
> > |
> > |     starlight:
> > |
> > |         cpu1: Ratio of byte access time to unaligned word access is 0.01, unaligned accesses are slow
> > |         cpu0: Ratio of byte access time to unaligned word access is 0.02, unaligned accesses are slow
> > |
> > |     vexriscv/orangecrab:
> > |
> > |         cpu0: Ratio of byte access time to unaligned word access is 0.00, unaligned accesses are slow
> >  https://lore.kernel.org/all/CAMuHMdVtXGjP8VFMiv-7OMFz1XvfU1cz=Fw4jL3fcp4wO1etzQ@mail.gmail.com/
>
> If the accesses are horrifically slow then maybe they should be flagged
> as emulated rather than slow by the probe.

Yeah, I thought about that too. I didn't feel like I had enough info
to come up with the delineating number for "horrifically slow". Plus
Clement came in with a series to detect specifically that accesses are
emulated (though it will only work on future platforms that can
delegate the trap to the kernel).

-Evan
