[PATCH] firmware: fw_base: Improve exception stack setup in trap handler
Anup Patel
anup at brainfault.org
Mon Aug 10 23:10:50 EDT 2020
On Mon, Aug 10, 2020 at 7:35 PM Jessica Clarke <jrtc27 at jrtc27.com> wrote:
>
> On 10 Aug 2020, at 14:52, Anup Patel <Anup.Patel at wdc.com> wrote:
> >
> > Currently, the low-level trap handler (i.e. _trap_handler()) uses
> > branch instructions to conditionally setup exception stack based
> > on which mode trap occured.
> >
> > This patch implements exception stack setup using xor instructions
> > which is 2 instructions less compared to previous approach and faster
> > due to lack of branch instructions.
> >
> > The new exception stack setup approach can be best described by the
> > following pseudocode:
> >
> > Came_From_M_Mode = (MSTATUS.MPP & 0x2) >> 1
>
> This relies on PRIV 0b10 currently being unallocated, which doesn't
> make me feel warm and fuzzy inside.
The PRIV 0b10 mode is not used by any of the upcoming extensions. In future,
we will require really strong justification in future to use the
unused 0b10 mode.
Even H-extension has cleanly avoided using PRIV 0b10 mode.
In any case, we should document this assumption.
>
> > Exception_Stack = TP ^ (Came_From_M_Mode * (SP ^ TP))
>
> This relies on MUL with 0/1 being fast, but that may well be a different
The MUL can be replaced with AND instruction but it was taking one
additional instruction. See below.
/*
* Set T0 to appropriate exception stack
*
* Came_From_M_Mode = ((MSTATUS.MPP & 0x2) >> 1) - 1
* Exception_Stack = SP ^ (Came_From_M_Mode & (TP ^ SP))
*
* Came_From_M_Mode = -1UL ==> Exception_Stack = TP
* Came_From_M_Mode = 0UL ==> Exception_Stack = SP
*/
csrr t0, CSR_MSTATUS
srl t0, t0, (MSTATUS_MPP_SHIFT + 1)
and t0, t0, 0x1
add t0, t0, -1
xor tp, tp, sp
and t0, t0, tp
xor tp, tp, sp
xor t0, sp, t0
> pipeline in your processor (in an OoO core) with fewer available, or a
> higher-latency path to an M-box in a simpler in-order pipeline. The MUL
> is also not compressible. You probably do want a branch, but in a way
> that macro-op fusion will hopefully turn it into a conditional move:
>
> mv Exception_Stack, tp
> beqz Came_From_M_Mode, 1f
> mv Exception_Stack, sp
> 1:
Actually, I had started off with branches but code without branches (this
patch) is performing better on QEMU (because fewer TCG blocks required
for translating _trap_handler()). This code (this patch) will be much faster
on real HW because branch misses can cause bubbles in the instruction
pipeline.
The code with branches is as follows:
csrr t0, CSR_MSTATUS
srl t0, t0, (MSTATUS_MPP_SHIFT + 1)
and t0, t0, 0x1
beq t0, zero, _trap_stack_s_mode
_trap_stack_m_mode:
add t0, sp, zero
j _trap_stack_done
_trap_stack_s_mode:
add t0, tp, zero
_trap_stack_done:
>
> Same number of instructions, but all compressible, no multiplications
> required and the lone branch can be macro-op fused.
Overall, the key thing is having no (or fewer) branches in _trap_handler()
with the same number of (or fewer) instructions.
Regards,
Anup
>
> Jess
>
> > Came_From_M_Mode = 0 ==> Exception_Stack = TP
> > Came_From_M_Mode = 1 ==> Exception_Stack = SP
> >
> > Signed-off-by: Anup Patel <anup.patel at wdc.com>
> > ---
> > firmware/fw_base.S | 49 ++++++++++++++++++++--------------------------
> > 1 file changed, 21 insertions(+), 28 deletions(-)
> >
> > diff --git a/firmware/fw_base.S b/firmware/fw_base.S
> > index b66ac41..53a9a48 100644
> > --- a/firmware/fw_base.S
> > +++ b/firmware/fw_base.S
> > @@ -490,35 +490,28 @@ _trap_handler:
> > /* Save T0 in scratch space */
> > REG_S t0, SBI_SCRATCH_TMP0_OFFSET(tp)
> >
> > - /* Check which mode we came from */
> > + /*
> > + * Set T0 to appropriate exception stack
> > + *
> > + * Came_From_M_Mode = (MSTATUS.MPP & 0x2) >> 1
> > + * Exception_Stack = TP ^ (Came_From_M_Mode * (SP ^ TP))
> > + *
> > + * Came_From_M_Mode = 0 ==> Exception_Stack = TP
> > + * Came_From_M_Mode = 1 ==> Exception_Stack = SP
> > + */
> > csrr t0, CSR_MSTATUS
> > - srl t0, t0, MSTATUS_MPP_SHIFT
> > - and t0, t0, PRV_M
> > - xori t0, t0, PRV_M
> > - beq t0, zero, _trap_handler_m_mode
> > -
> > - /* We came from S-mode or U-mode */
> > -_trap_handler_s_mode:
> > - /* Set T0 to original SP */
> > - add t0, sp, zero
> > -
> > - /* Setup exception stack */
> > - add sp, tp, -(SBI_TRAP_REGS_SIZE)
> > -
> > - /* Jump to code common for all modes */
> > - j _trap_handler_all_mode
> > -
> > - /* We came from M-mode */
> > -_trap_handler_m_mode:
> > - /* Set T0 to original SP */
> > - add t0, sp, zero
> > -
> > - /* Re-use current SP as exception stack */
> > - add sp, sp, -(SBI_TRAP_REGS_SIZE)
> > -
> > -_trap_handler_all_mode:
> > - /* Save original SP (from T0) on stack */
> > - REG_S t0, SBI_TRAP_REGS_OFFSET(sp)(sp)
> > + srl t0, t0, (MSTATUS_MPP_SHIFT + 1)
> > + and t0, t0, 0x1
> > + xor sp, sp, tp
> > + mul t0, t0, sp
> > + xor sp, sp, tp
> > + xor t0, tp, t0
> > +
> > + /* Save original SP on exception stack */
> > + REG_S sp, (SBI_TRAP_REGS_OFFSET(sp) - SBI_TRAP_REGS_SIZE)(t0)
> > +
> > + /* Set SP to exception stack and make room for trap registers */
> > + add sp, t0, -(SBI_TRAP_REGS_SIZE)
> >
> > /* Restore T0 from scratch space */
> > REG_L t0, SBI_SCRATCH_TMP0_OFFSET(tp)
> > --
> > 2.25.1
> >
> >
> > --
> > opensbi mailing list
> > opensbi at lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/opensbi
More information about the opensbi
mailing list