[PATCH] arm64/alternatives: use subsections for replacement sequences

Dave Martin Dave.Martin at arm.com
Tue Jul 7 06:35:53 EDT 2020


On Mon, Jul 06, 2020 at 07:04:44PM +0300, Ard Biesheuvel wrote:
> On Mon, 6 Jul 2020 at 18:50, Dave Martin <Dave.Martin at arm.com> wrote:
> >
> > On Wed, Jul 01, 2020 at 07:32:07PM +0200, Ard Biesheuvel wrote:
> > > On Wed, 1 Jul 2020 at 19:30, Ard Biesheuvel <ardb at kernel.org> wrote:
> > > >
> > > > On Wed, 1 Jul 2020 at 19:01, Dave P Martin <dave.martin at arm.com> wrote:
> > > > >
> > > > > On Tue, Jun 30, 2020 at 10:19:21AM +0200, Ard Biesheuvel wrote:
> > > > > > When building very large kernels, the logic that emits replacement
> > > > > > sequences for alternatives fails when relative branches are present
> > > > > > in the code that is emitted into the .altinstr_replacement section
> > > > > > and patched in at the original site and fixed up. The reason is that
> > > > > > the linker will insert veneers if relative branches go out of range,
> > > > > > and due to the relative distance of the .altinstr_replacement from
> > > > > > the .text section where its branch targets usually live, veneers
> > > > > > may be emitted at the end of the .altinstr_replacement section, with
> > > > > > the relative branches in the sequence pointed at the veneers instead
> > > > > > of the actual target.
> > > > > >
> > > > > > The alternatives patching logic will attempt to fix up the branch to
> > > > > > point to its original target, which will be the veneer in this case,
> > > > > > but given that the patch site is likely to be far away as well, it
> > > > > > will be out of range and so patching will fail. There are other cases
> > > > > > where these veneers are problematic, e.g., when the target of the
> > > > > > branch is in .text while the patch site is in .init.text, in which
> > > > > > case putting the replacement sequence inside .text may not help either.
> > > > > >
> > > > > > So let's use subsections to emit the replacement code as closely as
> > > > > > possible to the patch site, to ensure that veneers are only likely to
> > > > > > be emitted if they are required at the patch site as well, in which
> > > > > > case they will be in range for the replacement sequence both before
> > > > > > and after it is transported to the patch site.
> > > > > >
> > > > > > This will prevent alternative sequences in non-init code from being
> > > > > > released from memory after boot, but this is tolerable given that the
> > > > > > entire section is only 512 KB on an allyesconfig build (which weighs in
> > > > > > at 500+ MB for the entire Image). Also, note that modules today carry
> > > > > > the replacement sequences in non-init sections as well, and any of
> > > > > > those that target init code will be emitted into init sections after
> > > > > > this change.
> > > > > >
> > > > > > This fixes an early crash when booting an allyesconfig kernel on a
> > > > > > system where any of the alternatives sequences containing relative
> > > > > > branches are activated at boot (e.g., ARM64_HAS_PAN on TX2)
> > > > > >
> > > > > > Cc: Suzuki K Poulose <suzuki.poulose at arm.com>
> > > > > > Cc: James Morse <james.morse at arm.com>
> > > > > > Cc: Andre Przywara <andre.przywara at arm.com>
> > > > > > Cc: Dave P Martin <dave.martin at arm.com>
> > > > > > Signed-off-by: Ard Biesheuvel <ardb at kernel.org>
> > > > > > ---
> > > > > >  arch/arm64/include/asm/alternative.h | 16 ++++++++--------
> > > > > >  arch/arm64/kernel/vmlinux.lds.S      |  3 ---
> > > > > >  2 files changed, 8 insertions(+), 11 deletions(-)
> > > > > >
> > > > > > diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h
> > > > > > index 5e5dc05d63a0..12f0eb56a1cc 100644
> > > > > > --- a/arch/arm64/include/asm/alternative.h
> > > > > > +++ b/arch/arm64/include/asm/alternative.h
> > > > > > @@ -73,11 +73,11 @@ static inline void apply_alternatives_module(void *start, size_t length) { }
> > > > > >       ".pushsection .altinstructions,\"a\"\n"                         \
> > > > > >       ALTINSTR_ENTRY(feature)                                         \
> > > > > >       ".popsection\n"                                                 \
> > > > > > -     ".pushsection .altinstr_replacement, \"a\"\n"                   \
> > > > > > +     ".subsection 1\n"                                               \
> > > > >
> > > > > This uses subsections in existing sections.  Could that interfere with
> > > > > existing (or future) uses of subsections?  (I've not checked whether
> > > > > there actually are such uses.  I'm also assuming that clobbering the
> > > > > invoker's idea of what section is .previous doesn't matter.)
> > > > >
> > > >
> > > > It shouldn't matter, really. You can use different indexes for the
> > > > subsection, but since the execution never flows from one subsection
> > > > into the next, all that matters is that they are 'somewhere else'
> > > >
> > > > As for the use of .previous - the idea is that this does not affect
> > > > the contents of the section stack, which I think makes sense. We could
> > > > use '.pushsection .text, 1' as well to enter another subsection in
> > > > .text, but that means we keep the .text vs .init.text issue that this
> > > > patch solves.
> >
> > The following works:
> >
> >         .pushsection junk
> >         .previous
> >         .subsection foo
> >                 // ...
> >
> >         .popsection
> >
> > though the gas documentation is not very clear about the relationship
> > between .previous and the section stack directives.  In fact, each stack
> > slot has its own notion of .previous.  If this trick is too uncertain,
> > we can probably do without it though.  Relying on .previous after
> > invoking a macro is ill-advised anyway, and I haven't seen this issue
> > come up in practice.
> >
> 
> There is some mention about .previous only affecting the slot at the
> top of the stack, but it is rather vague.
> 
> > Since foo is just a number, maybe we could just pick a randomish value,
> > similarly to the way we "manage" asm local labels generated by macros,
> > leaving small-integer values reserved for top-level code?  This might
> > help prevent surprises later on.
> >
> > In general it's reasonable to use subsections to consolidate things
> > that should be kept close but otherwise contiguous in the output
> > object, such as collecting together entries for a jump table.  If a .S
> > file and macros it calls are both using subsection 1, the macros might
> > spit out garbage into the middle of the table the .S file is trying to
> > build for example.  However, I don't see any obvious evidence of that
> > kind of thing in arch/arm64, and nothing in core code (no surprise
> > there, this is asm).
> >
> 
> All uses of subsections I am aware just use it to emit code out of
> line, with a label at the start and a branch at the end, and the only
> reason is to remove it from the hot path. For that kind of use, the
> only requirement for the subsection index is that it is != 0.
> 
> If you are doing smart things with subsections and expect some
> consistency in how they are organized at the end of the object file,
> you might care about the subsection index, but I don't see a reason to
> do so here.

If we used .subsection 77, say, then a conflict would be highly
unlikely.  But I'd agree that this might just cause more confusion in
the long run (if it ever matters).

> > > >
> > > > > Another wrinkle: the replacement code now becomes executable, whereas
> > > > > I think it was previously in rodata.  I'm not sure how much this
> > > > > matters, but it might be a source of gadgets.
> > > > >
> > > >
> > > > True. Perhaps we need to get rid of relative branches in alternative
> > > > sequences entirely - see below.
> > > >
> > > > >
> > > > > A different option would be to add an explicitly veneered branch macro
> > > > > for use in alternatives, maybe adrp+add+br.  For BTI compatility, we'd
> > > > > need a bti j or equivalent at the destination, which might or might not
> > > > > be easy to achieve -- mind you, I think we theoretically need that
> > > > > anyway for veneers to work properly in all cases.
> > > > >
> > > > > Because we would define the exact instruction sequence, the
> > > > > alternatives code could probably replace it with a direct branch if the
> > > > > actual destination is close enough.  The downside is that it wouldn't
> > > > > be a single instruction any more, and there would be some overhead for
> > > > > conditional branches if we replace the unneeded insns with NOPs.
> > > > >
> > > >
> > > > Yeah, this becomes quite hairy very quickly, especially because now
> > > > you need to allocate a spare register each time you do this.
> >
> > which is not ideal, I agree.
> >
> > Do we anticipate any truly out-of-range branches, or are we assuming the
> > kernel text + modules area is always compact enough that direct
> > branching always works?
> >
> 
> There are two realistic failure modes, afaict:
> - *Really* large kernels, such as allyesconfig, which is kind of
> artificial but still relevant for validation purposes
> - Occurrences where the module region is almost exhausted, to the
> point where the next module's non-init segment no longer fits, but the
> init section does. In this case, any alternative sequences with
> branches that need to be patched into the init text section may be out
> of range for their targets (which could be actual functions or PLTs
> that were emitted into the non-init section)
> 
> 
> > > >
> > > > One other option is to simply disallow branches in the alternative
> > > > sequences: I spotted three occurrences [0] that are quite easily
> > > > fixed, by inverting the condition so that the relative branch is
> > > > emitted in place, and the alternative sequence is just NOPs.
> > >
> > > [0] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=arm64-alt-branches
> >
> > That might be a nice simplification if there's no significant
> > performance impact.
> >
> 
> Actually, the PAN code could be improved a bit more - I just sent a
> patch - but in general, disallowing such branches would be a nice
> simplification but I am not sure we can easily enforce it at build
> time.

We could scan for problem relocs when linking, but that would introduce
an extra build step.  So I'd guess that wouldn't be popular just for
this.

Cheers
---Dave



More information about the linux-arm-kernel mailing list