[PATCH v1 0/4] [RFC] Implement Trampoline File Descriptor

Thu Jul 30 16:54:03 EDT 2020

On Thu, Jul 30, 2020 at 7:24 AM Madhavan T. Venkataraman
<madvenka at linux.microsoft.com> wrote:
>
> Sorry for the delay. I just wanted to think about this a little.
> In this email, I will respond to your first suggestion. I will
> respond to the rest in separate emails if that is alright with
> you.
>
> On 7/28/20 12:31 PM, Andy Lutomirski wrote:
>
> On Jul 28, 2020, at 6:11 AM, madvenka at linux.microsoft.com wrote:
>
> From: "Madhavan T. Venkataraman" <madvenka at linux.microsoft.com>
>
> The kernel creates the trampoline mapping without any permissions. When
> the trampoline is executed by user code, a page fault happens and the
> kernel gets control. The kernel recognizes that this is a trampoline
> invocation. It sets up the user registers based on the specified
> register context, and/or pushes values on the user stack based on the
> specified stack context, and sets the user PC to the requested target
> PC. When the kernel returns, execution continues at the target PC.
> So, the kernel does the work of the trampoline on behalf of the
> application.
>
> This is quite clever, but now I’m wondering just how much kernel help
> is really needed. In your series, the trampoline is an non-executable
> page.  I can think of at least two alternative approaches, and I'd
> like to know the pros and cons.
>
> 1. Entirely userspace: a return trampoline would be something like:
>
> 1:
> pushq %rax
> pushq %rbc
> pushq %rcx
> ...
> pushq %r15
> movq %rsp, %rdi # pointer to saved regs
> leaq 1b(%rip), %rsi # pointer to the trampoline itself
> callq trampoline_handler # see below
>
> You would fill a page with a bunch of these, possibly compacted to get
> more per page, and then you would remap as many copies as needed.  The
> 'callq trampoline_handler' part would need to be a bit clever to make
> it continue to work despite this remapping.  This will be *much*
> faster than trampfd. How much of your use case would it cover?  For
> the inverse, it's not too hard to write a bit of asm to set all
> registers and jump somewhere.
>
> Let me state what I have understood about this suggestion. Correct me if
> I get anything wrong. If you don't mind, I will also take the liberty
> of generalizing and paraphrasing your suggestion.
>
> The goal is to create two page mappings that are adjacent to each other:
>
> - a code page that contains template code for a trampoline. Since the
>  template code would tend to be small in size, pack as many of them
>  as possible within a page to conserve memory. In other words, create
>  an array of the template code fragments. Each element in the array
>  would be used for one trampoline instance.
>
> - a data page that contains an array of data elements. Corresponding
>  to each code element in the code page, there would be a data element
>  in the data page that would contain data that is specific to a
>  trampoline instance.
>
> - Code will access data using PC-relative addressing.
>
> The management of the code pages and allocation for each trampoline
> instance would all be done in user space.
>
> Is this the general idea?

Yes.

>
> Creating a code page
> --------------------
>
> We can do this in one of the following ways:
>
> - Allocate a writable page at run time, write the template code into
>   the page and have execute permissions on the page.
>
> - Allocate a writable page at run time, write the template code into
>   the page and remap the page with just execute permissions.
>
> - Allocate a writable page at run time, write the template code into
>   the page, write the page into a temporary file and map the file with
>   execute permissions.
>
> - Include the template code in a code page at build time itself and
>   just remap the code page each time you need a code page.

This latter part shouldn't need any special permissions as far as I know.

>
> Pros and Cons
> -------------
>
> As long as the OS provides the functionality to do this and the security
> subsystem in the OS allows the actions, this is totally feasible. If not,
> we need something like trampfd.
>
> As Floren mentioned, libffi does implement something like this for MACH.
>
> In fact, in my libffi changes, I use trampfd only after all the other methods
> have failed because of security settings.
>
> But the above approach only solves the problem for this simple type of
> trampoline. It does not provide a framework for addressing more complex types
> or even other forms of dynamic code.
>
> Also, each application would need to implement this solution for itself
> as opposed to relying on one implementation provided by the kernel.

I would argue this is a benefit.  If the whole implementation is in
userspace, there is no ABI compatibility issue.  The user program
contains the trampoline code and the code that uses it.

>
> Trampfd-based solution
> ----------------------
>
> I outlined an enhancement to trampfd in a response to David Laight. In this
> enhancement, the kernel is the one that would set up the code page.
>
> The kernel would call an arch-specific support function to generate the
> code required to load registers, push values on the stack and jump to a PC
> for a trampoline instance based on its current context. The trampoline
> instance data could be baked into the code.
>
> My initial idea was to only have one trampoline instance per page. But I
> think I can implement multiple instances per page. I just have to manage
> the trampfd file private data and VMA private data accordingly to map an
> element in a code page to its trampoline object.
>
> The two approaches are similar except for the detail about who sets up
> and manages the trampoline pages. In both approaches, the performance problem
> is addressed. But trampfd can be used even when security settings are
> restrictive.
>
> Is my solution acceptable?

Perhaps.  In general, before adding a new ABI to the kernel, it's nice
to understand how it's better than doing the same thing in userspace.
Saying that it's easier for user code to work with if it's in the
kernel isn't necessarily an adequate justification.

Why would remapping two pages of actual application text ever fail?