[PATCH 3/3] libeth: xdp: Disable generic kCFI pass for libeth_xdp_tx_xmit_bulk()

Tue Oct 28 15:01:05 PDT 2025

On Tue, Oct 28, 2025 at 05:29:30PM +0100, Alexander Lobakin wrote:
> From: Nathan Chancellor <nathan at kernel.org>
> Date: Mon, 27 Oct 2025 13:54:09 -0700
> 
> > On Mon, Oct 27, 2025 at 03:59:51PM +0100, Alexander Lobakin wrote:
> >> Hmmm,
> >>
> >> For this patch:
> >>
> >> Acked-by: Alexander Lobakin <aleksander.lobakin at intel.com>
> > 
> > Thanks a lot for taking a look, even if it seems like we might not
> > actually go the route of working around this.
> > 
> >> However,
> >>
> >> The XSk metadata infra in the kernel relies on that when we call
> >> xsk_tx_metadata_request(), we pass a static const struct with our
> >> callbacks and then the compiler makes all these calls direct.
> >> This is not limited to libeth (although I realize that it triggered
> >> this build failure because of how I pass these callbacks); every
> >> driver which implements XSk Tx metadata and calls
> >> xsk_tx_metadata_request() relies on that these calls will be direct,
> >> otherwise there'll be such performance penalty that is unacceptable
> >> for XSk speeds.
> > 
> > Hmmmm, I am not really sure how you could guarantee that these calls are
> > turned direct from indirect aside from placing compile time assertions
> > around like this... when you say "there'll be such performance penalty
> 
> You mean in case of CFI or in general? Because currently on both GCC and
> Clang with both OPTIMIZE_FOR_{SIZE,SPEED} they get inlined in every driver.

I mean in general, but obviously that sort of optimization is high value
for the compiler to perform, so I would only expect it not to occur in
extreme cases like sanitizers being enabled; I would expect no issues
when using a backend CFI implementation.
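
For reference, the pattern under discussion can be sketched in plain C
roughly as below. This is a minimal user-space illustration, not the
kernel code: struct meta_ops, meta_request(), the drv_* functions, and
the WANT_*/DESC_F_* flags are all hypothetical stand-ins for the XSk Tx
metadata ops and xsk_tx_metadata_request(). The point is only that the
ops table is a static const object passed to an always-inline helper,
so the callee is known at compile time.

```c
#include <stdint.h>

/* Hypothetical descriptor and flags, for illustration only. */
#define WANT_TSTAMP	0x1u
#define WANT_CSUM	0x2u
#define DESC_F_TSTAMP	0x1u
#define DESC_F_CSUM	0x2u

struct tx_desc {
	uint32_t flags;
	uint16_t csum_start;
	uint16_t csum_offset;
};

/* Hypothetical callback table, standing in for a driver's metadata ops. */
struct meta_ops {
	void (*request_timestamp)(void *priv);
	void (*request_checksum)(uint16_t start, uint16_t offset, void *priv);
};

static void drv_request_timestamp(void *priv)
{
	struct tx_desc *desc = priv;

	desc->flags |= DESC_F_TSTAMP;
}

static void drv_request_checksum(uint16_t start, uint16_t offset, void *priv)
{
	struct tx_desc *desc = priv;

	desc->flags |= DESC_F_CSUM;
	desc->csum_start = start;
	desc->csum_offset = offset;
}

/* static const: the compiler can see every function pointer value. */
static const struct meta_ops drv_meta_ops = {
	.request_timestamp	= drv_request_timestamp,
	.request_checksum	= drv_request_checksum,
};

/*
 * Generic helper. Written with indirect calls, but once it is inlined
 * into a caller that passes &drv_meta_ops, an optimizing compiler can
 * replace them with direct (and usually inlined) calls.
 */
static inline __attribute__((always_inline))
void meta_request(uint32_t want, uint16_t start, uint16_t offset,
		  const struct meta_ops *ops, void *priv)
{
	if ((want & WANT_TSTAMP) && ops->request_timestamp)
		ops->request_timestamp(priv);
	if ((want & WANT_CSUM) && ops->request_checksum)
		ops->request_checksum(start, offset, priv);
}

/* Per-packet path of the hypothetical driver. */
uint32_t drv_fill_desc(struct tx_desc *desc, uint32_t want,
		       uint16_t start, uint16_t offset)
{
	desc->flags = 0;
	meta_request(want, start, offset, &drv_meta_ops, desc);
	return desc->flags;
}
```

Whether the indirect calls in drv_fill_desc() are actually folded is up
to the compiler and the options in use; inspecting the generated code
(for example with objdump -d) is the only way to be sure for a given
configuration.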

> > that is unacceptable for XSk speeds", does that mean that everything
> > will function correctly but slower than expected or does the lack of
> > proper speed result in functionality degradation?
> 
> Nothing would break, just work way slower than expected.
> xsk_tx_metadata_request() is called for each Tx packet (when Tx metadata
> is enabled). Average XSK Tx perf is ~35-40 Mpps (millions of packets per
> second), often [much] higher. Having an indirect call there would divide
> it by n.

Ah okay.

> >> Maybe xsk_tx_metadata_request() should be __nocfi as well? Or all
> >> the callers of it?
> > 
> > I would only expect __nocfi_generic to be useful for avoiding a problem
> > such as this. __nocfi would be too big of a hammer because it would
> 
> Yep, sorry, I actually meant __nocfi_generic...

I figured, just wanted to make sure! This series needs to go to mainline
sooner rather than later, so maybe xsk_tx_metadata_request() could pick
up __nocfi_generic as a future change to net-next since there is no
obvious breakage? 32-bit ARM is the only architecture affected by this
change, since all other architectures that support kCFI have a
backend-specific lowering, and I am guessing very few people would
actually notice this problem in practice.

Thanks again for chiming in and taking a look!

Cheers,
Nathan