[PATCH 00/17] Add memberof(), split some headers, and slightly simplify code

Arnd Bergmann arnd at arndb.de
Fri Nov 19 07:57:46 PST 2021


On Fri, Nov 19, 2021 at 4:06 PM Alejandro Colomar (man-pages)
<alx.manpages at gmail.com> wrote:
> On 11/19/21 15:47, Arnd Bergmann wrote:
> > On Fri, Nov 19, 2021 at 12:36 PM Alejandro Colomar
>
> Yes, I would like to untangle the dependencies.
>
> The main reason I started doing this splitting
> is because I wouldn't be able to include
> <linux/stddef.h> in some headers,
> because it pulled too much stuff that broke unrelated things.
>
> So that's why I started from there.
>
> I for example would like to get NULL in memberof()
> without pulling anything else,
> so <linux/NULL.h> makes sense for that.
>
> It's clear that every .c wants NULL,
> but it's not so clear that every .c wants
> everything that <linux/stddef.h> pulls indirectly.

From what I can tell, linux/stddef.h is tiny, I don't think it's really
worth optimizing this part. I have spent some time last year
trying to untangle some of the more interesting headers, but ended
up not completing this as there are some really hard problems
once you start getting to the interesting bits.

The approach I tried was roughly:

- For each header in the kernel, create a preprocessed version
  that includes all the indirect includes. From that, build a set
  of lookup tables that record which header is eventually included
  by which others, and the size of each preprocessed header in
  bytes.

- For a given kernel configuration (e.g. defconfig or allmodconfig)
  that I'm most interested in, look at which files are built, and what
  the direct includes are in the source files.

- Sort the headers by the product of the number of direct includes
  and the preprocessed size: the largest ones are those that are
  worth looking at first.

- Use graphviz to visualize the directed graph showing the includes
  between the top 100 headers in that list. You get something like
  I had in [1], or the version afterwards at [2].

- Split out unneeded indirect includes from the headers in the center
  of that graph, typically by splitting out struct definitions.

- Repeat.
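
A minimal sketch of the ranking and graph steps above, not the actual
scripts. All header names, sizes, and source files are made-up
illustration data, and the preprocessed size is approximated as the sum
of the sizes of every header reachable through the include graph
(assuming include guards, so each header counts once):

```python
from collections import Counter

# Hypothetical data: direct includes of each header.
HEADER_INCLUDES = {
    "linux/fs.h": ["linux/list.h", "linux/types.h"],
    "linux/sched.h": ["linux/list.h", "linux/types.h"],
    "linux/list.h": ["linux/types.h"],
    "linux/types.h": [],
}

# Hypothetical data: size in bytes of each header on its own.
OWN_SIZE = {
    "linux/fs.h": 9000,
    "linux/sched.h": 6000,
    "linux/list.h": 2000,
    "linux/types.h": 1000,
}

# Hypothetical data: direct includes of each .c file that gets built.
SOURCE_INCLUDES = {
    "a.c": ["linux/fs.h", "linux/types.h"],
    "b.c": ["linux/sched.h", "linux/types.h"],
    "c.c": ["linux/sched.h", "linux/list.h"],
}


def closure(header, includes):
    """Return the header plus everything it includes, directly or not."""
    seen = set()
    stack = [header]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(includes.get(current, ()))
    return seen


def preprocessed_size(header, includes, own_size):
    """Approximate the preprocessed size as the sum over the closure."""
    return sum(own_size[h] for h in closure(header, includes))


def rank_headers(includes, own_size, source_includes):
    """Sort headers by (number of direct includes) * (preprocessed size)."""
    direct = Counter(h for incs in source_includes.values() for h in incs)
    scores = {
        h: direct[h] * preprocessed_size(h, includes, own_size)
        for h in includes
    }
    return sorted(scores, key=lambda h: scores[h], reverse=True)


def to_dot(headers, includes):
    """Emit a graphviz digraph of the includes among the given headers."""
    keep = set(headers)
    lines = ["digraph includes {"]
    for header in headers:
        for dep in includes.get(header, ()):
            if dep in keep:
                lines.append(f'    "{header}" -> "{dep}";')
    lines.append("}")
    return "\n".join(lines)
```

Passing the first N entries of rank_headers() to to_dot() and feeding
the result to graphviz's "dot" tool gives the kind of picture described
above. A real version would take the sizes from actual preprocessor
output rather than from a summed approximation.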

The main problem with this approach is that as soon as you start
actually reducing the unneeded indirect includes, you end up with
countless .c files that no longer build because they are missing a
direct include for something that was always included somewhere
deep underneath, so I needed a second set of scripts to add
direct includes to every .c file.
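
To illustrate what such a fix-up pass might look like (a rough sketch
only, not the scripts referred to above): the symbol-to-header table
below is hypothetical and hand-written, and the identifier scan is
naive, so it also matches names inside comments and strings.

```python
import re

# Hypothetical table: which header provides which symbol.
SYMBOL_HEADER = {
    "kmalloc": "linux/slab.h",
    "list_add": "linux/list.h",
    "mutex_lock": "linux/mutex.h",
}

INCLUDE_RE = re.compile(r"^[ \t]*#[ \t]*include[ \t]*<([^>]+)>", re.MULTILINE)
IDENT_RE = re.compile(r"\b[A-Za-z_]\w*\b")


def missing_includes(source):
    """Return headers the source needs but does not include directly."""
    have = set(INCLUDE_RE.findall(source))
    used = set(IDENT_RE.findall(source))
    needed = {SYMBOL_HEADER[s] for s in used if s in SYMBOL_HEADER}
    return sorted(needed - have)


def add_includes(source):
    """Insert the missing #include lines after the last existing one."""
    missing = missing_includes(source)
    if not missing:
        return source
    new_lines = "".join(f"#include <{h}>\n" for h in missing)
    matches = list(INCLUDE_RE.finditer(source))
    if not matches:
        # No includes yet: put the new ones at the top of the file.
        return new_lines + source
    end = source.find("\n", matches[-1].end())
    if end == -1:
        # The last include is the final line and lacks a newline.
        source += "\n"
        end = len(source)
    else:
        end += 1
    return source[:end] + new_lines + source[end:]
```

Running add_includes() twice gives the same result as running it once,
which matters when the pass is repeated after every round of header
splitting.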

On the plus side, I did see something on the order of a 30%
compile speed improvement with clang, which is insane
given that this only removed dead definitions.

> But I'll note that linux/fs.h, linux/sched.h, linux/mm.h are
> interesting headers for further splitting.
>
>
> BTW, I also have a longstanding doubt about
> how header files are organized in the kernel,
> and which headers can and cannot be included
> from which other files.
>
> For example, I see files in samples, scripts, or tools
> that redefine many things such as offsetof() or ARRAY_SIZE(),
> and I don't know if there's a good reason for that,
> or if I should simply remove all that stuff and
> include <linux/offsetof.h> everywhere I see offsetof() being used.

The main issue here is that user space code should not
include anything outside of include/uapi/ and arch/*/include/uapi/

offsetof() is defined in include/linux/stddef.h, so this is by
definition not accessible here. It appears that there is also
an include/uapi/linux/stddef.h that is really strange because
it includes linux/compiler_types.h, which in turn is outside
of uapi/. This should probably be fixed.

      Arnd

[1] https://drive.google.com/file/d/14IKifYDadg2W5fMsefxr4373jizo9bLl/view?usp=sharing
[2] https://drive.google.com/file/d/1pWQcv3_ZXGqZB8ogV-JOfoV-WJN2UNnd/view?usp=sharing



More information about the linux-arm-kernel mailing list