String literals in __init functions

Thu Mar 26 09:37:00 PDT 2015

On 26 March 2015 at 17:13, Joe Perches <joe at perches.com> wrote:
> On Thu, 2015-03-26 at 13:40 +0100, Mason wrote:
>> On 25/03/2015 19:01, Joe Perches wrote:
>> > On Wed, 2015-03-25 at 18:56 +0100, Mason wrote:
>> >
>> >> AFAIU, functions only used at system init are tagged __init to have
>> >> the linker store them in a separate .init.text section, so memory can
>> >> be reclaimed once initialization is complete. Is that correct?
>> >>
>> >> The corresponding tag for data is __initdata (section .init.data)
>> >>
>> >> I started wondering if the string literals used in __init functions
>> >> were automatically marked __initdata.
>> >>
>> >> Looking at the objdump output, I see that the string literals are,
>> >> in fact, stored in the .rodata section. I suppose that .rodata is NOT
>> >> reclaimed after init?
>> >>
>> >> [...]
>> >>
>> >> Did I miss something in init.h?
>> >> Or should it be done like above to reclaim string literals?
>> >
>> > No, you didn't miss anything.
>> >
>> > One proposal:
>> >
>> > https://lkml.org/lkml/2014/8/21/255
>>
>> Thanks for the link!
>>
>> Here's the equivalent gmane link for my own reference:
>> http://thread.gmane.org/gmane.linux.kernel/1771969
>>
>> Basically, if I understand correctly, Ingo NAKed the patch, saying
>> this should be done automatically by the toolchain. That would make
>> for an interesting side-project...
>
> True.  It's probably not feasible though.
>
> Tracking string deduplication/reuse would be pretty difficult.

Yep, that's why I simply didn't attempt to write a "toolchain" to
automatically put strings into the appropriate section. String
annotation and deduplication is best done in the compiler, which
already does impressive tricks to limit the number of strings that
actually end up in the binary.

If one tried to write a compiler plugin to automatically flag
__init / __exit strings, it would have to be an LTO pass, as only
there would one have the complete view of where a string ends up.
It's not as simple as blindly moving all strings used in __init /
__exit functions to the corresponding .rodata section, because those
strings may be passed to functions that want to keep a pointer, e.g.
as an object name. But not all functions do! So only an LTO pass
*may* see the whole usage of a possible __init / __exit string.

Therefore I'm still not convinced that solving the problem in the
toolchain is the right thing to do. It's *way* more complicated and
would probably get it wrong more often than not. The straightforward
approach of manually marking the strings is IMHO the best solution.
Unfortunately, not everyone shares this opinion :/
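
To make the distinction concrete, here is a minimal user-space sketch of
manual marking. It is not the code from the patch linked above: the names
my_init, my_initconst, INIT_STR, register_object, demo_object_name and
demo_init are all made up for illustration, and it assumes GCC or Clang
targeting ELF (section attributes and statement expressions are
extensions).

```c
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's __init / __initconst tags. */
#define my_init      __attribute__((section(".init.text")))
#define my_initconst __attribute__((section(".init.rodata")))

/*
 * Hypothetical helper: copy a string literal into a static array that
 * lives in .init.rodata instead of .rodata, and yield a pointer to it.
 * The argument must be a string literal.
 */
#define INIT_STR(s) \
	(__extension__ ({ static const char init_str_[] my_initconst = s; \
			  init_str_; }))

/* A callee that keeps the pointer it is given, e.g. as an object name. */
static const char *saved_name;

void register_object(const char *name)
{
	saved_name = name;	/* pointer outlives the init function */
}

const char *demo_object_name(void)
{
	return saved_name;
}

/* Init-only function; its code goes to .init.text. */
my_init int demo_init(char *buf, size_t len)
{
	/* Consumed immediately (copied into buf): safe to mark by hand. */
	snprintf(buf, len, "%s", INIT_STR("probing demo device"));

	/*
	 * The callee keeps this pointer, so the literal must stay in
	 * .rodata. Marking it would leave a dangling pointer once the
	 * init sections are freed.
	 */
	register_object("demo0");
	return 0;
}
```

Compiling this to an object file and running objdump -h on it should show
the first string in .init.rodata while "demo0" stays in .rodata. In user
space nothing is ever freed, so both pointers remain valid; the sketch only
shows where the data lands. Deciding which of the two cases a given string
falls into is exactly the knowledge the author of the code has and a
per-file compiler pass lacks.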

Mathias