[RFC/RFT PATCH 2/5] memblock: introduce generic memblock_setup_resources()

Mike Rapoport rppt at kernel.org
Wed Jun 2 11:43:32 PDT 2021


On Wed, Jun 02, 2021 at 04:51:41PM +0100, Russell King (Oracle) wrote:
> On Wed, Jun 02, 2021 at 04:54:17PM +0300, Mike Rapoport wrote:
> > On Wed, Jun 02, 2021 at 11:15:21AM +0100, Russell King (Oracle) wrote:
> > > On Wed, Jun 02, 2021 at 11:33:10AM +0300, Mike Rapoport wrote:
> > > > On Tue, Jun 01, 2021 at 02:54:15PM +0100, Russell King (Oracle) wrote:
> > > > > If I look at one of my kernels:
> > > > > 
> > > > > c0008000 T _text
> > > > > c0b5b000 R __end_rodata
> > > > > ... exception and unwind tables live here ...
> > > > > c0c00000 T __init_begin
> > > > > c0e00000 D _sdata
> > > > > c0e68870 D _edata
> > > > > c0e68870 B __bss_start
> > > > > c0e995d4 B __bss_stop
> > > > > c0e995d4 B _end
> > > > > 
> > > > > So the original covers _text..__init_begin-1 which includes the
> > > > > exception and unwind tables. Your version above omits these, which
> > > > > leaves them exposed.
> > > > 
> > > > Right, this needs to be fixed. Is there any reason the exception and unwind
> > > > tables cannot be placed between _sdata and _edata? 
> > > > 
> > > > It seems to me that they were left outside for purely historical reasons.
> > > > Commit ee951c630c5c ("ARM: 7568/1: Sort exception table at compile time")
> > > > moved the exception tables out of .data section before _sdata existed.
> > > > Commit 14c4a533e099 ("ARM: 8583/1: mm: fix location of _etext") moved
> > > > _etext before the unwind tables and didn't bother to put them into data or
> > > > rodata areas.
> > > 
> > > You can not assume that all sections will be between these symbols. This
> > > isn't specific to 32-bit ARM. If you look at x86's vmlinux.lds.in, you
> > > will see that BUG_TABLE and ORC_UNWIND_TABLE are after _edata, along
> > > with many other undiscarded sections before __bss_start.
> > 
> > But if you look at x86's setup_arch() all these never make it to the
> > resource tree. So there are holes in /proc/iomem between the kernel
> > resources.
> 
> Also true. However, my point was to counter your claim that these
> sections should be part of the .text/.data/.rodata etc sections in the
> output vmlinux.
> 
> There is, however, a more important point. The __ex_table section
> must exist and be separate from the .text/.data/.rodata sections in
> the output ELF file, as sorttable (the exception table sorter) relies
> on this to be able to find the table and sort it.
> 
> So, it isn't entirely "for historical reasons" as you said two messages
> ago.

Back then when __ex_table was moved from .data section, _sdata and _edata
were part of the .data section. Today they are not. So something like the
patch below will ensure for instance that __ex_table would be a part of
"Kernel data" in /proc/iomem without moving it to the .data section:

diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index f7f4620d59c3..2991feceab31 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -72,13 +72,6 @@ SECTIONS
 
 	RO_DATA(PAGE_SIZE)
 
-	. = ALIGN(4);
-	__ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) {
-		__start___ex_table = .;
-		ARM_MMU_KEEP(*(__ex_table))
-		__stop___ex_table = .;
-	}
-
 #ifdef CONFIG_ARM_UNWIND
 	ARM_UNWIND_SECTIONS
 #endif
@@ -143,6 +136,14 @@ SECTIONS
 	__init_end = .;
 
 	_sdata = .;
+
+	. = ALIGN(4);
+	__ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) {
+		__start___ex_table = .;
+		ARM_MMU_KEEP(*(__ex_table))
+		__stop___ex_table = .;
+	}
+
 	RW_DATA(L1_CACHE_BYTES, PAGE_SIZE, THREAD_SIZE)
 	_edata = .;
 
 
> Now, bear in mind that /proc/iomem is a user API, one which userspace
> depends on. If we start going around making /proc/iomem report stuff
> like kernel boot time reservations as "reserved" memory, we will end up
> breaking the kexec tooling on some platforms. For example, kexec
> tooling for 32-bit ARM parses /proc/iomem, looking for "System RAM",
> "System RAM (boot alias)" and "reserved" regions.
>
> So, I think changes to make this "more consistent" come with high
> risk.

I agree there is a risk but I don't think it's high. It does not look like
the minor changes in "reserved" reporting in /proc/iomem will break kexec
tooling. Anyway the amount of reserved and free memory depends on a
particular system, kernel version, configuration and command line.
I have no intention to report kernel boot time reservations
to /proc/iomem on architectures that do not report them there today,
although this also does not seem like a significant factor.

On the other hand, making /proc/iomem reporting consistent among
architectures will allow to reduce complexity of both the kernel and kexec
tools in the long run.

-- 
Sincerely yours,
Mike.



More information about the linux-arm-kernel mailing list