[PATCHv4 2/2] arm64/io: Add a header for mmio access instrumentation

Sai Prakash Ranjan quic_saipraka at quicinc.com
Mon Nov 29 05:49:45 PST 2021


Hi Arnd,

On 11/22/2021 9:13 PM, Sai Prakash Ranjan wrote:
> On 11/22/2021 9:05 PM, Arnd Bergmann wrote:
>> On Mon, Nov 22, 2021 at 3:59 PM Sai Prakash Ranjan
>> <quic_saipraka at quicinc.com> wrote:
>>>>> And if we do move this instrumentation to asm-generic/io.h, how will
>>>>> that be executed since
>>>>> the arch specifc read{b,w,l,q} overrides this generic version?
>>>> As I understand it, your version also requires architecture specific
>>>> changes, so that would be the same: it only works for architectures
>>>> that get the definition of readl()/readl_relaxed()/inl()/... from
>>>> include/asm-generic/io.h and only override the __raw version. Arnd
>>> Sorry, I didn't get this part, so  I am trying this on ARM64:
>>>
>>> arm64/include/asm/io.h has read{b,l,w,q} defined.
>>> include/asm-generic/io.h has below:
>>>     #ifndef readl
>>>     #define readl readl
>>>     static inline u32 readl(const volatile void __iomem *addr)
>>>
>>> and we include asm-generic/io.h in arm64/include/asm/io.h at the end
>>> after the definitions for arm64 mmio accesors.
>>> So arch implementation here overrides generic ones as I see it, am I
>>> missing something? I even confirmed this
>>> with some trace_printk to generic and arch specific definitions of 
>>> readl
>>> and I see arch specific ones being called.
>> Ah, you are right that the arm64 version currently has custom 
>> definitions
>> of the high-level interfaces. These predate the introduction of the
>> __io_{p,}{b,a}{r,w} macros and are currently only used on risc-v.
>>
>> I think in this case you should start by changing arm64 to use the
>> generic readl() etc definitions, by removing the extra definitions and
>> using
>>
>> #define __io_ar(v) __iormb(__v)
>> #define __io_bw() dma_wmb()
>>
>>
>
> Sure, will do that.

I got the callback version implemented as suggested by you to compare 
the overall size and performance
with the inline version and apparently the size increased more in case 
of callback version when compared to
inline version. As for the performance, I ran some basic dd tests and 
sysbench and didn't see any noticeable
difference between the two implementations.

**Inline version with CONFIG_FTRACE=y and CONFIG_TRACE_MMIO_ACCESS=y**
$ size vmlinux
    text                   data                    bss dec             
hex         filename
23884219        14284468         532568 38701255        24e88c7 vmlinux

**Callback version with CONFIG_FTRACE=y and CONFIG_TRACE_MMIO_ACCESS=y**
$ size vmlinux
    text                  data                     bss dec               
        hex        filename
24108179        14279596         532568 38920343        251e097 vmlinux

$ ./scripts/bloat-o-meter inline-vmlinux callback-vmlinux
add/remove: 8/3 grow/shrink: 4889/89 up/down: 242244/-11564 (230680)
Total: Before=25812612, After=26043292, chg +0.89%

Note: I had arm64 move to asm-generic high level accessors in both 
versions which I plan to post together but
not included in below links,

For your reference, here are the 2 versions of the patches,
  Inline version - 
https://github.com/saiprakash-ranjan/RTB-Patches/blob/main/0001-asm-generic-io-Add-logging-support-for-MMIO-accessor.patch
  Callback version - 
https://github.com/saiprakash-ranjan/RTB-Patches/blob/main/0001-asm-generic-io-Add-callback-based-MMIO-logging-suppo.patch

Couple of things noted in callback version which didn't look quite good 
was that it needed some way to register the
callbacks and need to use some initcall (core_initcall used) as 
implemented in above patch but that would probably
mean we would lose some register logging(if there is some) in between 
early_initcall(which is when trace events get
enabled) and this trace_readwrite core_initcall. Another thing I noticed 
that since we now move to callback based
implementations, the caller info is somewhat inaccurate when compared to 
inline version.

Also regarding your earlier comment on inline versions being possibly 
problematic on lower memory systems, enabling
Ftrace itself is making a considerable size difference and in such 
systems ftrace wouldn't be enabled which implicitly means
MMIO logging which are based on trace events will be disabled anyways.

Here is the size delta with FTRACE enabled and disabled with arm64 
defconfig without MMIO logging support:
$ ./scripts/bloat-o-meter ftrace-disabled-vmlinux ftrace-enabled-vmlinux
add/remove: 8/3 grow/shrink: 4889/89 up/down: 242244/-11564 (230680)
Total: Before=22865837, After=25215198, chg +10.27%

Given all this, I would prefer inline versions in asm-generic high level 
accessors, let me know what you think.

Thanks,
Sai



More information about the linux-arm-kernel mailing list