Code generation involving __raw_readl and __raw_writel

Mason mpeg.blue at free.fr
Thu Nov 27 05:01:41 PST 2014


On 27/11/2014 12:23, Arnd Bergmann wrote:

> On Thursday 27 November 2014 11:40:34 Mason wrote:
>
>> Hello everyone,
>>
>> Consider the following code (preprocessor output):
>>
>> static int tangox_target(struct cpufreq_policy *policy, unsigned int index)
>> {
>>    while (__raw_readl((volatile void *)(0xf0000000 +(0x10024))) >> 31);
>>    __raw_writel(0, (volatile void *)(0xf0000000 +(0x10024)));
>>    return 0;
>> }
>
> First of all:

Disclaimer: I've only recently started writing driver code, so I am
very grateful for every tip/suggestion I receive. Please note that
what I provided was *preprocessor output* (to help in diagnosing the
"C to ASM" translation). The actual code was:

   while (gbus_read_reg32(SYS_cpuclk_div_ctrl) >> 31);
   gbus_write_reg32(SYS_cpuclk_div_ctrl, 0);

these being macro wrappers:

#define gbus_read_reg32(r)      __raw_readl((volatile void __iomem *)IO_ADDRESS(r))
#define gbus_write_reg32(r, v)  __raw_writel(v, (volatile void __iomem *)IO_ADDRESS(r))


> - don't use __raw_readl in driver code, use readl or readl_relaxed.

Can you expand a bit on the differences between these variants?
__raw_readl, readl, readl_relaxed

Is this an appropriate reference?
https://www.kernel.org/doc/htmldocs/deviceiobook/index.html
( https://lwn.net/Articles/66938/ )

NB: if it matters, I am accessing memory-mapped registers, across
a non-standard (I mean not PCI, ISA, etc) memory bus.

arch/arm/include/asm/io.h defines
#define readl_relaxed(c) ({ u32 __r = le32_to_cpu((__force __le32) __raw_readl(c)); __r; })
#define readl(c)         ({ u32 __v = readl_relaxed(c); __iormb(); __v; })
#define ioread32(p)	 ({ unsigned int __v = le32_to_cpu((__force __le32)__raw_readl(p)); __iormb(); __v; })
static inline u32 __raw_readl(const volatile void __iomem *addr)
{
	u32 val;
	asm volatile("ldr %1, %0"
		     : "+Qo" (*(volatile u32 __force *)addr),
		       "=r" (val));
	return val;
}

As far as I can tell, readl and ioread32 are equivalent, right?


> - When you do a busy-loop, add a cpu_relax().

LDD3 mentions cpu_relax in the context of long delays.
My busy wait loop has a maximum latency of 60 µs, and the cpufreq core
is given this information, so that, typically, execution falls through
the loop at the first iteration.

Should I still use cpu_relax? Doesn't that risk adding latency to the
calling thread?

On my platform, cpu_relax is defined to a barrier:
__asm__ __volatile__("": : :"memory")


> - use proper types: 'void __iomem *', not 'volatile void *'.

The actual code does use the __iomem attribute.

> - use of_iomap or devm_ioremap_resource to get to the pointer for
>    a device, don't just hardcode virtual addresses.

About that. If nothing had been done, 0xf0010024 would be an
invalid virtual address, and reading from that address would
generate a TLB miss, right? So something must have configured
the TLB to accept and translate this address correctly.

I'm looking for an iomap or ioremap call, right?

I'm asking because I have an idea in mind: on the bus, the first
16 MB contains only memory-mapped registers, so I've been thinking
I can map this region at init, and keep it for the lifetime of the
system. It would use only one entry in the TLB, since the CPU
supports 16 MB super-sections (or whatever they are called).

I could even lock that entry in the TLB so that these accesses
are guaranteed to never TLB miss, right?

I'm also wondering: why did we choose to map physical address
PA at virtual address VA = PA + 0xf000000? What happens if I
map PA at VA = PA? Is it a problem if I "shadow" the 0 page?
(I imagine it is.)

Regards.




More information about the linux-arm-kernel mailing list