[PATCH] ARM: rockchip: Convert resume code to C

Wed Dec 3 08:57:40 PST 2014

Arnd,

On Wed, Dec 3, 2014 at 6:42 AM, Arnd Bergmann <arnd at arndb.de> wrote:
> On Tuesday 02 December 2014 09:36:00 Doug Anderson wrote:
>> On Tue, Dec 2, 2014 at 1:33 AM, Arnd Bergmann <arnd at arndb.de> wrote:
>> > On Monday 01 December 2014 15:04:59 Doug Anderson wrote:
>> >> On Mon, Dec 1, 2014 at 2:50 PM, Russell King - ARM Linux <linux at arm.linux.org.uk> wrote:
>> > I recently looked at another vendor tree (quantenna wifi access point,
>> > based on arch/arc), which was putting arbitrary functions into SRAM
>> > for performance reasons, in their case the entire hot path for network
>> > switching. Having at least the infrastructure to do this seems like
>> > a great idea, even though it's very hard to do in a general-purpose
>> > kernel, as you'd have a hard time squeezing as much code as possible
>> > into the available SRAM.
>>
>> I'm always a fan of seeing general infrastructure introduced, though
>> we always need to make sure that the general infrastructure makes
>> things easier and not harder.  There's always the danger of adding so
>> much abstraction for a small thing that using it is like pulling
>> teeth.  I'm not saying that's the case here, but it is always a
>> danger.
>>
>> Note: I will point out a critical differences between the "hotpath"
>> problem and the one I'm solving here.  When you're just trying to
>> speed up a hotpath, it's not the end of the world if there's a stray
>> access to SDRAM.  If you happen to access a global variable in SDRAM,
>> or use a libc function to do division, or have a WARN_ON, those things
>> are OK.  It might also be OK if the stack was still in SDRAM.  When
>> you're compiling code that has to run with no other kernel function
>> present it's really nice to link them into a separate executable.
>
> Yes, makes sense. We might be able to use the same trick that we have
> for verifying __init sections though: During the final link of the
> vmlinux or module, check that an SRAM function only calls other
> functions that are in SRAM and accesses global variables that way
> too.

Yup, I thought about this.  You might want some way to make decisions
about whether accesses are OK.  If you're optimizing a hotpath maybe
all accesses are OK (but deserve a warning?).  If you're running code
where SDRAM is not available then no accesses are OK.

> It wouldn't cover any pointers you pass using function arguments
> though, and I don't yet understand the requirements for stack accesses.
> How do you currently deal with local variables that are put on the
> stack by a blob?

The blob sets up its own stack in assembly code.

>> > and I also don't think I want to have
>> > the infrastructure for it in mach-rockchip and would want to see that
>> > at least shared across arch/arm if it's too hard to do
>> > cross-architecture. If you were to include code from drivers/memory/
>> > in the blob, you couldn't keep it in mach-rockchip anyway.
>>
>> I guess I was envisioning that if other places need similar
>> functionality that they would copy the ideas here.  Some of the
>> Makefile bits could possibly be shared through some type of Makefile
>> library.  I know copying code / Makefiles is bad, but sometimes it's
>> the cleanest way to do something.  If we start seeing a lot of
>> duplication then we can make things common and we can truly evaluate
>> whether the common solution is better than the duplication.
>
> The makefile parts should be really easy to share by putting them
> into scripts/Makefile.lib.

Agreed.

>> > AFAICT, the quantenna implementation is similar to the itcm/dtcm
>> > stuff we already have (but are not using upstream), so I wonder
>> > why we can't use that here too, see Documentation/arm/tcm.txt
>>
>> I wasn't aware of the TCM stuff.  Thanks for the pointer!  It looks
>> pretty neat...
>>
>> Ah, but the TCM stuff has a critical difference from my problem.  By
>> the very definition of TCM you don't need to do relocation.
>>
>> TCM has the magical property that you can assign the physical address.
>> That means that you instantly sidestep multiplatform problems of
>> having SRAM at different physical addresses.  You can compile the code
>> to assume it's at 0xfffe0000 and it will work on every single machine
>> out there that needs TCM.  So if you've got a generic "udelay"
>> function you could just mark it as a "tcmfunc" and it will work
>> everywhere.  No relocation needed and the compiler knows exactly where
>> things will be.
>
> Ok, I have to admit that I don't actually understand the differences
> myself. Why does the physical address of the TCM matter? Can't we
> just map the SRAM to a sufficiently large well-known virtual address?

Linus W. got it right when he said I was implicitly alluding to the
fact that I needed to be running with the MMU off.  I'm running resume
code which runs with everything off and all addresses are physical.

In my case I could compile the code PIC/PID if needed, but I don't
think it's so easy with the TCM approach.

>> Unfortunately, the rk3288 doesn't have TCM.  I tried enabling it and
>> got these nice printouts at boot:
>>
>>     DTCM    : 0xfffe8000 - 0xfffe8000   (   0 kB)
>>     ITCM    : 0xfffe0000 - 0xfffe0000   (   0 kB)
>>
>> Instead of TCM I'm using the "PMU SRAM" on the rk3288 which is
>> designed to keep code and data across deep sleep.  Adding relocation
>> to the existing TCM support gets back into the rats nest of issues
>> that I was trying to avoid tackling.
>>
>> A few other TCM thoughts:
>>
>> 1. It sure seems unlikely that the current TCM solution would scale to
>> multiplatform.  Oh right, Linus W said this in his reply, too.  If
>> you've got SoC_A, SoC_B, and SoC_C all marking their functions
>> "tcmfunc" then they'll all be placed in the TCM section, right?
>
> Correct, that would be a problem, at least if the total size grows
> to more than the minimum of any of the chips' physical SRAM.

See below.  I have 4K, which means that the total size of all SoC's
TCM code has to be less than 4K.

>> There's no way to detect that you're on SoC_A and that you only need
>> the SoC_A code.  Given the marching orders of multiplatform,
>> multiplatform, multiplatform then I _think_ that means we shouldn't
>> let anyone merge any code to mainline that uses TCM (unless TCM gets
>> revamped).
>
> Just out of curiosity, what sizes are we looking at here, for the
> code you currently have and the available SRAM on rk3288?

I'm running in 4K of SRAM.  I think my current code is just over 2K.
It's unlikely any other platform would fit.

>> 2. I haven't tried it, but it seems like the compiler still might not
>> catch stray (accidental) accesses from the TCM section to the non-TCM
>> section.  Again, this isn't a showstopper because you'd just track
>> each one down, but it is a nice feature of adding a separate
>> executable.
>
> No, the compiler won't care about this, but as mentioned above we
> can have the kernel linker scripts help us a bit here.

Yup, true.

-Doug