Looking for pointers for how to determine source of issue
bj at mhlabs.com
Thu May 5 14:54:25 EDT 2011
I don't know if this is the best place to post, but I am hoping if it is not, someone can point me to a better place.
I am trying to determine how and why my program is using CPU time, specifically as it applies to a deadline being missed.
First some background:
My platform is the TI C6A8168 processor; this is a relatively new OMAP derivative. The ARM is a Cortex-A8 running at approximately 1GHz. I am running on 2.6.37 + patches to support the TI816x processor family.
The kernel sources as supplied by TI do not properly support oprofile because they don't provide the PMU interrupt. I have fixed that in my kernel so I can use oprofile.
My application is a user-land program that utilizes FireWire services via an OHCI PCIe card.
If I monitor CPU usage on the system using top or atop, I find that the CPU load varies over time (with a periodicity of 15-30 seconds -- it is not really consistent). The CPU load attributed to the program goes from < 1% to over 50% for a short period of time. It is hard to understand why this would happen, as the work being done by the program is consistent (it does not vary over time).
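For reference, the per-process figure that top reports can also be sampled at a finer interval directly from /proc. The following is only a minimal sketch, assuming Python 3 is available on the target; the function names, the 0.1 s interval, and the sample count are illustrative choices, not part of my actual setup. Note that the kernel accounts utime/stime in clock ticks, so at short intervals the resolution is coarse.

```python
import os
import time

def parse_cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The comm field may contain spaces or parentheses, so split after the
    # last ')'. The remaining fields then start at field 3 (state), which
    # puts utime (field 14) at index 11 and stime (field 15) at index 12.
    rest = stat_line.rsplit(")", 1)[1].split()
    return int(rest[11]) + int(rest[12])

def cpu_percent(prev_ticks, cur_ticks, elapsed_s, hz):
    """Convert a tick delta over elapsed_s seconds into a CPU percentage."""
    return 100.0 * (cur_ticks - prev_ticks) / hz / elapsed_s

def sample(pid, interval_s=0.1, count=50):
    """Print a timestamped CPU percentage for pid every interval_s seconds."""
    hz = os.sysconf("SC_CLK_TCK")          # ticks per second used by /proc
    path = "/proc/%d/stat" % pid
    with open(path) as f:
        prev = parse_cpu_ticks(f.read())
    prev_t = time.monotonic()
    for _ in range(count):
        time.sleep(interval_s)
        with open(path) as f:
            cur = parse_cpu_ticks(f.read())
        now = time.monotonic()
        print("%.3f %.1f%%" % (now, cpu_percent(prev, cur, now - prev_t, hz)))
        prev, prev_t = cur, now

# Hypothetical usage: sample(1234) for a process with PID 1234.
```

Logging these samples next to the application's own deadline timestamps would show whether the spikes in top line up with the missed deadlines.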
Further, if I run oprofile in a window around the time that the CPU spikes, the spike is not reflected in the oprofile results; the profile looks the same as one from a longer run that includes the spike, or from a shorter run that does not include it.
I know that this platform/kernel uses the vanilla sched_clock implementation (rather than some arch-specific hardware clock).
Does anyone know how to understand this CPU cycling, or investigate it in more detail? On Mac OS X there is a tool called Shark that can do a system trace that shows all thread tenures over time and things like thread context switches, etc. Are there any tools like that available on ARM Linux? Could I use any of the kernel tracing or probing infrastructure to investigate this?
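One possibility, which I have not verified on this platform, would be the ftrace sched_switch event. The sketch below assumes the kernel is built with event tracing enabled and that debugfs is mounted at /sys/kernel/debug; the helper names are made up for illustration, and the regular expression assumes the usual "prev_comm=... ==> next_comm=..." text format of that event. It turns trace lines into per-thread tenures on a single CPU.

```python
import re

TRACING = "/sys/kernel/debug/tracing"  # assumption: debugfs mounted here

# Matches the timestamp and the pids of a sched_switch trace line.
SWITCH_RE = re.compile(
    r"\s(?P<ts>\d+\.\d+): sched_switch: "
    r"prev_comm=(?P<prev_comm>.+?) prev_pid=(?P<prev_pid>\d+) .*?==> "
    r"next_comm=(?P<next_comm>.+?) next_pid=(?P<next_pid>\d+)"
)

def enable_sched_switch():
    """Turn on the sched_switch event (needs root)."""
    with open(TRACING + "/events/sched/sched_switch/enable", "w") as f:
        f.write("1")

def parse_switch(line):
    """Return (timestamp, prev_pid, next_pid), or None for other lines."""
    m = SWITCH_RE.search(line)
    if m is None:
        return None
    return float(m.group("ts")), int(m.group("prev_pid")), int(m.group("next_pid"))

def tenures(lines, pid):
    """Yield (start, end) timestamps for each interval pid held the CPU."""
    start = None
    for line in lines:
        event = parse_switch(line)
        if event is None:
            continue
        ts, prev_pid, next_pid = event
        if next_pid == pid:
            start = ts                      # pid was switched in
        elif prev_pid == pid and start is not None:
            yield (start, ts)               # pid was switched out
            start = None

# Hypothetical usage, as root:
#   enable_sched_switch()
#   with open(TRACING + "/trace_pipe") as pipe:
#       for begin, end in tenures(pipe, 1234):
#           print(begin, end - begin)
```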
On some of the occasions when top reports the CPU load as high, my app also appears to miss a deadline, so this does not seem to be just a phantom artifact of the measurement.
Any suggestions would be greatly appreciated.