[PATCH] arm: spin one more cycle in timer-based delays
Mason
slash.tmp at free.fr
Fri Nov 18 09:51:19 PST 2016
On 18/11/2016 15:18, Mason wrote:
> On 18/11/2016 13:54, Russell King - ARM Linux wrote:
>> On Fri, Nov 18, 2016 at 12:06:30PM +0000, Will Deacon wrote:
>>> On Tue, Nov 15, 2016 at 02:36:33PM +0100, Mason wrote:
>>>> When polling a tick counter in a busy loop, one might sample the
>>>> counter just *before* it is updated, and then again just *after*
>>>> it is updated. In that case, while mathematically v2-v1 equals 1,
>>>> only epsilon has really passed.
>>>>
>>>> So, if a caller requests an N-cycle delay, we spin until v2-v1
>>>> is strictly greater than N to avoid these random corner cases.
>>>>
>>>> Signed-off-by: Mason <slash.tmp at free.fr>
>>>> ---
>>>>  arch/arm/lib/delay.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/arm/lib/delay.c b/arch/arm/lib/delay.c
>>>> index 69aad80a3af4..3f1cd15ca102 100644
>>>> --- a/arch/arm/lib/delay.c
>>>> +++ b/arch/arm/lib/delay.c
>>>> @@ -60,7 +60,7 @@ static void __timer_delay(unsigned long cycles)
>>>>  {
>>>>  	cycles_t start = get_cycles();
>>>>
>>>> -	while ((get_cycles() - start) < cycles)
>>>> +	while ((get_cycles() - start) <= cycles)
>>>>  		cpu_relax();
>>>>  }
>>>
>>> I thought a bit about this last night. It is well known that the delay
>>> routines are not guaranteed to be accurate, and I don't *think* that's
>>> limited to over-spinning, so arguably this isn't a bug. However, taking
>>> the approach that "drivers should figure it out" is also unhelpful,
>>> because the frequency of the underlying counter isn't generally known
>>> and so drivers can't simply adjust the delay parameter.
>>
>> I don't think this change makes any sense whatsoever. udelay() is
>> inaccurate, period. It _will_ give delays shorter than requested
>> for many reasons, many of which can't be fixed.
>
> If I understand correctly, udelay is, in fact, a wrapper around
> several possible implementations. (Run-time decision on arch/arm)
>
> 1) The "historical" software loop-based method, which is calibrated
> during init.
>
> 2) A hardware solution, based on an actual timer, ticking away
> at a constant frequency.
>
> 3) others?
>
> Solution 1 (which probably dates back to Linux 0.1) suffers from
> the inaccuracies you point out, because of interrupt latency
> during calibration, DVFS, etc.
>
> On the other hand, it is simple enough to implement solution 2
> in a way that guarantees that spin_for_n_cycles(n) spins
> /at least/ for n cycles.
>
> On platforms using timer-based delays, there can indeed exist
> a function guaranteeing a minimal delay.
>
>
>> Having a super-accurate version just encourages people to write broken
>> drivers which assume (eg) that udelay(10) will give _at least_ a 10us
>> delay. However, there is no such guarantee.
>
> You say that calling udelay(10) expecting at least 10 µs is broken.
> This fails the principle of least astonishment in a pretty big way.
> (If it returns after 9.9 µs, I suppose it could be OK. But for
> shorter delays, the relative error grows.)
>
> A lot of drivers implement a spec that says
> "do this, wait 1 µs, do that".
>
> Driver writers would typically write
> do_this(); udelay(1); do_that();
>
> Do you suggest they should write udelay(2); ?
> But then, it's not so easy to follow the spec because everything
> has been translated to a different number.
> Hide everything behind a macro?
>
> Note that people have been using ndelay too.
> (I count 182 in drivers, 288 overall.)
>
> drivers/cpufreq/s3c2440-cpufreq.c: ndelay(20);
>
> I don't think the author expects this to return immediately.
>
>
>> So, having udelay() for timers return slightly short is actually a good
>> thing - it causes people not to make the mistake of being so accurate
>> with their delay specifications.
>
> I disagree that having an implementation which introduces
> hard-to-track-down random heisenbugs is a good thing.
>
>
>> So, NAK on this change. udelay is not super-accurate.
>
> usleep_range() fixed this issue recently.
> 6c5e9059692567740a4ee51530dffe51a4b9584d
> https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=timers/core&id=6c5e9059692567740a4ee51530dffe51a4b9584d
>
> Do you suggest driver writers should use usleep_range()
> instead of udelay?
> (When does usleep_range start making sense? Around 50/100 µs
> and above?)
>
>> Reference (and this is the most important one):
>>
>> http://lists.openwall.net/linux-kernel/2011/01/12/372
I forgot to explain how I came to work in this area of the code.
When my NAND controller's driver probes, it scans the chips
for bad blocks, which generates 65000 calls to ndelay(100);
(for 1 GB of NAND). For larger sizes of NAND, the number can
climb to 500k calls.
Currently, on arch/arm, ndelay is rounded *up* to the
nearest microsecond. So this might generate a total
delay of up to 500 ms, when 50 ms would be sufficient.
So I locally defined ndelay as
#define NDELAY_MULT ((UL(2199) * HZ) >> 11)
#define ndelay(n) __const_udelay((n) * NDELAY_MULT)
I use a 27 MHz tick counter (37 ns period).
ndelay() was resolving to __timer_delay(2)
which might delay only a single tick, so 37 ns.
So the NAND framework occasionally (and seemingly at random)
flagged good blocks as bad.
There are two sources of error in the timer-based code:
1) rounding down in __timer_const_udelay => loses 1 cycle
2) spinning one cycle too few => loses 1 cycle
With these two errors corrected, I am able to init the
NAND driver faster, with no blocks incorrectly marked
as bad.
Regards.