[PATCH v3 0/8] dwc2: Fix uframe scheduler + speed up the interrupt handler quite a bit

Douglas Anderson dianders at chromium.org
Mon Nov 16 16:51:16 PST 2015


This series now effectively has two purposes:
1. Speed up dwc2 interrupt latency.
2. Start fixing up the microframe scheduler.

...the two things were separate series in the past but they ended up
running into each other, so now they're combined.

To summarize what we have here:

1. usb: dwc2: rockchip: Make the max_transfer_size automatic

   No-brainer.  Can land any time.

2. usb: dwc2: host: Get aligned DMA in a more supported way

   Although this touches a lot of code, it's mostly just deleting
   stuff.  The approach is nearly the same as the one tegra uses.  The
   biggest objection I expect is that it duplicates too much code from
   tegra and musb.  I'd personally prefer to land it now and remove the
   duplication later, but that's up to others.  Speeding up the interrupt
   handler helps with SOF scheduling, so this is not just a dumb
   optimization.

3. usb: dwc2: host: Add scheduler tracing

   Useful for patches below.

4. usb: dwc2: host: Rewrite the microframe scheduler

   It seems hard to believe this would make things worse, since the old
   scheduler is easy to break.  Certainly the microframe scheduler still
   isn't amazing, but small steps, right?

5. usb: dwc2: host: Keep track of and use our scheduled microframe

   Needs review, but seems simple to me.  It may not fix everything,
   but it fixes some things...

6. usb: dwc2: host: Assume all devices are on one single_tt hub

   Questionable, but maybe worth landing?

7. usb: dwc2: host: Add a delay before releasing periodic bandwidth

   Pretty much the same patch I sent before, just rebased.

8. usb: dwc2: host: Giveback URB in tasklet context

   Simple and a nice speedup, assuming it doesn't break anything.  My
   belief is that our scheduler is already broken enough that things
   aren't made worse by this patch (and lots of things are made better
   by speeding up the interrupt handler and not missing SOFs), but I
   welcome other testing and opinions.

===

Below is a discussion of some of the speedup stuff.

===

The dwc2 interrupt handler is quite slow.  On rk3288 with a few things
plugged into the ports and with cpufreq locked at 696 MHz (to simulate a
real-world idle system), I can easily observe dwc2_handle_hcd_intr()
taking > 120 us, sometimes > 150 us.  Note that SOF interrupts come
every 125 us with high-speed USB, so taking > 120 us in the interrupt
handler is a big deal.

The patches here speed up the interrupt handler significantly.  After
this series, I have a hard time seeing the interrupt handler take
> 20 us, and I never see it take > 30 us in my tests unless I bring
the cpufreq back down.  With the cpufreq at 126 MHz I can still see the
interrupt handler take > 50 us, so I'm sure we could improve this
further.  ...but hey, it's a start.

This series also shows big speed improvements when testing with a USB
Gigabit Ethernet adapter.  Previously the tested adapter would top out
at about 15MB/s.  After these changes it gets about 23MB/s.

In addition to the speedup, this series also has the advantage of
simplifying dwc2 and making it more like everyone else (introducing the
possibility of future simplifications).  Picking this series up will
help your diffstat and likely win you friends.  ;)

===

Steps for gathering data with ftrace:

# Lock the CPU at 696 MHz (via cpu0's cpufreq policy) to approximate a
# real-world idle system.
cd /sys/devices/system/cpu/cpu0/cpufreq/
echo userspace > scaling_governor
echo 696000 > scaling_setspeed

# Graph-trace only the two top-level dwc2 interrupt handlers, with
# absolute timestamps, and only record calls that take longer than 70 us.
cd /sys/kernel/debug/tracing
echo 0 > tracing_on
echo "" > trace
echo nop > current_tracer
echo function_graph > current_tracer
echo dwc2_handle_hcd_intr > set_graph_function
echo dwc2_handle_common_intr >> set_graph_function
echo dwc2_handle_hcd_intr > set_ftrace_filter
echo dwc2_handle_common_intr >> set_ftrace_filter
echo funcgraph-abstime > trace_options
echo 70 > tracing_thresh
echo 1 > tracing_on

sleep 2
cat trace

===

NOTE: This series doesn't replace any other patches I've submitted
recently; it merely adds another set of changes that upstream could
benefit from.

Changes in v3:
- scheduler tracing new for v3.
- The uframe scheduler patch is folded into optimization series.
- Optimize uframe scheduler "single uframe" case a little.
- uframe scheduler now atop logging patches.
- uframe scheduler now before delayed bandwidth release patches.
- Add defines like EARLY_FRAME_USEC
- Reorder dwc2_deschedule_periodic() in prep for future patches.
- uframe scheduler now shows real usefulness w/ future patches!
- Keep track of and use our uframe: new for v3.
- Assuming single_tt is new for v3; not terribly well tested (yet).
- Moved periodic bandwidth release delay patch later in the series.

Changes in v2:
- Add a warn if setup_dma is not aligned (Julius Werner).
- Totally rewrote uframe scheduler again after writing test code.
- uframe scheduler atop delayed bandwidth release patches.
- Periodic bandwidth release delay new for v2.
- Commit message now says that URB giveback change needs delay change.

Douglas Anderson (8):
  usb: dwc2: rockchip: Make the max_transfer_size automatic
  usb: dwc2: host: Get aligned DMA in a more supported way
  usb: dwc2: host: Add scheduler tracing
  usb: dwc2: host: Rewrite the microframe scheduler
  usb: dwc2: host: Keep track of and use our scheduled microframe
  usb: dwc2: host: Assume all devices are on one single_tt hub
  usb: dwc2: host: Add a delay before releasing periodic bandwidth
  usb: dwc2: host: Giveback URB in tasklet context

 drivers/usb/dwc2/core.c      |  21 +--
 drivers/usb/dwc2/core.h      |  20 ++-
 drivers/usb/dwc2/hcd.c       | 177 +++++++++----------
 drivers/usb/dwc2/hcd.h       |  30 ++--
 drivers/usb/dwc2/hcd_intr.c  |  73 +-------
 drivers/usb/dwc2/hcd_queue.c | 407 ++++++++++++++++++++++++++++++-------------
 drivers/usb/dwc2/platform.c  |   2 +-
 7 files changed, 416 insertions(+), 314 deletions(-)

-- 
2.6.0.rc2.230.g3dd15c0



