Bug in split transactions on Raspberry Pi

Alan Stern stern at rowland.harvard.edu
Wed Jan 27 12:59:10 PST 2016


On Wed, 27 Jan 2016, Doug Anderson wrote:

> > In the end I just used the contents of the dwc2 directory from Linus's
> > current tree -- I don't think it has changed since 4.5-rc1.  Your
> > patches applied on top of that with no issues.
> >
> > They seem to work.  Is there anything in particular you would like me
> > to test?
> 
> I was sorta wondering if it fixed the original problem you were
> seeing?  Unless I missed it it sounded like it was still broken on ToT
> and I was wondering if my series happened to help in any way...

They did not fix the original problem.  I still have no idea what could 
be causing the device's selective behavior.  If you want, I can send 
you the .tdc analyzer files for kernels with and without your patch 
series.  Maybe you'll see something I missed.

> Other than that the series mostly improves compatibility with splits,
> especially when many periodic devices are plugged in (like a Full
> Speed USB audio device plugged in at the same time as several
> keyboards).

Alright, I can try something like that.

> > One problem I noticed: The controller does hub status polling to an
> > external high-speed hub much too frequently.  The interrupt endpoint's
> > bInterval value is 12, meaning the polling interval should be 256 ms.
> > But when data was available, the polls were issued every 2 microframes
> > instead of every 2048.
> >
> > Could the driver be rescheduling the interrupt endpoint every time an
> > URB completes and a new one is submitted?  That's what it looks like.
> > This might be related to the giving-URBs-back-in-a-tasklet change.
> 
> Hrm.  That started with this series?  OK, I'll take a look and see if
> it can reproduce.

It did.  Without your patches the opposite problem occurred; the status 
polls sometimes didn't get sent when they should have.

> Oh, wait, I think I know what this is.  This is probably ("usb: dwc2:
> host: Manage frame nums better in scheduler").  I'd bet that if you go
> before that change then the behavior will be the way it was.
> 
> In said patch you can see that dwc2_schedule_periodic() will now
> always pick a new first frame.  As I understand it that will happen if
> there was ever a short period of time when we weren't scheduled and I
> was worried about the next slice being in the past.  It wouldn't hurt
> to change the test to only pick a new first frame if the existing
> first frame is in the past.  That would probably fix your problem.

Are you perhaps thinking of isochronous transfers?  This is an
interrupt endpoint, not isochronous.  Individual interrupt transfers
don't have assigned slices.  The "Add a delay before releasing periodic
bandwidth" patch seems more relevant.  Or maybe "Schedule periodic
right away if it's time", although that comes before the tasklet patch.

> Question: is the behavior you're seeing illegal / undesirable?  It
> only happens when there's data available, right?  My current
> (rewritten) reservation algorithm in dwc2 is pretty crude  From a high
> speed device perspective it reserves a timeslot for the EP in every
> microframe even if the EP has a long interval.  It won't typically
> schedule the EP more often than every "interval" microframes, but the
> reservation is there.  That means we can't support quite as many
> devices at the same time, but it has the advantage that if we screw up
> (like if we miss an SOF) we know we can actually try to make up for it
> quickly and we won't collide with anyone else.  The whole thing also
> simplifies the scheduler.  In your particular case, we used our
> existing reservation to schedule the device a little sooner than
> requested.

It's not illegal.  Interrupt transfers are guaranteed to be scheduled
at least as often as bInterval; there's nothing wrong with issuing them
more frequently.  But it is a waste of bandwidth and CPU time to issue
transfers more often than necessary, so it is undesirable in that
sense.

You know, from what you've said it seems that writing a proper periodic
scheduler would be _easier_ for DWC2 than for EHCI.  The EHCI hardware
requires special data structures to handle cases where a SPLIT packet
occurs in microframe 7 or microframe 0 of the following frame; those
structures are a nuisance to manage and ehci-hcd doesn't bother to
implement them.  It sounds like DWC2 wouldn't need any of that.  
Furthermore, because coordination between the CPU and the EHCI core is
rather asynchronous, it's difficult to change the microframe allotment
for an interrupt endpoint should the schedule need to be rebalanced --
that hasn't been implemented either.  Again, this should be much
simpler for DWC2.

I've got the overall design for a proper scheduler worked out in my
head, but I have never sat down and written any of it because of the
inertia involved in these two difficult requirements.  :-(

Alan Stern




More information about the linux-rpi-kernel mailing list