[Bug] VCHIQ functional test broken
Stefan Wahren
stefan.wahren at i2se.com
Mon Apr 24 15:35:04 EDT 2017
> Russell King - ARM Linux <linux at armlinux.org.uk> hat am 24. April 2017 um 20:59 geschrieben:
>
>
> On Mon, Apr 24, 2017 at 07:42:27PM +0200, Stefan Wahren wrote:
> > > What I can't see is how changing flush_dcache_page() has possibly broken
> > > this: it seems to imply that you're getting flush_dcache_page() somehow
> > > called inbetween mapping it for DMA, using it for DMA and CPU accesses
> > > occuring. However, the driver doesn't call flush_dcache_page() other
> > > than through get_user_pages() - and since dma_map_sg() is used in that
> > > path, it should be fine.
> > >
> > > So, I don't have much to offer.
> > >
> > > However, flush_dcache_page() is probably still a tad heavy - it currently
> > > flushes to the point of coherency, but it's really about making sure that
> > > the user vs kernel mappings are coherent, not about device coherency.
> > > That's the role of the DMA API.
> >
> > Currently i don't care too much about performance. More important is to
> > fix this regression, because i'm not able to verify the function of this
> > driver efficiently.
>
> This is a driver in staging, and the reason its in staging is because it
> has problems. This is just another example of another problem it has...
> Yes, the regression is regrettable, but I don't think it's something to
> get hung up on.
>
> > I'm thinking about 2 options:
> >
> > 1) add a force parameter to flush_dcache_page() which make it possible
> > to override the new logic
> > 2) create a second version of flush_dcache_page() with the old behavior
>
> Neither, really, because, as I've already explained, flush_dcache_page()
> has nothing what so ever to do with DMA coherence - and if a driver
> breaks because we change its semantic slightly (but still in a compliant
> way) then it's uncovered a latent bug in _that_ driver.
>
> Rather than trying to "fix" flush_dcache_page() to work how this driver
> wants it to (which is a totally crazy thing to propose - we can't go
> shoe-horning driver specific behaviour into these generic flushing
> functions), it would be much better to work out why the driver has
> broken.
I totally agree. I had the hope we could make a workaround within the driver until we found the real issue.
>
> I see the kernel driver has its own logging (using vchiq_log_info() etc?)
> maybe a starting point would be to compare the output from a working
> kernel with a non-working kernel to get more of an idea where the problem
> actually is?
I will try ...
>
> What I'm willing to do is to temporarily drop the offending patch for
> the next merge window to give more time for this driver to be debugged,
> but I will want to re-apply it after that merge window closes.
Since this is already in 4.11, please keep it. At least this makes it easier to reproduce. No need for more confusion and effort.
More information about the linux-arm-kernel
mailing list