4.16 OMAP serial transmit corruption?

Russell King - ARM Linux linux at armlinux.org.uk
Wed Apr 18 05:47:25 PDT 2018


On Wed, Apr 18, 2018 at 12:00:33PM +0100, Russell King - ARM Linux wrote:
> On Wed, Apr 18, 2018 at 12:27:02PM +0200, Michael Nazzareno Trimarchi wrote:
> > Hi
> > 
> > On Wed, Apr 18, 2018 at 11:59 AM, Russell King - ARM Linux
> > <linux at armlinux.org.uk> wrote:
> > > On Wed, Apr 18, 2018 at 02:41:43PM +0530, Vignesh R wrote:
> > >>
> > >>
> > >> On Tuesday 17 April 2018 02:50 PM, Vignesh R wrote:
> > >> >
> > >> >
> > >> > On Monday 16 April 2018 09:15 PM, Tony Lindgren wrote:
> > >> >> * Russell King - ARM Linux <linux at armlinux.org.uk> [180416 15:19]:
> > >> >>> Hi,
> > >> >>>
> > >> >>> I'm not entirely sure what's going on, but I see corrupted characters
> > >> >>> with the serial console on the OMAP4430 SDP board.  During boot,
> > >> >>> everything seems fine, the problem appears to be userspace output.
> > >> >>>
> > >> >>> For example, if I edit a file, then quit vi:
> > >> >>>
> > >> >>> :q■■%■■B■■Z■root at omap-4430sdp:~#
> > >> >>
> > >> >> I don't think I've seen that one. What I've seen few times is
> > >> >> typing a key on the serial console echoing back the previous
> > >> >> character typed while the new character won't get displayed
> > >> >> until hitting keyboard again. Only rebooting the device seems
> > >> >> to solve this. This is with 4430 ES2.3 revision.
> > >> >>
> > >> >> I wonder if we're missing some parts of errata i202 handling
> > >> >> in omap_8250_mdr1_errataset()?
> > >> >>
> > >>
> > >> I wonder if the extra read of MDR1 register at the beginning of
> > >> omap_8250_mdr1_errataset() compared to omap-serial is the issue.
> > >> errata i202 says access to MDR1 can cause data corruption.
> > >> Assuming both reads and writes can cause glitch then, that read
> > >> is not following advisory:
> > >>
> > >> I don't have SDP board so, could you verify if below diff helps:
> > >>
> > >>
> > >> diff --git a/drivers/tty/serial/8250/8250_omap.c b/drivers/tty/serial/8250/8250_omap.c
> > >> index 6aaa84355fd1..8ab9d0a1b1eb 100644
> > >> --- a/drivers/tty/serial/8250/8250_omap.c
> > >> +++ b/drivers/tty/serial/8250/8250_omap.c
> > >> @@ -163,11 +163,6 @@ static void omap_8250_mdr1_errataset(struct uart_8250_port *up,
> > >>                                      struct omap8250_priv *priv)
> > >>  {
> > >>         u8 timeout = 255;
> > >> -       u8 old_mdr1;
> > >> -
> > >> -       old_mdr1 = serial_in(up, UART_OMAP_MDR1);
> > >> -       if (old_mdr1 == priv->mdr1)
> > >> -               return;
> > >>
> > >>         serial_out(up, UART_OMAP_MDR1, priv->mdr1);
> > >>         udelay(2);
> > >
> > > That doesn't appear to help.
> > >
> > > Looking at the bitstream and comparing what should have been sent with
> > > what was sent, there appears to be some correlation between the two.
> > > It looks like the FTDI is not properly synchronised to the bitstream
> > > coming from the OMAP4430.
> > >
> > > Setting two stop bits on both ends (OMAP4430 and FTDI) appears to
> > > improve the issue, but not completely solve it.
> > 
> > Are you sure about clock error above some tolerance?
> 
> No idea at the moment.  Looking at the bitstream with a scope is the
> next step, but it's not easy to do that with just two hands.  I also
> need to find some way to trigger it reliably.
> 
> Another cause could be that the UART pin is being held high/low for
> some reason (maybe a pinmux problem.)
> 
> Another interesting observation is that if I login over the network and
> then do:
> 
> 	while :; do :; done &
> 	while :; do :; done &
> 
> to occupy both CPUs, and then do:
> 
> 	dmesg | less
> 
> on the console, the problem goes away.  If I only do one while loop,
> the problem is present, but the corruption looks like it happens at a
> different point in the serial stream.
> 
> This would seem to point the blame away from clocks or pinmux, and back
> to power management issues.
> 
> I've also tried mimicking the less output with a stand-alone program,
> and that doesn't exhibit the problem - I've tried with various initial
> delays between program start and first output, but this doesn't seem
> to have much effect.  So it seems to need rather precise timing.
> 
> stracing less does change where the corruption happens in the output,
> which also suggests a timing related cause.

Okay, I think I'm getting somewhere...  `less' does an ioctl(, TCSETS, )
after outputting a screenful in order to change c_iflag and c_lflag.
The differences are:

	c_iflag 0x1500 -> 0x1000
	c_lflag 0x083b -> 0x0831

Other settings are kept the same.

The iflag changes are IXON | ICRNL, and the lflag changes are
ECHO | ICANON.  Reproducing those changes in my test program shows
the same corruption.
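
For reference, the decoding above is just an XOR of the old and new
flag words.  A minimal sketch, assuming the flag values of the
platform's own <termios.h> (the helper names are made up for
illustration and come from no real program):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <termios.h>

/* Illustrative helper type; not taken from any real program. */
struct flag_name {
	const char *name;
	unsigned int bit;
};

/* Bits that differ between two termios flag words. */
unsigned int changed_bits(unsigned int before, unsigned int after)
{
	return before ^ after;
}

/* Print the names from the table whose bits are set in mask. */
void print_changed(const char *field, unsigned int mask,
		   const struct flag_name *names, size_t n)
{
	size_t i;

	assert(field != NULL && names != NULL);
	printf("%s changed 0x%04x:", field, mask);
	for (i = 0; i < n; i++)
		if (mask & names[i].bit)
			printf(" %s", names[i].name);
	printf("\n");
}

/* Decode the two transitions quoted above with this platform's flag values. */
void decode_less_changes(void)
{
	const struct flag_name iflags[] = {
		{ "ICRNL", ICRNL }, { "IXON", IXON }, { "IXOFF", IXOFF },
	};
	const struct flag_name lflags[] = {
		{ "ICANON", ICANON }, { "ECHO", ECHO },
	};

	print_changed("c_iflag", changed_bits(0x1500, 0x1000), iflags, 3);
	print_changed("c_lflag", changed_bits(0x083b, 0x0831), lflags, 2);
}
```

The numeric values of these flags differ between platforms, so the
names printed are only meaningful when this is built for the same
architecture the values were captured on.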

Removing the lflag changes makes no difference.  Removing the ICRNL
also makes no difference - the problem is still there.  Remove
the IXON change and the problem vanishes.
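
A reduced reproducer along these lines might look like the sketch
below.  It is not the actual test program, only an assumption of its
shape: write some output, then clear IXON with TCSANOW while that
output may still be draining.  The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <termios.h>
#include <unistd.h>

/*
 * Hypothetical reproducer: write buf to the tty on fd, then clear IXON
 * straight away with TCSANOW, without waiting for the output to drain.
 * Returns 0 on success, -1 on error (including fd not being a tty).
 */
int write_then_clear_ixon(int fd, const char *buf, size_t len)
{
	struct termios t;

	assert(buf != NULL);

	if (tcgetattr(fd, &t) < 0)
		return -1;	/* not a tty, or bad fd */

	if (write(fd, buf, len) < 0)
		return -1;

	/* The only termios change made: drop IXON. */
	t.c_iflag &= ~(tcflag_t)IXON;

	return tcsetattr(fd, TCSANOW, &t) < 0 ? -1 : 0;
}
```

Calling this on the console fd with a screenful of text is the idea.
Note that it leaves IXON cleared, so a real test would save the
original settings and restore them afterwards.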

Given that the serial driver rewrites the entire UART configuration
on a termios change that affects any hardware settings, corrupting
whatever is being transmitted at that moment is rather to be expected.

So, the question becomes whether userspace is acting correctly - and
I'd say no.  Looking at _real_ `less' (iow, not the busybox version
that I seem to have on the OMAP4430) it doesn't do this fiddling with
termios settings just before waiting for input.  Moreover, I can't see
_any_ reason for `less' of any kind to be fiddling with IXON.

There is the remaining question about the proper behaviour of setting
termios modes while there is a transmit operation in progress - I know
of several programs that do this.  A TCSETS operation is defined to
occur "immediately" by the spec, but is it reasonable to change the
modes mid-transmission of a character (which _will_ corrupt the
character), or should they be changed at a character boundary (or at
whatever character boundary the hardware is capable of)?
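
For comparison, userspace can ask for the change to be deferred by
using TCSADRAIN (the TCSETSW ioctl) instead of TCSANOW; POSIX defines
that as taking effect after all output written to the fd has been
transmitted.  A minimal sketch, with a hypothetical function name;
how precisely "transmitted" is honoured depends on the driver, and
this has not been checked on the hardware discussed here:

```c
#include <assert.h>
#include <stddef.h>
#include <termios.h>

/*
 * Apply new termios settings only after pending output has drained.
 * Returns 0 on success, -1 on error (including fd not being a tty).
 */
int apply_termios_drained(int fd, const struct termios *t)
{
	assert(t != NULL);

	/* TCSADRAIN: wait for written output to be transmitted first. */
	return tcsetattr(fd, TCSADRAIN, t) < 0 ? -1 : 0;
}
```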

I note that if DMA is enabled, 8250_omap delays a TCSETS operation
until DMA has completed, so I suspect that the problem I'm seeing
will go away if I enable DMA.
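
For the non-DMA case, "at a character boundary" could in principle
mean waiting for the transmitter to report idle before reprogramming
the UART.  The sketch below only illustrates that check on 16550-style
line status values; it is not 8250_omap code, and the macro and
function names are local to the example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 16550-style LSR bits; the names are local to this sketch. */
#define LSR_THRE 0x20u	/* transmit holding register empty */
#define LSR_TEMT 0x40u	/* transmitter (holding and shift) empty */

/* True when nothing is queued and nothing is being shifted out. */
bool tx_idle(uint8_t lsr)
{
	return (lsr & (LSR_THRE | LSR_TEMT)) == (LSR_THRE | LSR_TEMT);
}

/*
 * Given a sequence of LSR samples (standing in for repeated register
 * reads), return the index of the first sample at which it would be
 * safe to change settings, or -1 if the transmitter never went idle.
 */
int first_idle_sample(const uint8_t *samples, size_t n)
{
	size_t i;

	assert(samples != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (tx_idle(samples[i]))
			return (int)i;
	return -1;
}
```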

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up


