[PATCH net-next v4 0/4] Fix OdroidC2 Gigabit Tx link issue
Russell King - ARM Linux
linux at armlinux.org.uk
Fri Jan 6 03:42:26 PST 2017
On Fri, Jan 06, 2017 at 11:11:36AM +0100, Jerome Brunet wrote:
> The purpose of this patch is to provide a way to mark as broken a
> particular eee mode. At first, it had nothing to do with "set_eee" but,
> as Florian rightly pointed out, users shouldn't be able to re-enable a
> broken mode.
I think something else has been missed - I don't see much point to
telling userspace that (eg) 1000baseT EEE is supported and then
ignore attempts to advertise it.
If it's broken, then arguably the hardware doesn't support the mode,
so we should really be masking those bits from the EEE supported mask
as well.
> I wonder if we should return an error though. With the current
> implementation, user space application could simply ask to activate
> everything, excepting the kernel to sort it out and return an error
> only if something horribly wrong happened. If we start returning an
> error for unsupported modes, we could break things. I guess we should
> just silently filter the requested modes.
The ethtool behaviour for advertisments is that errors are not returned
unless the attempted advert is really wrong.
So, for example, when setting an advertisment for link modes, we accept
the user's supplied mask, and bitwise AND it with the supported mask,
so unsupported link modes are cleared. Only if the result is an empty
mask do we then return an error to userspace.
It's similar for forcing the link parameters - phylib attempts to find
the best phy setting mode which fuzzily matches the users request, but
doesn't error out if we can't do exactly what the user requested.
In the EEE case, an empty mask is acceptable (it means "EEE is supported
in no link modes") so it isn't appropriate to return errors there.
> > - maybe the problem here is that the PCS doesn't support support
> > EEE in 1000baseT mode?
>
>
> It does, and that's kind of the problem. EEE in ON for 100Tx and 1000T
> by default with this PHY. I have several platform with the same MAC-PHY
> combination. Only the OdroidC2 shows this particular issue with 1000T-
> EEE
>
> As explained in other mails in this thread. The problem does not come
> from the MAC entering LPI. It actually comes from the link partner
> entering LPI on the Rx path under significant Tx transfer. For some
> reason, this completely mess up our PHY.
For a 1000baseT link to enter low power, both ends have to enter LPI
(see 802.3 78.1.3.3.1) - the Tx and Rx paths can't independently enter
LPI.
So, if you have a busy Tx link, the link itself can't be entering LPI.
Your link partner may be sending a request to enter LPI due to its own
Tx path being idle, which should then be forwarded to your MAC.
It's pretty hard to see what could be messed up with that - I'd have
expected the problems to occur when both ends were idle and the link
had entered low power mode.
> > On the SolidRun boards, they're using AR8035, and have suffered this
> > occasional link drop problem. What has been found is that it seems
> > to
> > be to do with the timing parameters, and it seemed to only be 1000bT
> > that was affected. I don't remember off hand exactly which or what
> > the change was they made to stabilise it though, but I can probabily
> > find out tomorrow.
> >
>
> Since the same combination of MAC-PHY works well on other designs, it
> is also my feeling that is has something to do with some timing
> parameter, maybe related to this particular PCB.
Maybe a different PHY interface? Meson seems to use RGMII, maybe
others use SGMII - but then I'd expect 100base-Tx to also be broken.
So not really sure.
I was talking to Florian about that last night, because the mis-named
phy_init_eee() tests for various phy interface modes before proceeding,
which seems to be fairly rubbish as the list of interface modes is
gradually increasing since it was introduced (and I need to add SGMII
to it.) The conclusion I've come to there is that the test should
never have been part of phylib, because if there are restrictions on
which phy interface modes are allowable for EEE, they're likely to be
either PHY or MAC specific.
The other problem that having the test there causes is that if the
existing users can't handle EEE over SGMII, then when I add SGMII to
support my hardware, they end up breaking - far from desirable.
There's no information on why the test is there, or even which PHYs
or MACs it's applicable to, which makes this unnecessarily more
difficult to now resolve.
My feeling is that the integration of EEE into phylib is fairly poor
at the moment, and we need to be a lot smarter about it.
BTW, one of the problems (not caused by your patch) is that changing
the EEE advertisment does not (on all PHY drivers) cause the link to
be renegotiated - there's no call to phy_start_aneg() when the advert
changes, and even if there was, there's no guarantee that
phy_start_aneg() will even set the AN restart bit in the control
register.
However, given that you're hooking into the set_eee function, I'm not
sure why you placed your EEE advertisment thing into config_aneg() -
isn't it more an initialisation thing (so should be in config_init()?)
--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
More information about the linux-amlogic
mailing list