[Regression] FEC: Panic on suspend in 3.16
Martin Fuzzey
mfuzzey at parkeon.com
Wed Aug 6 07:52:55 PDT 2014
Hi all,
I am using the fec Ethernet driver on an i.MX53 SoC.
Everything worked fine on 3.13 but, after upgrading to 3.16, I now get a
panic on suspend:
# echo mem > /sys/power/state
[ 24.429549] PM: Syncing filesystems ... done.
[ 24.439586] (NULL device *): Direct firmware load failed with error -2
[ 24.446157] (NULL device *): Falling back to user helper
[ 24.451574] (NULL device *): Direct firmware load failed with error -2
[ 24.458121] (NULL device *): Falling back to user helper
[ 24.517968] Freezing user space processes ... (elapsed 0.001 seconds) done.
[ 24.526274] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 24.579436] @MF@ fec_suspend: dev=dc421048 (63fec000.etherne) ndev= (null)
[ 24.586429] Unable to handle kernel NULL pointer dereference at virtual address 0000002c
[ 24.594553] pgd = dd324000
[ 24.597261] [0000002c] *pgd=00000000
[ 24.600867] Internal error: Oops: 5 [#1] ARM
[ 24.605137] Modules linked in:
[ 24.608208] CPU: 0 PID: 91 Comm: sh Not tainted 3.16.0-pknbsp-svn1549-atag-v3.16-101-gabdac82-dirty #426
[ 24.617691] task: dd1e2400 ti: dd22c000 task.ti: dd22c000
[ 24.623106] PC is at fec_suspend+0x2c/0x90
[ 24.627203] LR is at fec_suspend+0x2c/0x90
[ 24.631301] pc : [<c0370564>] lr : [<c0370564>] psr: 60070013
[ 24.631301] sp : dd22de18 ip : 00000001 fp : c08521bc
[ 24.642778] r10: c0840e1c r9 : 00000000 r8 : c0370538
[ 24.648002] r7 : dc421048 r6 : c08d33cc r5 : 00000000 r4 : 00000000
[ 24.654529] r3 : 00000000 r2 : c0840f84 r1 : 00000000 r0 : 0000003f
[ 24.661058] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[ 24.668193] Control: 10c5387d Table: 8d324019 DAC: 00000015
[ 24.673938] Process sh (pid: 91, stack limit = 0xdd22c238)
[ 24.679423] Stack: (0xdd22de18 to 0xdd22e000)
[ 24.683781] de00: 00000000 00000000
[ 24.691961] de20: 00000000 c02da7e0 00000000 c08d33cc 00000000 00000000 dc421048 00000000
[ 24.700141] de40: c08d33cc 00000002 dc42107c c02db6ec c08521bc 008521bc dc4210ac dc421048
[ 24.708320] de60: c08521bc c08d33cc c08521f4 c02dc964 c08521bc 00000002 b263440a 00000005
[ 24.716500] de80: b263440a 00000005 00000003 c0883430 00000003 c0872e48 dd846400 c0840f44
[ 24.724680] dea0: 00000004 c0705838 00000000 c0047a34 00000000 c05c3bdc c076781c dd22ded4
[ 24.732859] dec0: 00000003 dd22ded4 dd846400 00000000 00000003 c0872e48 dd846400 c0840f44
[ 24.741038] dee0: c0705838 c0047f84 c0840f3c 00000003 00000003 c0046c0c dc04c2d0 dd846400
[ 24.749218] df00: 00000004 dd22df80 dd846400 dd84648c dd846480 c0241e04 00000004 c0101848
[ 24.757398] df20: c0101804 00000000 00000000 c0100f10 00000000 00000000 dd7570c0 00000004
[ 24.765579] df40: b741ea8c dd22df80 00000004 dd22c000 b741ea8c c00acf70 00000000 c00c4310
[ 24.773759] df60: 00060003 00000000 00000000 dd7570c0 dd7570c0 00000004 b741ea8c c00ad380
[ 24.781938] df80: 00000000 00000000 dd7570c0 b741ea8c 00000004 00000001 00000004 c000f044
[ 24.790117] dfa0: 00000000 c000eec0 b741ea8c 00000004 00000001 b741ea8c 00000004 ffffffff
[ 24.798298] dfc0: b741ea8c 00000004 00000001 00000004 be917930 b741d550 00000000 00000000
[ 24.806479] dfe0: b6ff0f40 be917908 b6fd4ed3 b6f3f408 60030010 00000001 00280004 e4518008
[ 24.814677] [<c0370564>] (fec_suspend) from [<c02da7e0>] (dpm_run_callback+0x44/0x7c)
[ 24.822514] [<c02da7e0>] (dpm_run_callback) from [<c02db6ec>] (__device_suspend+0x128/0x388)
[ 24.830958] [<c02db6ec>] (__device_suspend) from [<c02dc964>] (dpm_suspend+0x58/0x214)
[ 24.838889] [<c02dc964>] (dpm_suspend) from [<c0047a34>] (suspend_devices_and_enter+0x98/0x3b8)
[ 24.847596] [<c0047a34>] (suspend_devices_and_enter) from [<c0047f84>] (pm_suspend+0x230/0x2dc)
[ 24.856299] [<c0047f84>] (pm_suspend) from [<c0046c0c>] (state_store+0x70/0xd4)
[ 24.863623] [<c0046c0c>] (state_store) from [<c0241e04>] (kobj_attr_store+0x14/0x20)
[ 24.871379] [<c0241e04>] (kobj_attr_store) from [<c0101848>] (sysfs_kf_write+0x44/0x48)
[ 24.879392] [<c0101848>] (sysfs_kf_write) from [<c0100f10>] (kernfs_fop_write+0xd0/0x180)
[ 24.887576] [<c0100f10>] (kernfs_fop_write) from [<c00acf70>] (vfs_write+0xa4/0x1ac)
[ 24.895324] [<c00acf70>] (vfs_write) from [<c00ad380>] (SyS_write+0x40/0x8c)
[ 24.902385] [<c00ad380>] (SyS_write) from [<c000eec0>] (ret_fast_syscall+0x0/0x30)
[ 24.909959] Code: 05903008 e58d4000 e59f0060 eb094d90 (e594302c)
[ 24.916110] ---[ end trace cce4e72dfc39eadf ]---
[ 24.920755] Kernel panic - not syncing: Fatal exception
[ 24.925983] drm_kms_helper: panic occurred, switching back to text console
[ 24.932892] ---[ end Kernel panic - not syncing: Fatal exception
The problem is that in fec_suspend():

	static int
	fec_suspend(struct device *dev)
	{
		struct net_device *ndev = dev_get_drvdata(dev);
		struct fec_enet_private *fep = netdev_priv(ndev);

ndev is NULL.
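
As a cross-check, the word in parentheses on the "Code:" line of the oops
(e594302c) is the instruction that faulted. The following is a minimal
user-space sketch, not kernel code, that decodes an A32 load/store with an
immediate offset; the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of an A32 LDR/STR{B} instruction with a 12-bit immediate offset. */
struct ldr_imm {
	bool is_load;     /* L bit (20): 1 = load, 0 = store */
	bool is_byte;     /* B bit (22): 1 = byte access, 0 = word access */
	bool add_offset;  /* U bit (23): 1 = base + offset, 0 = base - offset */
	unsigned rn;      /* base register number */
	unsigned rd;      /* transferred register number */
	uint32_t imm12;   /* unsigned immediate offset */
};

/*
 * Decode 'insn' if bits 27:25 are 0b010 (load/store, immediate offset) and
 * the condition field is not 0xf (that space holds other instructions).
 * Returns false for anything else and leaves *out untouched.
 */
static bool decode_ldr_imm(uint32_t insn, struct ldr_imm *out)
{
	assert(out != NULL);

	if ((insn >> 28) == 0xf || ((insn >> 25) & 0x7) != 0x2)
		return false;

	out->is_load = (insn >> 20) & 1;
	out->is_byte = (insn >> 22) & 1;
	out->add_offset = (insn >> 23) & 1;
	out->rn = (insn >> 16) & 0xf;
	out->rd = (insn >> 12) & 0xf;
	out->imm12 = insn & 0xfff;
	return true;
}
```

Calling decode_ldr_imm(0xe594302c, &d) gives a word load into r3 from
r4 + 0x2c. The register dump shows r4 : 00000000, which matches the faulting
virtual address 0000002c and is consistent with a field being read through a
NULL ndev.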
Adding the following hack patch:
diff --git a/drivers/net/ethernet/freescale/fec_main.c
b/drivers/net/ethernet/freescale/fec_main.c
index 77037fd..64edfa8 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -2486,6 +2486,9 @@ fec_probe(struct platform_device *pdev)
const struct of_device_id *of_id;
static int dev_id;
+ printk(KERN_INFO "@MF@ %s probing dev=%p (%s)\n", __func__,
+ &pdev->dev, dev_name(&pdev->dev));
+
of_id = of_match_device(fec_dt_ids, &pdev->dev);
if (of_id)
pdev->id_entry = of_id->data;
@@ -2616,6 +2619,9 @@ fec_probe(struct platform_device *pdev)
netdev_info(ndev, "registered PHC device %d\n",
fep->dev_id);
INIT_DELAYED_WORK(&(fep->delay_work.delay_work), fec_enet_work);
+
+ printk(KERN_INFO "@MF@ %s probed dev=%p (%s)\n", __func__,
+ &pdev->dev, dev_name(&pdev->dev));
return 0;
failed_register:
@@ -2661,6 +2667,12 @@ fec_suspend(struct device *dev)
struct net_device *ndev = dev_get_drvdata(dev);
struct fec_enet_private *fep = netdev_priv(ndev);
+ printk(KERN_INFO "@MF@ %s: dev=%p (%s) ndev=%p\n", __func__, dev, dev_name(dev), ndev);
+ if (!ndev) {
+ printk(KERN_INFO "@MF@ ignoring null\n");
+ return 0;
+ }
+
if (netif_running(ndev)) {
fec_stop(ndev);
netif_device_detach(ndev);
@@ -2681,6 +2693,12 @@ fec_resume(struct device *dev)
struct fec_enet_private *fep = netdev_priv(ndev);
int ret;
+ printk(KERN_INFO "@MF@ %s: dev=%p (%s) ndev=%p\n", __func__, dev, dev_name(dev), ndev);
+ if (!ndev) {
+ printk(KERN_INFO "@MF@ ignoring null\n");
+ return 0;
+ }
+
if (fep->reg_phy) {
ret = regulator_enable(fep->reg_phy);
if (ret)
Gives these logs:
# dmesg | grep MF
<6>[ 1.025267] @MF@ fec_probe probing dev=dc0cfa10 (63fec000.ethernet)
<6>[ 1.057471] @MF@ fec_probe probed dev=dc0cfa10 (63fec000.ethernet)
# echo mem > /sys/power/state
[ 33.329502] PM: Syncing filesystems ... done.
[ 33.357146] (NULL device *): Direct firmware load failed with error -2
[ 33.363769] (NULL device *): Falling back to user helper
[ 33.369260] (NULL device *): Direct firmware load failed with error -2
[ 33.375792] (NULL device *): Falling back to user helper
[ 33.461774] Freezing user space processes ... (elapsed 0.001 seconds) done.
[ 33.470200] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 33.519954] @MF@ fec_suspend: dev=dc421048 (63fec000.etherne) ndev= (null)
[ 33.526931] @MF@ ignoring null <==== Would have panicked here
[ 33.533289] @MF@ fec_suspend: dev=dc0cfa10 (63fec000.ethernet) ndev=dc02c000
[ 33.542266] PM: suspend of devices complete after 61.683 msecs
[ 33.548114] PM: suspend devices took 0.070 seconds
[ 33.554874] PM: late suspend of devices complete after 1.918 msecs
[ 33.563101] PM: noirq suspend of devices complete after 1.971 msecs
So the driver is only probed once (for 63fec000.ethernet), but the suspend
callback is called twice: once for "63fec000.etherne" and once for
"63fec000.ethernet". The first of these has no driver data, hence the panic.
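
Note that "63fec000.etherne" is exactly "63fec000.ethernet" cut to 16
characters. That is what formatting the full name into a 17-byte buffer would
produce, although where such a buffer might be is not established here. A
small user-space sketch of the effect (the helper name and the buffer size
are assumptions, not taken from the driver):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Copy 'name' into 'buf' of 'size' bytes the way snprintf() does: at most
 * size - 1 characters are kept and the result is always NUL-terminated.
 * Returns the number of characters the full name would have needed, so a
 * return value >= size means the name was truncated.
 */
static int format_name(char *buf, size_t size, const char *name)
{
	assert(buf != NULL && name != NULL && size > 0);
	return snprintf(buf, size, "%s", name);
}
```

With a hypothetical char buf[17], format_name(buf, sizeof(buf),
"63fec000.ethernet") returns 17 and leaves "63fec000.etherne" in buf, the
name seen in the first fec_suspend call.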
The same patch on 3.13 gives:
<6>[ 0.998656] @MF@ fec_probe probing dev=dc10b410 (63fec000.ethernet)
<6>[ 1.032098] @MF@ fec_probe probed dev=dc10b410 (63fec000.ethernet)
# echo mem > /sys/power/state
[ 52.854713] PM: Syncing filesystems ... done.
[ 52.897069] (NULL device *): Direct firmware load failed with error -2
[ 52.903895] (NULL device *): Falling back to user helper
[ 52.910300] (NULL device *): Direct firmware load failed with error -2
[ 52.916914] (NULL device *): Falling back to user helper
[ 53.065496] Freezing user space processes ... (elapsed 0.024 seconds) done.
[ 53.097279] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 53.157807] @MF@ fec_suspend: dev=dc10b410 (63fec000.ethernet) ndev=dc1e7800
[ 53.173097] PM: suspend of devices complete after 65.114 msecs
[ 53.178964] PM: suspend devices took 0.070 seconds
[ 53.185287] PM: late suspend of devices complete after 1.513 msecs
[ 53.192888] PM: noirq suspend of devices complete after 1.404 msecs
No second device here.
Anyone else seeing this?
Any ideas?
Unfortunately, I have local patches that will make bisecting difficult.
Regards,
Martin