Armada XP (MV78460): BUG in netdevice.h with maxcpus=2

Jisheng Zhang jszhang at marvell.com
Fri Jan 8 05:21:09 PST 2016


On Fri, 8 Jan 2016 21:03:09 +0800
Jisheng Zhang <jszhang at marvell.com> wrote:

> On Fri, 8 Jan 2016 20:45:23 +0800
> Jisheng Zhang <jszhang at marvell.com> wrote:
> 
> > Dear Russell,
> > 
> > On Fri, 8 Jan 2016 10:57:21 +0000 Russell King - ARM Linux wrote:
> >   
> > > On Fri, Jan 08, 2016 at 06:25:37PM +0800, Jisheng Zhang wrote:    
> > > > diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> > > > index ed622fa..e1242f4 100644
> > > > --- a/drivers/net/ethernet/marvell/mvneta.c
> > > > +++ b/drivers/net/ethernet/marvell/mvneta.c
> > > > @@ -2446,7 +2446,7 @@ static void mvneta_start_dev(struct mvneta_port *pp)
> > > >  	mvneta_port_enable(pp);
> > > >  
> > > >  	/* Enable polling on the port */
> > > > -	for_each_present_cpu(cpu) {
> > > > +	for_each_online_cpu(cpu) {
> > > >  		struct mvneta_pcpu_port *port = per_cpu_ptr(pp->ports, cpu);
> > > >  
> > > >  		napi_enable(&port->napi);
> > > > @@ -2472,7 +2472,7 @@ static void mvneta_stop_dev(struct mvneta_port *pp)
> > > >  
> > > >  	phy_stop(pp->phy_dev);
> > > >  
> > > > -	for_each_present_cpu(cpu) {
> > > > +	for_each_online_cpu(cpu) {
> > > >  		struct mvneta_pcpu_port *port = per_cpu_ptr(pp->ports, cpu);
> > > >  
> > > >  		napi_disable(&port->napi);
> > > > @@ -2907,7 +2907,7 @@ static int mvneta_stop(struct net_device *dev)
> > > >  	mvneta_stop_dev(pp);
> > > >  	mvneta_mdio_remove(pp);
> > > >  	unregister_cpu_notifier(&pp->cpu_notifier);
> > > > -	for_each_present_cpu(cpu)
> > > > +	for_each_online_cpu(cpu)
> > > >  		smp_call_function_single(cpu, mvneta_percpu_disable, pp, true);
> > > >  	free_percpu_irq(dev->irq, pp->ports);
> > > >  	mvneta_cleanup_rxqs(pp);      
> > > 
> > > I'm not convinced that this isn't racy - what happens if a CPU is
> > > brought online between unregister_cpu_notifier(&pp->cpu_notifier)
> > > and the following loop?  We'll end up calling mvneta_percpu_disable()
> > > for the new CPU - is that harmless?
> > > 
> > > Similarly, what if the online CPUs change between mvneta_stop_dev()
> > > and mvneta_stop(), and also what about hotplug CPU changes during
> > > the startup path?
> > > 
> > > Secondly, is there a reason for:
> > > 
> > > 	for_each_online_cpu(cpu)
> > > 		smp_call_function_single(cpu, ...)
> > > 
> > > as opposed to:
> > > 
> > > 	smp_call_function(mvneta_percpu_disable, pp, true);
> > >     
> > 
> > Yep, thanks for pointing out the race. Before sending out the patch, I prepared
> > another *wrong*(IMHO) patch to patch smp_prepare_cpus() in arch/arm/kernel/smp.c.  
> 
> Here is the patch. I think it could also fix the issue.

oops, the version can't be compiled, sorry for noise. Here is the updated one:

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index b263613..26c164b 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -432,6 +432,7 @@ void __init smp_prepare_boot_cpu(void)
 
 void __init smp_prepare_cpus(unsigned int max_cpus)
 {
+	unsigned int cpu, max;
 	unsigned int ncores = num_possible_cpus();
 
 	init_cpu_topology();
@@ -443,15 +444,29 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 	 */
 	if (max_cpus > ncores)
 		max_cpus = ncores;
-	if (ncores > 1 && max_cpus) {
-		/*
-		 * Initialise the present map, which describes the set of CPUs
-		 * actually populated at the present time. A platform should
-		 * re-initialize the map in the platforms smp_prepare_cpus()
-		 * if present != possible (e.g. physical hotplug).
-		 */
-		init_cpu_present(cpu_possible_mask);
 
+	/* Don't bother if we're effectively UP */
+	if (max_cpus <= 1)
+		return;
+
+	/*
+	 * Initialise the present map (which describes the set of CPUs
+	 * actually populated at the present time).
+	 */
+	max = max_cpus;
+	max--;
+	for_each_possible_cpu(cpu) {
+		if (max == 0)
+			break;
+
+		if (cpu == smp_processor_id())
+			continue;
+
+		set_cpu_present(cpu, true);
+		max--;
+	}
+
+	if (ncores > 1 && max_cpus) {
 		/*
 		 * Initialise the SCU if there are more than one CPU
 		 * and let them know where to start.


> 
> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index b263613..f94c755 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -443,15 +443,31 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
>  	 */
>  	if (max_cpus > ncores)
>  		max_cpus = ncores;
> -	if (ncores > 1 && max_cpus) {
> -		/*
> -		 * Initialise the present map, which describes the set of CPUs
> -		 * actually populated at the present time. A platform should
> -		 * re-initialize the map in the platforms smp_prepare_cpus()
> -		 * if present != possible (e.g. physical hotplug).
> -		 */
> -		init_cpu_present(cpu_possible_mask);
>  
> +	/* Don't bother if we're effectively UP */
> +	if (max_cpus <= 1)
> +		return;
> +
> +	/*
> +	 * Initialise the present map (which describes the set of CPUs
> +	 * actually populated at the present time) and release the
> +	 * secondaries from the bootloader.
> +	 *
> +	 * Make sure we online at most (max_cpus - 1) additional CPUs.
> +	 */
> +	max_cpus--;
> +	for_each_possible_cpu(cpu) {
> +		if (max_cpus == 0)
> +			break;
> +
> +		if (cpu == smp_processor_id())
> +			continue;
> +
> +		set_cpu_present(cpu, true);
> +		max_cpus--;
> +	}
> +
> +	if (ncores > 1 && max_cpus) {
>  		/*
>  		 * Initialise the SCU if there are more than one CPU
>  		 * and let them know where to start.
> 
> 
> > 
> > Here is the background:
> > I can reproduce this issue on arm but failed to reproduce it on arm64. The key
> > is what's present cpu.
> > 
> > let's assume a quad core system, boot with maxcpus=2, after booting.
> > 
> > on arm64, present cpus is cpu0, cpu1
> > 
> > on arm, present cpus is cpu0, cpu1, cpu2 and cpu3.
> > 
> > the arm core code requires every platform to update the present map in
> > platforms' smp_prepare_cpus(), but only two or three platforms do so.
> > 
> > Then get back to mvneta issue, during startup, mvneta_start_dev() calls
> > napi_enable() for each present cpu's port. If userspace ask for online
> > cpu2, mvneta_percpu_notifier() will call napi_enable() for cpu2 again,
> > so BUG_ON() is triggered.
> > 
> > I have two solutions:
> > 
> > 1. as the above patch did, then prevent the race as pointed out by
> > get_online_cpus().
> > 
> > 2. make arm platforms smp_prepare_cpus to update the present map or even
> > patch arm core smp_prepare_cpus().
> > 
> > What's the better solution? Could you please guide me?
> > 
> > Thanks in advance,
> > Jisheng
> > 
> > _______________________________________________
> > linux-arm-kernel mailing list
> > linux-arm-kernel at lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel  
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel




More information about the linux-arm-kernel mailing list