[syzbot] linux-next boot error: general protection fault in add_mtd_device

Tetsuo Handa penguin-kernel at I-love.SAKURA.ne.jp
Sat Jul 23 18:33:53 PDT 2022


On 2022/07/22 11:51, Christian Marangi wrote:
> On Fri, Jul 22, 2022 at 11:34:57AM +0900, Tetsuo Handa wrote:
>> mtd_check_of_node() was added by commit ad9b10d1eaada169 ("mtd: core:
>> introduce of support for dynamic partitions").
>>
>> I guess that sometimes (depending on probe timing) mtd->parent is NULL.
>> Please check what mtd->parent == NULL means.
>>
>> +	/* Check if a partitions node exist */
>> +       parent = mtd->parent;
>> +       parent_dn = dev_of_node(&parent->dev);
>>
> 
> Currently there is thix [1].

OK. Then for now can we try that "prevent the crash" patch (and defer
examining the cause of being NULL) ?

> 
> Anyway you comment means a device may probe defer and have the parent
> still set to NULL? How can we check that?

My comment is just a guess. I have zero experience in this area.

If this problem is not timing dependent, syzbot would have already
reported this problem for thousands times. But since syzbot reported
this problem only 84 times in 23 days, I think that this problem is
timing dependent (i.e. some race condition).

> 
> Return PROBE_DEFER always when no mtd parent is found?

We could try tracing what value is assigned to mtd->parent, by
introducing a validation wrapper function (which validates that
the address is a valid kernel address) like

----------------------------------------
diff --git a/include/linux/bug.h b/include/linux/bug.h
index 348acf2558f3..6b1626776b95 100644
--- a/include/linux/bug.h
+++ b/include/linux/bug.h
@@ -91,4 +91,12 @@ static inline __must_check bool check_data_corruption(bool v) { return v; }
 		corruption;						 \
 	}))
 
+extern void check_valid_kernel_ptr(void *arg, const char *name);
+#define valid_kernel_ptr(ptr)						\
+	({								\
+		typeof(ptr) tmp = (ptr);				\
+		check_valid_kernel_ptr(tmp, __stringify(ptr));		\
+		tmp;							\
+	})
+
 #endif	/* _LINUX_BUG_H */
diff --git a/lib/bug.c b/lib/bug.c
index c223a2575b72..6555766134cf 100644
--- a/lib/bug.c
+++ b/lib/bug.c
@@ -231,3 +231,10 @@ void generic_bug_clear_once(void)
 
 	clear_once_table(__start___bug_table, __stop___bug_table);
 }
+
+void check_valid_kernel_ptr(void *arg, const char *name)
+{
+	WARN(!arg, "%s is NULL\n", name);
+	WARN(IS_ERR(arg), "%s is %pe\n", name, arg);
+}
+EXPORT_SYMBOL(check_valid_kernel_ptr);
----------------------------------------

and wrapping all locations that modifies mtd->parent like

----------------------------------------
diff --git a/drivers/mtd/mtdpart.c b/drivers/mtd/mtdpart.c
index d442fa94c872..2e81539ddfd8 100644
--- a/drivers/mtd/mtdpart.c
+++ b/drivers/mtd/mtdpart.c
@@ -81,9 +81,9 @@ static struct mtd_info *allocate_partition(struct mtd_info *parent,
 	 * distinguish between the parent and its partitions in sysfs.
 	 */
 	child->dev.parent = IS_ENABLED(CONFIG_MTD_PARTITIONED_MASTER) || mtd_is_partition(parent) ?
-			    &parent->dev : parent->dev.parent;
+		&parent->dev : valid_kernel_ptr(parent->dev.parent);
 	child->dev.of_node = part->of_node;
-	child->parent = parent;
+	child->parent = valid_kernel_ptr(parent);
 	child->part.offset = part->offset;
 	INIT_LIST_HEAD(&child->partitions);
 
----------------------------------------

. If none of all wrapped assignments triggers, we would need to check
locations that explicitly assign NULL. (That is, possibility of hitting
a race that the parent was once non-NULL but then became NULL due to
error or cleanup path.)

We can add some CONFIG_ option for controlling whether check_valid_kernel_ptr()
should become

#define valid_kernel_ptr(ptr) (ptr)

so that we can avoid overhead when this problem if fixed.

Linus, can we have macros like valid_kernel_ptr() ? Macros like
valid_kernel_ptr(), valid_kernel_ptr_or_null(), valid_kernel_ptr_or_err() can
serve for documentation purpose without runtime cost for non-debug builds.

> 
> [1] https://patchwork.ozlabs.org/project/linux-mtd/patch/20220703095631.16508-1-ansuelsmth@gmail.com/
> 
>> On 2022/06/30 18:32, syzbot wrote:
>>> Hello,
>>>
>>> syzbot found the following issue on:
>>>
>>> HEAD commit:    6cc11d2a1759 Add linux-next specific files for 20220630
>>> git tree:       linux-next
>>> console output: https://syzkaller.appspot.com/x/log.txt?x=1640f850080000
>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=54f75b620e3845dd
>>> dashboard link: https://syzkaller.appspot.com/bug?extid=fe013f55a2814a9e8cfd
>>> compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
>>>
>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>> Reported-by: syzbot+fe013f55a2814a9e8cfd at syzkaller.appspotmail.com
> 




More information about the linux-mtd mailing list