[PATCH] ghes_edac: enable HIP08 platform edac driver

Borislav Petkov bp at alien8.de
Wed May 16 11:29:58 PDT 2018


On Wed, May 16, 2018 at 02:38:38PM +0100, James Morse wrote:
> XGene has its own edac driver, but it doesn't probe when booted via ACPI so
> won't conflict with ghes_edac.

Actually it will. EDAC core can have only one EDAC driver loaded. Don't
ask me why - it has been that way since forever. We can change it some
day but frankly, I don't see reasoning for it. One driver can easily
manage *all* error sources on a system, I'd say.

> ... The thing has 4 dimm slots, but only two are populated. I swapped
> them round and the table was regenerated, so I don't think its faking
> it.

Ok.

> So I think we're good to make the whitelist x86 only.
> Your diff-hunk makes 'idx=-1', so we always get the 'Unfortunately' warning. I'd
> like to suppress this unless force_load has been used.

Yeah, we should handle that differently for ARM. Toshi added the idx
thing in

  5deed6b6a479 ("EDAC, ghes: Add platform check")

to dump this when the platform is not whitelisted. So let's do that:

---
diff --git a/drivers/edac/ghes_edac.c b/drivers/edac/ghes_edac.c
index 863fbf3db29f..473aeec4b1da 100644
--- a/drivers/edac/ghes_edac.c
+++ b/drivers/edac/ghes_edac.c
@@ -440,12 +440,16 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	struct mem_ctl_info *mci;
 	struct edac_mc_layer layers[1];
 	struct ghes_edac_dimm_fill dimm_fill;
-	int idx;
+	int idx = -1;
 
-	/* Check if safe to enable on this system */
-	idx = acpi_match_platform_list(plat_list);
-	if (!force_load && idx < 0)
-		return -ENODEV;
+	if (IS_ENABLED(CONFIG_X86)) {
+		/* Check if safe to enable on this system */
+		idx = acpi_match_platform_list(plat_list);
+		if (!force_load && idx < 0)
+			return -ENODEV;
+	} else {
+		idx = 0;
+	}
 
 	/*
 	 * We have only one logical memory controller to which all DIMMs belong.

---

> What is the history behind the fake thing here? It predates 32fa1f53c2d
> "ghes_edac: do a better job of filling EDAC DIMM info", was it to support a
> valid system, or just to ease merging the driver when not all systems had the
> dmi table?

I wouldn't be surprised if there were some, TBH.

Looks to me like it used to fake DIMMs, see

-       /* FIXME: FAKE DATA */
-       dimm->nr_pages = 1000;
-       dimm->grain = 128;
-       dimm->mtype = MEM_UNKNOWN;
-       dimm->dtype = DEV_UNKNOWN;
-       dimm->edac_mode = EDAC_SECDED;

which 32fa1f53c2d removes.

$ git annotate drivers/edac/ghes_edac.c 32fa1f53c2d~1

shows you the driver before the DMI scanning so it looks like initially
it was faking stuff to satisfy EDAC core when it creates sysfs entries
using struct dimm_info descriptors.

> It looks like even the oldest Arm64 ACPI systems have dmi tables, so we can
> probably require DMI or the 'force' flag.

Well, with the hunk above it would still do ghes_edac_count_dimms() on
ARM and if it fails to find something, it will set fake, which is a good
sanity-check as it screams loudly. :)

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.



More information about the linux-arm-kernel mailing list