[PATCH AUTOSEL 6.15 032/110] drm/amdgpu: Add basic validation for RAS header

Sasha Levin sashal at kernel.org
Sun Jun 1 16:23:14 PDT 2025


From: Lijo Lazar <lijo.lazar at amd.com>

[ Upstream commit 5df0d6addb7e9b6f71f7162d1253762a5be9138e ]

If RAS header read from EEPROM is corrupted, it could result in trying
to allocate huge memory for reading the records. Add some validation to
header fields.

Signed-off-by: Lijo Lazar <lijo.lazar at amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang at amd.com>
Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
Signed-off-by: Sasha Levin <sashal at kernel.org>
---

Based on my analysis, here is my assessment: **YES** This commit should
be backported to stable kernel trees. Here's my extensive analysis: ##
Security Vulnerability Analysis **Critical Memory Allocation
Vulnerability:** The commit fixes a serious security vulnerability where
corrupted RAS (Reliability, Availability, and Serviceability) EEPROM
header data could cause huge memory allocations. In
`/home/sasha/linux/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:3011`, the
code performs: ```c bps = kcalloc(control->ras_num_recs, sizeof(*bps),
GFP_KERNEL); ``` **Before the fix:** If `control->ras_num_recs` contains
a corrupted large value (e.g., 0xFFFFFFFF), this would attempt to
allocate `0xFFFFFFFF 0001-Fix-Clippy-warnings.patch 0002-Enhance-
inference-prompt-to-utilize-CVEKERNELDIR-whe.patch 0003-Update-to-
latest-version-of-clap.patch Cargo.lock Cargo.toml LICENSE README.md
analyze_merge_commit.sh dpp_rcg_backport_analysis.md
drm_bridge_analysis.txt drm_imagination_register_update_analysis.md
drm_mediatek_mtk_dpi_refactoring_analysis.md io_uring_analysis.txt
ksmbd_analysis.txt merge_commit_analysis.txt model prompt src target
test_gpio_cleanup.txt test_patch.txt verisilicon_av1_4k_analysis.md 24
bytes` = ~96GB of memory, likely causing: 1. System memory exhaustion 2.
Denial of service 3. Potential system crash/instability ## Code Changes
Analysis **1. Version Validation Enhancement:** The fix replaces a
simple `if/else` with a robust `switch` statement: ```c // Before: Only
checked for version >= V2_1 if (hdr->version >= RAS_TABLE_VER_V2_1) { //
After: Explicit validation of known versions switch (hdr->version) {
case RAS_TABLE_VER_V2_1: case RAS_TABLE_VER_V3: // Future-proofing //
V2.1+ handling break; case RAS_TABLE_VER_V1: // V1 handling break;
default: dev_err(adev->dev, "RAS header invalid, unsupported version:
%u", hdr->version); return -EINVAL; } ``` **2. Record Count Bounds
Checking:** Critical addition of bounds validation: ```c if
(control->ras_num_recs > control->ras_max_record_count) {
dev_err(adev->dev, "RAS header invalid, records in header: %u max
allowed :%u", control->ras_num_recs, control->ras_max_record_count);
return -EINVAL; } ``` This prevents the memory allocation attack by
ensuring `ras_num_recs` cannot exceed reasonable bounds. ## Stable Tree
Criteria Assessment ✅ **Fixes important security bug:** Prevents DoS via
memory exhaustion ✅ **Small and contained:** Only adds validation logic,
no functional changes ✅ **Clear side effects:** None - only adds error
checking ✅ **No architectural changes:** Pure validation enhancement ✅
**Touches critical subsystem:** Graphics driver reliability/security ✅
**Minimal regression risk:** Only adds stricter validation ## Comparison
with Historical Patterns Looking at the provided similar commits (all
marked "NO"), they were architectural refactoring changes that: - Split
functions (`ras_eeprom_init into init and check functions`) - Added new
infrastructure (`Hook EEPROM table to RAS`) - Wrapped function calls
(`RAS xfer to read/write`) - Added entirely new features (`Add RAS
EEPROM table`) In contrast, this commit is a **security hardening fix**
that adds essential input validation to prevent memory exhaustion
attacks. ## Impact Assessment **Risk if NOT backported:** - Systems with
AMD GPUs vulnerable to DoS attacks via corrupted EEPROM data - Potential
for system instability when corrupted data triggers massive allocations
- Security exposure in enterprise/server environments using AMD GPUs
**Risk if backported:** - Minimal: Only adds validation, existing
correct data will continue to work - Possible rejection of previously
accepted (but actually corrupted) EEPROM data - this is desired behavior
This fix addresses a clear security vulnerability with minimal code
changes and should definitely be backported to stable trees to protect
users from memory exhaustion attacks via corrupted RAS EEPROM headers.

 .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c    | 22 ++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
index 0ea7cfaf3587d..e979a6086178c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
@@ -1392,17 +1392,33 @@ int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control)
 
 	__decode_table_header_from_buf(hdr, buf);
 
-	if (hdr->version >= RAS_TABLE_VER_V2_1) {
+	switch (hdr->version) {
+	case RAS_TABLE_VER_V2_1:
+	case RAS_TABLE_VER_V3:
 		control->ras_num_recs = RAS_NUM_RECS_V2_1(hdr);
 		control->ras_record_offset = RAS_RECORD_START_V2_1;
 		control->ras_max_record_count = RAS_MAX_RECORD_COUNT_V2_1;
-	} else {
+		break;
+	case RAS_TABLE_VER_V1:
 		control->ras_num_recs = RAS_NUM_RECS(hdr);
 		control->ras_record_offset = RAS_RECORD_START;
 		control->ras_max_record_count = RAS_MAX_RECORD_COUNT;
+		break;
+	default:
+		dev_err(adev->dev,
+			"RAS header invalid, unsupported version: %u",
+			hdr->version);
+		return -EINVAL;
 	}
-	control->ras_fri = RAS_OFFSET_TO_INDEX(control, hdr->first_rec_offset);
 
+	if (control->ras_num_recs > control->ras_max_record_count) {
+		dev_err(adev->dev,
+			"RAS header invalid, records in header: %u max allowed :%u",
+			control->ras_num_recs, control->ras_max_record_count);
+		return -EINVAL;
+	}
+
+	control->ras_fri = RAS_OFFSET_TO_INDEX(control, hdr->first_rec_offset);
 	control->ras_num_mca_recs = 0;
 	control->ras_num_pa_recs = 0;
 	return 0;
-- 
2.39.5




More information about the Linux-mediatek mailing list