[PATCH AUTOSEL 6.15 032/110] drm/amdgpu: Add basic validation for RAS header
Sasha Levin
sashal at kernel.org
Sun Jun 1 16:23:14 PDT 2025
From: Lijo Lazar <lijo.lazar at amd.com>
[ Upstream commit 5df0d6addb7e9b6f71f7162d1253762a5be9138e ]
If RAS header read from EEPROM is corrupted, it could result in trying
to allocate huge memory for reading the records. Add some validation to
header fields.
Signed-off-by: Lijo Lazar <lijo.lazar at amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang at amd.com>
Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
Signed-off-by: Sasha Levin <sashal at kernel.org>
---
Based on my analysis, here is my assessment:

**YES**

This commit should be backported to stable kernel trees. Here's my
extensive analysis:

## Security Vulnerability Analysis

**Critical Memory Allocation Vulnerability:** The commit fixes a serious
security vulnerability where corrupted RAS (Reliability, Availability,
and Serviceability) EEPROM header data could cause huge memory
allocations. In
`/home/sasha/linux/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:3011`, the
code performs:

```c
bps = kcalloc(control->ras_num_recs, sizeof(*bps), GFP_KERNEL);
```

**Before the fix:** If `control->ras_num_recs` contains a corrupted
large value (e.g., 0xFFFFFFFF), this would attempt to allocate
`0xFFFFFFFF * 24 bytes` = ~96GB of memory, likely causing:

1. System memory exhaustion
2. Denial of service
3. Potential system crash/instability

## Code Changes Analysis

**1. Version Validation Enhancement:** The fix replaces a simple
`if/else` with a robust `switch` statement:

```c
// Before: Only checked for version >= V2_1
if (hdr->version >= RAS_TABLE_VER_V2_1) {

// After: Explicit validation of known versions
switch (hdr->version) {
case RAS_TABLE_VER_V2_1:
case RAS_TABLE_VER_V3: // Future-proofing
	// V2.1+ handling
	break;
case RAS_TABLE_VER_V1:
	// V1 handling
	break;
default:
	dev_err(adev->dev, "RAS header invalid, unsupported version: %u",
		hdr->version);
	return -EINVAL;
}
```

**2. Record Count Bounds Checking:** Critical addition of bounds
validation:

```c
if (control->ras_num_recs > control->ras_max_record_count) {
	dev_err(adev->dev,
		"RAS header invalid, records in header: %u max allowed :%u",
		control->ras_num_recs, control->ras_max_record_count);
	return -EINVAL;
}
```

This prevents the memory allocation attack by ensuring `ras_num_recs`
cannot exceed reasonable bounds.

## Stable Tree Criteria Assessment

✅ **Fixes important security bug:** Prevents DoS via memory exhaustion
✅ **Small and contained:** Only adds validation logic, no functional changes
✅ **Clear side effects:** None - only adds error checking
✅ **No architectural changes:** Pure validation enhancement
✅ **Touches critical subsystem:** Graphics driver reliability/security
✅ **Minimal regression risk:** Only adds stricter validation

## Comparison with Historical Patterns

Looking at the provided similar commits (all marked "NO"), they were
architectural refactoring changes that:

- Split functions (`ras_eeprom_init into init and check functions`)
- Added new infrastructure (`Hook EEPROM table to RAS`)
- Wrapped function calls (`RAS xfer to read/write`)
- Added entirely new features (`Add RAS EEPROM table`)

In contrast, this commit is a **security hardening fix** that adds
essential input validation to prevent memory exhaustion attacks.

## Impact Assessment

**Risk if NOT backported:**

- Systems with AMD GPUs vulnerable to DoS attacks via corrupted EEPROM
  data
- Potential for system instability when corrupted data triggers massive
  allocations
- Security exposure in enterprise/server environments using AMD GPUs

**Risk if backported:**

- Minimal: Only adds validation, existing correct data will continue to
  work
- Possible rejection of previously accepted (but actually corrupted)
  EEPROM data - this is desired behavior

This fix addresses a clear security vulnerability with minimal code
changes and should definitely be backported to stable trees to protect
users from memory exhaustion attacks via corrupted RAS EEPROM headers.
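
The two checks discussed above, and the allocation-size arithmetic, can
be sketched in user space as follows. This is only an illustration under
stated assumptions: the `sketch_*` names, the version constants, and the
record limits are hypothetical stand-ins, not the values the amdgpu
driver uses (the patch does not show them), and the 24-byte record size
is taken from the estimate above rather than from the driver headers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical version tags; the driver's RAS_TABLE_VER_* values differ. */
enum sketch_table_version {
	SKETCH_VER_V1   = 0x00010000,
	SKETCH_VER_V2_1 = 0x00021000,
	SKETCH_VER_V3   = 0x00030000,
};

/* Illustrative limits only; not the driver's RAS_MAX_RECORD_COUNT values. */
#define SKETCH_MAX_RECORDS_V1   1000u
#define SKETCH_MAX_RECORDS_V2_1  900u

/* Simplified header: just the two fields the checks look at. */
struct sketch_header {
	uint32_t version;
	uint32_t num_recs;
};

/*
 * Accept only known versions, then reject a record count above the
 * per-version limit. Returns 0 and stores the limit on success,
 * -EINVAL otherwise.
 */
static int sketch_validate_header(const struct sketch_header *hdr,
				  uint32_t *max_recs)
{
	uint32_t limit;

	assert(hdr != NULL && max_recs != NULL);

	switch (hdr->version) {
	case SKETCH_VER_V2_1:
	case SKETCH_VER_V3:
		limit = SKETCH_MAX_RECORDS_V2_1;
		break;
	case SKETCH_VER_V1:
		limit = SKETCH_MAX_RECORDS_V1;
		break;
	default:
		return -EINVAL; /* unknown or corrupted version field */
	}

	if (hdr->num_recs > limit)
		return -EINVAL; /* count cannot fit in the table */

	*max_recs = limit;
	return 0;
}

/* Bytes an unchecked record count would request, computed in 64 bits. */
static uint64_t sketch_alloc_bytes(uint32_t num_recs, uint32_t rec_size)
{
	return (uint64_t)num_recs * rec_size;
}
```

With a count of 0xFFFFFFFF and a 24-byte record, `sketch_alloc_bytes()`
gives 103079215080 bytes, which is where the ~96GB figure comes from;
the same header is rejected by `sketch_validate_header()` before any
allocation would be attempted.
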
.../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 22 ++++++++++++++++---
1 file changed, 19 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
index 0ea7cfaf3587d..e979a6086178c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
@@ -1392,17 +1392,33 @@ int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control)
__decode_table_header_from_buf(hdr, buf);
- if (hdr->version >= RAS_TABLE_VER_V2_1) {
+ switch (hdr->version) {
+ case RAS_TABLE_VER_V2_1:
+ case RAS_TABLE_VER_V3:
control->ras_num_recs = RAS_NUM_RECS_V2_1(hdr);
control->ras_record_offset = RAS_RECORD_START_V2_1;
control->ras_max_record_count = RAS_MAX_RECORD_COUNT_V2_1;
- } else {
+ break;
+ case RAS_TABLE_VER_V1:
control->ras_num_recs = RAS_NUM_RECS(hdr);
control->ras_record_offset = RAS_RECORD_START;
control->ras_max_record_count = RAS_MAX_RECORD_COUNT;
+ break;
+ default:
+ dev_err(adev->dev,
+ "RAS header invalid, unsupported version: %u",
+ hdr->version);
+ return -EINVAL;
}
- control->ras_fri = RAS_OFFSET_TO_INDEX(control, hdr->first_rec_offset);
+ if (control->ras_num_recs > control->ras_max_record_count) {
+ dev_err(adev->dev,
+ "RAS header invalid, records in header: %u max allowed :%u",
+ control->ras_num_recs, control->ras_max_record_count);
+ return -EINVAL;
+ }
+
+ control->ras_fri = RAS_OFFSET_TO_INDEX(control, hdr->first_rec_offset);
control->ras_num_mca_recs = 0;
control->ras_num_pa_recs = 0;
return 0;
--
2.39.5