[PATCH v2 2/2] restrict /dev/mem to idle io memory ranges
dan.j.williams at intel.com
Tue Nov 24 16:34:19 PST 2015
On Tue, Nov 24, 2015 at 2:25 PM, Andrew Morton
<akpm at linux-foundation.org> wrote:
> On Mon, 23 Nov 2015 16:06:04 -0800 Dan Williams <dan.j.williams at intel.com> wrote:
>> This effectively promotes IORESOURCE_BUSY to IORESOURCE_EXCLUSIVE
>> semantics by default. If userspace really believes it is safe to access
>> the memory region it can also perform the extra step of disabling an
>> active driver. This protects device address ranges with read side
>> effects and otherwise directs userspace to use the driver.
> I don't think I'm sufficiently understanding what this is all needed
> for, sorry. A better changelog would help: what's wrong with the
> current code, how you propose it be changed, how the kernel's
> externally-visible behaviour is altered, etc.
I should have duplicated the Kconfig description for IO_STRICT_DEVMEM
in the changelog, but the justification is simply that if the kernel
has a driver busily using a memory range, userspace needs to assert it
knows it is safe to access that range by disabling the driver. This
makes the kernel safer by default.
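To make the proposed semantics concrete, here is a small userspace model of the decision. It is not kernel code. The flag names MODEL_BUSY and MODEL_EXCLUSIVE, struct model_region and devmem_blocked() are all hypothetical. They mirror the idea that, with strict mode on, any claimed (BUSY) range is refused. Without strict mode, only ranges that a driver explicitly marked EXCLUSIVE are refused.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits for this model; NOT the kernel's IORESOURCE_* values. */
#define MODEL_BUSY      (1u << 0)  /* range claimed by a driver, e.g. via request_mem_region() */
#define MODEL_EXCLUSIVE (1u << 1)  /* driver additionally asked that userspace never map it */

struct model_region {
	uint64_t start;   /* first byte of the range */
	uint64_t end;     /* last byte of the range, inclusive */
	unsigned flags;
};

/*
 * Return true if /dev/mem access to the byte at @addr should be refused.
 * Only BUSY (claimed) ranges are ever refused. Without strict mode a BUSY
 * range must also be EXCLUSIVE; with strict mode, BUSY alone is enough.
 */
bool devmem_blocked(const struct model_region *regions, size_t n,
		    uint64_t addr, bool strict)
{
	for (size_t i = 0; i < n; i++) {
		const struct model_region *r = &regions[i];

		if (addr < r->start || addr > r->end)
			continue;             /* address not in this range */
		if (!(r->flags & MODEL_BUSY))
			continue;             /* unclaimed ranges stay accessible */
		if (strict || (r->flags & MODEL_EXCLUSIVE))
			return true;
	}
	return false;
}
```

In this model, flipping `strict` is the whole change: a range an active driver merely claimed becomes off-limits, and unbinding the driver (clearing BUSY) makes it accessible again.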
> Please pay particular attention to the back-compatibility issues which
> will be encountered when people enable these options.
It certainly diminishes debug capabilities: mmap of sysfs PCI
resources will also fail while a driver is active. The only
general-purpose application I know of that uses /dev/mem is dosemu. It
should continue to work fine, as x86 "devmem_is_allowed()" permits
access to the first 1MB by default. The other stated users of /dev/mem
are legacy X drivers. With the prevalence of kernel modesetting in
graphics drivers, I don't know how much of a concern this is anymore.
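This rough model shows why dosemu keeps working: the low-megabyte allowance is checked before any busy-range test. It is a sketch of the check order in x86 devmem_is_allowed() as we understand it, and it assumes 4 KiB pages. model_devmem_is_allowed() is hypothetical. Its two boolean arguments stand in for the kernel's own busy-range and RAM lookups.

```c
#include <stdbool.h>

#define MODEL_PAGE_SHIFT 12                                /* assumes 4 KiB pages, as on x86 */
#define MODEL_LOW_PFNS   (1ul << (20 - MODEL_PAGE_SHIFT))  /* 256 pages = first 1 MiB */

/*
 * Hypothetical model of the check order. The first megabyte (BIOS, VGA and
 * similar legacy areas) is allowed before the busy-range test runs, so
 * dosemu-style users are unaffected by the stricter busy-range rule.
 */
bool model_devmem_is_allowed(unsigned long pfn, bool range_blocked,
			     bool pfn_is_ram)
{
	if (pfn < MODEL_LOW_PFNS)
		return true;      /* 0 .. 1 MiB: always permitted */
	if (range_blocked)
		return false;     /* claimed by an active driver */
	return !pfn_is_ram;       /* other I/O pages permitted, ordinary RAM refused */
}
```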
> Perhaps when all that material is described, I'll understand why the
> heck we're doing this with a build-time switch rather than a runtime
> one.
We have the "iomem=" kernel parameter. I think it makes sense for that
setting to be configurable at runtime to augment this build-time switch.
>> Persistent memory presents a large "mistake surface" to /dev/mem as now
>> accidental writes can corrupt a filesystem.
> Is that the motivation? root can come in and accidentally alter
> persistent memory contents? If so,
> - why do we care? There are all sorts of ways in which root can muck
> up the persistent memory, starting with dd(1). What's special about
> /dev/mem?
If you dd through /dev/pmem, the driver does all the proper flushing
and syncing to make the writes durable on media; /dev/mem knows none
of those semantics. /dev/pmem, as a block device, also honors O_EXCL,
so an exclusive open fails with EBUSY while the device is in use
(e.g. mounted).
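Here is a minimal userspace sketch of the O_EXCL behaviour on Linux. open_exclusive() is a hypothetical helper, and /dev/pmem0 is only an example device name. According to open(2), O_EXCL without O_CREAT on a block device makes open() fail with EBUSY if the device is in use, for instance mounted. /dev/mem has no equivalent protection.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to open a block device exclusively.
 * Returns a file descriptor on success, or -errno on failure
 * (-EBUSY if the device is already in use, e.g. mounted).
 */
int open_exclusive(const char *path)
{
	int fd = open(path, O_RDWR | O_EXCL);

	return fd >= 0 ? fd : -errno;
}
```

For example, if open_exclusive("/dev/pmem0") returns -EBUSY, something such as a mounted filesystem already holds the device. The caller should close() any descriptor it does get back.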
> - why is the patch mucking with access to PCI and BIOS space? Is the
> persistent memory even mappable in those regions? Or is the concern
> that userspace can access control registers associated with the
> persistent memory? What is the problem scenario?
It seems to me that allowing /dev/mem arbitrary access to any region
of memory is a dangerous capability for a production environment.
Drivers assume that request_mem_region() tells other parts of the
kernel not to touch their memory. Having the option to extend that
protection to /dev/mem by default seemed a reasonable idea.
Of course, all of this assumes that you think it is worthwhile to have
some protections and safety measures even for root.
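Ranges claimed with request_mem_region() appear in /proc/iomem under the name the driver passed. Reading that file gives a rough view of what a strict /dev/mem would protect. The sketch below parses one such line. struct iomem_entry and parse_iomem_line() are hypothetical, and the sample line is only illustrative. /proc/iomem lists ranges and owners, not the BUSY/EXCLUSIVE flags themselves. On many recent kernels, non-root readers also see the addresses as zeros.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct iomem_entry {
	unsigned long long start, end;  /* inclusive byte range */
	char name[64];                  /* owner name given when the range was claimed */
};

/*
 * Parse one line such as "  fed00000-fed003ff : PNP0103".
 * Leading spaces (which encode nesting depth) are skipped.
 * Returns false if the line does not look like "start-end : name".
 */
bool parse_iomem_line(const char *line, struct iomem_entry *e)
{
	int consumed = 0;

	if (sscanf(line, " %llx-%llx : %n", &e->start, &e->end, &consumed) != 2 ||
	    consumed == 0)
		return false;
	snprintf(e->name, sizeof(e->name), "%s", line + consumed);
	e->name[strcspn(e->name, "\n")] = '\0';  /* drop trailing newline */
	return true;
}
```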
> IOW, a very good description of the problem-being-solved would help out
> a lot here...
I'll fold the eventual result of this discussion into the changelog if
I can convince you it's worth moving forward.
I also have the option of just tagging the pmem regions as
IORESOURCE_EXCLUSIVE, but I decided against that because I think our
current definition of STRICT_DEVMEM leaves a big hole if the goal is
"/dev/mem access is safe by default".
More information about the linux-arm-kernel mailing list