Discussion:
[PATCH] udevadm-info: Don't access sysfs entries backing device I/O port space
Myron Stowe
2013-03-16 21:35:12 UTC
I've been working on identifying the root cause of an issue triggered by
'udevadm' that was first reported on the linux-pci mailing list [1], and I
believe that there is now enough of an understanding to propose a fix.

What was originally witnessed was the platform hanging after "udevadm info
--attribute-walk --path=/sys/devices/pci0000:00/<...>/block/sda" is run.
Xiangliang was able to isolate the failure to accesses involving a Marvell
9125 device's I/O BARs, or more specifically, accesses to the I/O port
space backing the device's I/O BARs (a.k.a. the device's I/O port
resources, or regions). With this knowledge he was able to reproduce the
hang by targeting the device's sysfs 'resource<N>' entries, where N
represents an I/O BAR, with "cat /sys/devices/<...>/resource<N>".

In my research, looking for possible solutions, I noticed that kernel
commit 8633328 introduced sysfs based reading and writing of I/O port
related 'resource<N>' entries as part of adding virtualization based
device assignment functionality [2]. Note that these regions directly map
to the device's control and status registers [3].

Putting together these pieces of information we now understand that:
  o udevadm based attribute walking initiates read accesses of all the
    entries in a device's sysfs directory [4],
  o sysfs 'resource<N>' entries correspond to the device's internal
    status and control registers used for driving the device,
  o If the 'resource<N>' entry corresponds to a device's I/O BAR, the
    device's control and status registers are directly accessible by
    userspace.

Allowing userspace access to a device's registers introduces potential
simultaneous interaction with the device by a second, competing entity:
there is the device's driver, which believes it exclusively owns the
device, and an unknown, potential second entity, which can effect control
and status changes to the device asynchronously.

Device status and control registers being accessed by an entity that has
no idea what is being read or written is just asking for trouble. Even
just reading can have consequences, as the register may be a "read once to
clear" or some similar type. I think we have just been lucky, or
blissfully ignorant, concerning problems that may have occurred, and could
still be occurring, due to this situation.
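
To make the "read once to clear" hazard concrete, here is a minimal sketch of a simulated register. Nothing in it touches hardware; the struct and function names are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Hypothetical model of a "read once to clear" status register. */
struct clear_on_read_reg {
    uint32_t pending;   /* latched event bits */
};

/*
 * Reading returns the latched bits and clears them, which is how such a
 * register is assumed to behave here.
 */
static uint32_t clear_on_read_reg_read(struct clear_on_read_reg *reg)
{
    uint32_t value = reg->pending;

    reg->pending = 0;
    return value;
}
```

If a second entity reads first, the driver's later read sees nothing: the event has been consumed by a reader that had no idea what it was looking at.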


There is an aspect at play here that I still do not understand (likely
something obvious that I'm just not seeing). The sysfs read routine for
accessing I/O port 'resource<N>' entries only supports 1, 2, and 4 byte
reads (which respectively correspond to inb/outb, inw/outw, and inl/outl
I/O port accessors). When "udevadm ..." executes, the udev internals
attempt reads of 4K byte chunks.

  "udevadm info --attribute-walk --path=<pci_device_path>"

    print_device_chain()
      print_all_attributes()
        ...
        udev_device_get_sysattr_value()
          char value[4096];
          ...
          size = read(fd, value, sizeof(value));
          ...

  -- ^ userspace ^ -- v kernel v --

    pci_read_resource_io(..., count)  # sysfs read setup in pci_create_attr()
      pci_resource_io(..., count, ...)
        ...
        if (port + count - 1 > pci_resource_end(pdev, i))
          return -EINVAL;
        ...

What I don't understand is how the device's I/O port space is successfully
getting read. It looks to me like 'pci_resource_io()' would fail the
access size check and return '-EINVAL' without ever attempting the access
to I/O port space that causes the system to hang.
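
The expectation above can be written down as a small userspace model. This is only a sketch of the two checks described in this message (the bounds test from the trace and the 1, 2, or 4 byte size restriction); it is not the kernel's pci_resource_io(), and the function name, parameters, and port numbers are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical model of the checks described above. 'start' and 'end'
 * stand in for the first and last port of an I/O BAR, 'off' for the file
 * offset of the read, and 'count' for the requested length.
 */
static int model_io_read_check(unsigned long start, unsigned long end,
                               unsigned long off, size_t count)
{
    unsigned long port = start + off;

    /* Bounds check quoted in the trace above. */
    if (port + count - 1 > end)
        return -EINVAL;

    /* Only byte, word, and dword accesses are supported. */
    switch (count) {
    case 1:
    case 2:
    case 4:
        return 0;
    default:
        return -EINVAL;
    }
}
```

In this model, a request with count = 4096 against an 8-port BAR is rejected before any port would be touched, which is exactly why the observed hang is puzzling.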

I'll keep looking into this, but I do *not* have access to a platform with a
Marvell 9125 device.


Reference(s)/Footnote(s):
  [1] https://lkml.org/lkml/2013/3/7/242
  [2] Note that due to the implementation specifics only the 'resource<N>'
      entries representing I/O BARs can be read or written via sysfs.
      Sysfs' 'resource<N>' entries representing MMIO do not have sysfs
      based read/write routines as only mmap'ing of these entries is
      exposed (./drivers/pci/pci-sysfs.c::pci_create_attr()).
  [3] The kernel's sysfs documentation states: "Attributes should be ASCII
      text files..." (./Documentation/filesystems/sysfs.txt). I wonder if
      this is just out-of-date information as sysfs obviously supports
      creating binary files (./fs/sysfs/bin.c::sysfs_create_bin_file()).
  [4] Note that udevadm-info does skip a specifically named set of entries
      (./src/udevadm-info.c::skip_attribute()).
---

Myron Stowe (1):
udevadm-info: Don't access sysfs 'resource<N>' files


src/udevadm-info.c | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletions(-)
--
--
To unsubscribe from this list: send the line "unsubscribe linux-hotplug" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Myron Stowe
2013-03-16 21:35:19 UTC
Sysfs includes entries to memory that backs a PCI device's BARs, both I/O
Port space and MMIO. These memory regions correspond to the device's
internal status and control registers used to drive the device.

Accessing these registers from userspace, as "udevadm info
--attribute-walk --path=/sys/devices/..." does, cannot be allowed, as
such accesses outside of the driver, even just reading, can yield
catastrophic consequences.

Udevadm-info skips parsing a specific set of sysfs entries including
'resource'. This patch extends the set to include the additional
'resource<N>' entries that correspond to a PCI device's BARs.

Reported-by: Xiangliang Yu <***@marvell.com>
Signed-off-by: Myron Stowe <***@redhat.com>
---

src/udevadm-info.c | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/src/udevadm-info.c b/src/udevadm-info.c
index ee9b59f..298acb5 100644
--- a/src/udevadm-info.c
+++ b/src/udevadm-info.c
@@ -37,13 +37,18 @@ static bool skip_attribute(const char *name)
                 "uevent",
                 "dev",
                 "modalias",
-                "resource",
                 "driver",
                 "subsystem",
                 "module",
         };
         unsigned int i;

+        /*
+         * Skip any sysfs 'resource' entries, including 'resource<N>' entries
+         * that correspond to a device's I/O Port or MMIO space backed BARs.
+         */
+        if (strncmp((const char *)name, "resource", sizeof("resource")-1) == 0)
+                return true;
         for (i = 0; i < ARRAY_SIZE(skip); i++)
                 if (strcmp(name, skip[i]) == 0)
                         return true;
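
For readers who want to try the matching logic outside of udev, the following is a standalone sketch of the patched function. It assumes the skip list contains only the entries visible in the hunk above (the start of the array is outside the hunk), and ARRAY_SIZE is defined locally rather than taken from udev's headers.

```c
#include <stdbool.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Standalone sketch of the patched skip_attribute(); not the udev source. */
static bool skip_attribute(const char *name)
{
    /* Assumption: only the entries visible in the hunk are listed. */
    static const char *const skip[] = {
        "uevent",
        "dev",
        "modalias",
        "driver",
        "subsystem",
        "module",
    };
    unsigned int i;

    /* Prefix match: covers 'resource' and every 'resource<N>' entry. */
    if (strncmp(name, "resource", sizeof("resource") - 1) == 0)
        return true;

    for (i = 0; i < ARRAY_SIZE(skip); i++)
        if (strcmp(name, skip[i]) == 0)
            return true;

    return false;
}
```

Note that the prefix test is broader than the comment suggests: any attribute whose name merely begins with "resource" is skipped, not only 'resource' and 'resource<N>'.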

Greg KH
2013-03-16 22:11:59 UTC
Post by Myron Stowe
Sysfs includes entries to memory that backs a PCI device's BARs, both I/O
Port space and MMIO. These memory regions correspond to the device's
internal status and control registers used to drive the device.
Accessing these registers from userspace, as "udevadm info
--attribute-walk --path=/sys/devices/..." does, cannot be allowed, as
such accesses outside of the driver, even just reading, can yield
catastrophic consequences.
Udevadm-info skips parsing a specific set of sysfs entries including
'resource'. This patch extends the set to include the additional
'resource<N>' entries that correspond to a PCI device's BARs.
Nice, are you also going to patch bash to prevent a user from reading
these sysfs files as well? :)

And pciutils?

You get my point here, right? The root user just asked to read all of
the data for this device, so why wouldn't you allow it? Just like
'lspci' does. Or bash does.

If this hardware has a problem, then it needs to be fixed in the kernel,
not have random band-aids added to various userspace programs to paper
over the root problem here. Please fix the kernel driver and all should
be fine. No need to change udevadm.

greg k-h
Bjorn Helgaas
2013-03-16 22:55:07 UTC
Post by Greg KH
Nice, are you also going to patch bash to prevent a user from reading
these sysfs files as well? :)
And pciutils?
You get my point here, right? The root user just asked to read all of
the data for this device, so why wouldn't you allow it? Just like
'lspci' does. Or bash does.
If this hardware has a problem, then it needs to be fixed in the kernel,
not have random band-aids added to various userspace programs to paper
over the root problem here. Please fix the kernel driver and all should
be fine. No need to change udevadm.
I'm not sure that "udevadm info" (or bash) reading device registers is
a good idea because we don't know what the device is, and we don't
have any idea what the side effects of reading its registers will be.
Just to be clear, this is about device-specific I/O port registers,
not config space, so we can't expect any sort of consistency.

We could put a quirk in the kernel for this device (obviously the
issue is independent of whether the driver is loaded), but no doubt
other devices with I/O BARs will have access size restrictions, side
effects, or other issues. Adding quirks for them feels like a
never-ending job.

It might have been a mistake to put the resourceN files in sysfs in
the first place, or to make them read/writable, because users expect
sysfs files to contain ASCII. For memory BARs, resourceN only allows
mmap, not read/write, so at least we side-step similar issues there.

Bjorn
Myron Stowe
2013-03-16 23:50:53 UTC
Post by Greg KH
Nice, are you also going to patch bash to prevent a user from reading
these sysfs files as well? :)
And pciutils?
You get my point here, right? The root user just asked to read all of
the data for this device, so why wouldn't you allow it? Just like
'lspci' does. Or bash does.
Yes :P , you raise a very good point, there are a lot of ways a user can
poke around in those BARs. However, there is a difference between
shooting yourself in the foot and getting what you deserve versus
unknowingly executing a common command such as udevadm and having the
system hang.
Post by Greg KH
If this hardware has a problem, then it needs to be fixed in the kernel,
not have random band-aids added to various userspace programs to paper
over the root problem here. Please fix the kernel driver and all should
be fine. No need to change udevadm.
Xiangliang initially proposed a patch within the PCI core. Ignoring the
specific issue with the proposal which I pointed out in the
https://lkml.org/lkml/2013/3/7/242 thread, that just doesn't seem like
the right place to effect a change either as PCI's core isn't concerned
with the contents or access limitations of those regions; those are
issues that the driver concerns itself with.

So things seem to be gravitating towards the driver. I'm fairly
ignorant of this area but as Robert succinctly pointed out in the
originating thread - the AHCI driver only uses the device's MMIO region.
The I/O related regions are for legacy SFF-compatible ATA ports and are
not used to drive the device. This, coupled with the observation that
userspace accesses such as udevadm, and others like you additionally
point out, do not filter through the device's driver, seems to
suggest that changes to the driver will not help here either.

That said, I was attempting to point out an interesting problem and get
the conversation started towards coming up with some type of solution.
Let's continue the conversation and see where things go.

Thanks,
Myron
Greg KH
2013-03-17 01:03:17 UTC
Post by Myron Stowe
Xiangliang initially proposed a patch within the PCI core. Ignoring the
specific issue with the proposal which I pointed out in the
https://lkml.org/lkml/2013/3/7/242 thread, that just doesn't seem like
the right place to effect a change either as PCI's core isn't concerned
with the contents or access limitations of those regions; those are
issues that the driver concerns itself with.
So things seem to be gravitating towards the driver. I'm fairly
ignorant of this area but as Robert succinctly pointed out in the
originating thread - the AHCI driver only uses the device's MMIO region.
The I/O related regions are for legacy SFF-compatible ATA ports and are
not used to drive the device. This, coupled with the observation that
userspace accesses such as udevadm, and others like you additionally
point out, do not filter through the device's driver, seems to
suggest that changes to the driver will not help here either.
A PCI quirk should handle this properly, right? Why not do that? Worst
case, the quirk could just not expose these sysfs files for this
device, which would solve all userspace program issues, right?

thanks,

greg k-h
Alex Williamson
2013-03-17 04:11:22 UTC
Post by Greg KH
A PCI quirk should handle this properly, right? Why not do that? Worst
case, the quirk could just not expose these sysfs files for this
device, which would solve all userspace program issues, right?
Not exactly. I/O port access through pci-sysfs was added for userspace
programs, specifically qemu-kvm device assignment. We use the I/O port
resource# files to access device owned I/O port registers using file
permissions rather than global permissions such as iopl/ioperm. File
permissions also prevent random users from accessing device registers
through these files, but of course can't stop a privileged app that
chooses to ignore the purpose of these files. A quirk would therefore
remove a file that actually has a useful purpose for one app just so
another app that has no particular reason for dumping the contents can
run unabated. Thanks,

Alex
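
The access pattern described above can be sketched as a single sized pread() on the resource file, with the file offset selecting the register. This is an illustrative sketch, not code from qemu-kvm: the helper name is hypothetical, the caller is assumed to supply the path of an I/O port resource<N> file it has permission to open, and running it against live hardware carries exactly the side-effect risks discussed in this thread.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read one 1-, 2-, or 4-byte value at 'offset' from
 * the file at 'resource_path'. Returns 0 on success, or -1 with errno set.
 */
static int read_io_reg(const char *resource_path, off_t offset,
                       void *value, size_t width)
{
    ssize_t n;
    int saved_errno;
    int fd;

    /* Mirror the access sizes the thread says are supported. */
    if (width != 1 && width != 2 && width != 4) {
        errno = EINVAL;
        return -1;
    }

    fd = open(resource_path, O_RDONLY);
    if (fd < 0)
        return -1;

    n = pread(fd, value, width, offset);
    saved_errno = errno;
    close(fd);

    if (n < 0) {
        errno = saved_errno;
        return -1;
    }
    if ((size_t)n != width) {
        errno = EIO;    /* short read: treat as an I/O error */
        return -1;
    }
    return 0;
}
```

Access control comes from the permissions on the file itself, which is the contrast with process-wide mechanisms such as iopl/ioperm drawn above.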

Greg KH
2013-03-17 05:36:11 UTC
Post by Alex Williamson
Not exactly. I/O port access through pci-sysfs was added for userspace
programs, specifically qemu-kvm device assignment. We use the I/O port
resource# files to access device owned I/O port registers using file
permissions rather than global permissions such as iopl/ioperm. File
permissions also prevent random users from accessing device registers
through these files, but of course can't stop a privileged app that
chooses to ignore the purpose of these files. A quirk would therefore
remove a file that actually has a useful purpose for one app just so
another app that has no particular reason for dumping the contents can
run unabated. Thanks,
The quirk would only be for this one specific device, which obviously
can't handle this type of access, so why would you want the sysfs files
even present for it at all?

greg k-h
Alex Williamson
2013-03-17 13:38:23 UTC
Post by Greg KH
The quirk would only be for this one specific device, which obviously
can't handle this type of access, so why would you want the sysfs files
even present for it at all?
I'm assuming that the device only breaks because udevadm is dumping the
full I/O port register space of the device and that if an actual driver
was interacting with it through this interface that it would work. Who
knows how many devices will have read side-effects by udevadm blindly
dumping these files. Thanks,

Alex


Kay Sievers
2013-03-17 14:00:59 UTC
On Sun, Mar 17, 2013 at 2:38 PM, Alex Williamson
Post by Alex Williamson
I'm assuming that the device only breaks because udevadm is dumping the
full I/O port register space of the device and that if an actual driver
was interacting with it through this interface that it would work. Who
knows how many devices will have read side-effects by udevadm blindly
dumping these files. Thanks,
Sysfs is too public an interface to export things that make
devices/drivers choke on a simple read() of an attribute.

This is nothing specific to udevadm, any tool can do that. Udevadm
will never read any of the files during normal operation. The admin
explicitly asked udevadm with a specific command to dump all the stuff
the device offers.

The kernel driver needs to be fixed to allow that; in the worst case,
the attributes should not be exported at all. People should take more
care about what they export in /sys. What's exported there is not a
hidden, private ioctl; it is very visible and will be looked at.

Telling userspace not to use specific stuff in /sys I would not expect
to work as a strategy; there is too much weird stuff out there that
will always try to do that ...

Thanks,
Kay
Myron Stowe
2013-03-17 14:20:28 UTC
Post by Kay Sievers
Sysfs is too public an interface to export things that make
devices/drivers choke on a simple read() of an attribute.
This is nothing specific to udevadm, any tool can do that. Udevadm
will never read any of the files during normal operation. The admin
explicitly asked udevadm with a specific command to dump all the stuff
the device offers.
The kernel driver needs to be fixed to allow that; in the worst case,
the attributes should not be exported at all. People should take more
care about what they export in /sys. What's exported there is not a
hidden, private ioctl; it is very visible and will be looked at.
Telling userspace not to use specific stuff in /sys I would not expect
to work as a strategy; there is too much weird stuff out there that
will always try to do that ...
Kay - could you comment on Footnote 3 in
https://lkml.org/lkml/2013/3/16/168

With respect to 'udev', you are working on the assumption that all files
in sysfs must be readable with no consequences, which may be implied by
the mention of ASCII in the Documentation's sysfs.txt file. If we are to
interpret that as strictly as you seem to want to, then why is there
sysfs support for creating binary files?

Myron
Kay Sievers
2013-03-17 14:29:39 UTC
Post by Myron Stowe
Kay - could you comment on Footnote 3 in
https://lkml.org/lkml/2013/3/16/168
With respect to 'udev', you are working on the assumption that all files
in sysfs must be readable with no consequences, which may be implied by
the mention of ASCII in the Documentation's sysfs.txt file. If we are to
interpret that as strictly as you seem to want to, then why is there
sysfs support for creating binary files?
They cannot be distinguished from outside, so there is nothing I know
that could make a difference to userspace tools.

Tools -- no matter how useful they are or not; they have been doing that
for many years already -- need to be able to read() the stuff in
there without causing any damage to the system.

Kay
Myron Stowe
2013-03-17 14:36:57 UTC
Post by Kay Sievers
They cannot be distinguished from outside, so there is nothing I know
that could make a difference to userspace tools.
Agreed
Post by Kay Sievers
Tools -- no matter how useful they are or not, the point is that they have
done that for many years already -- need to be able to read() the stuff in
there, without causing any damage to the system.
So then, why are certain sysfs files skipped in udevadm-info's parsing
(./src/udevadm-info.c::skip_attribute())?
Kay Sievers
2013-03-17 14:43:42 UTC
Permalink
Post by Myron Stowe
So then, why are certain sysfs files skipped in udevadm-info's parsing
(./src/udevadm-info.c::skip_attribute())?
Because they are not useful to use in udev rules, or are just not
recommended to use in rules because they break other assumptions and
would encode specific settings, which can rightfully change at
runtime, into rules.

The list is in no way a list to ensure a system/driver/device is not
choking on read().

Kay
--
To unsubscribe from this list: send the line "unsubscribe linux-hotplug" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Alex Williamson
2013-03-18 16:24:40 UTC
Permalink
Post by Kay Sievers
On Sun, Mar 17, 2013 at 2:38 PM, Alex Williamson
Post by Alex Williamson
I'm assuming that the device only breaks because udevadm is dumping the
full I/O port register space of the device and that if an actual driver
was interacting with it through this interface that it would work. Who
knows how many devices will have read side-effects by udevadm blindly
dumping these files. Thanks,
Sysfs is a too public interface to export things there which make
devices/driver choke on a simple read() of an attribute.
That's why the default permissions for the file do not allow users to
read it. I wish we could do something as clever as the MMIO resource
files, but I/O port spaces don't allow mmap for the predominant
architecture. Eventually VFIO is meant to replace this access and does
move device register access behind ioctls, but for now legacy KVM device
assignment relies on these files and so might some UIO drivers.
Post by Kay Sievers
This is nothing specific to udevadm, any tool can do that. Udevadm
will never read any of the files during normal operation. The admin
explicitly asked udevadm with a specific command to dump all the stuff
the device offers.
Isn't it possible udevadm could drop privileges or filter out non-world
readable files?
Post by Kay Sievers
The kernel driver needs to be fixed to allow that, in the worst case,
the attributes not exported at all. People should take more care what
they export in /sys, it's not a hidden and private ioctl what's
exported there, stuff is very visible and will be looked at.
File permissions...
Post by Kay Sievers
Telling userspace not to use specific stuff in /sys I would not expect
to work as a strategy; there is too much weird stuff out there that
will always try to do that ...
I agree, the kernel needs to protect itself from malicious apps, but if
you run a malicious app with admin access, how much can/should we do?
If we're going to ignore file permissions, why limit ourselves to
read(), should we make everything safe against write() as well? Thanks,

Alex
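Alex's question about filtering out non-world-readable files can be sketched in a few lines. This is only an illustration in Python, not udevadm code: the helper names `is_world_readable` and `readable_attributes` are made up here, and the sketch assumes a tool would want to skip any attribute that "others" cannot read, even when the tool itself runs as root.

```python
import os
import stat


def is_world_readable(path):
    """Return True if 'others' have read permission on path."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IROTH)


def readable_attributes(device_dir):
    """List regular files in device_dir that any user may read."""
    names = []
    for name in sorted(os.listdir(device_dir)):
        path = os.path.join(device_dir, name)
        # Skip symlinks (e.g. 'driver', 'subsystem') and subdirectories.
        if os.path.islink(path) or not os.path.isfile(path):
            continue
        if is_world_readable(path):
            names.append(name)
    return names
```

With this policy a root-only file such as an I/O port 'resource<N>' entry would simply not be opened, which is the behaviour Alex is asking about; whether a userspace tool should carry that policy is exactly what the thread disputes.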
Greg KH
2013-03-18 16:41:26 UTC
Permalink
Post by Alex Williamson
Isn't it possible udevadm could drop privileges or filter out non-world
readable files?
And you are going to do the same thing for bash? All other shells?

Come on, the user specifically asked to read this file, as root, and
udev did so. Just like bash would.

Please fix the kernel if this is a real problem, you aren't going to be
able to patch all userspace programs, that's not the proper solution
here.

thanks,

greg k-h
Alex Williamson
2013-03-18 16:51:03 UTC
Permalink
Post by Greg KH
And you are going to do the same thing for bash? All other shells?
Come on, the user specifically asked to read this file, as root, and
udev did so. Just like bash would.
Please fix the kernel if this is a real problem, you aren't going to be
able to patch all userspace programs, that's not the proper solution
here.
At least for KVM the kernel fix is the addition of the vfio driver which
gives us a non-sysfs way to do this. If this problem was found a few
years later and we were ready to make the switch I'd support just
removing these resource files. In the meantime we have userspace that
depends on this interface, so I'm open to suggestions how to fix it.

If we want to blacklist this specific device, that's fine, but as others
have pointed out it's really a class problem. Perhaps we report 1 byte
extra for the file length where EOF-1 is an enable byte? Is there
anything else in file ops that we could use to make it slightly more
complicated than open(), read() to access the device? Thanks,

Alex
Bjørn Mork
2013-03-18 17:20:46 UTC
Permalink
Post by Alex Williamson
At least for KVM the kernel fix is the addition of the vfio driver which
gives us a non-sysfs way to do this. If this problem was found a few
years later and we were ready to make the switch I'd support just
removing these resource files. In the meantime we have userspace that
depends on this interface, so I'm open to suggestions how to fix it.
I am puzzled by a couple of things in this discussion:

1) do you seriously mean that a userspace application (any, not just
udevadm or qemu or whatever) should be able to read and write these
registers while the device is owned by a driver? How is that ever
going to work?

2) is it really so that a device can be so fundamentally screwed up by
reading some registers, that a later driver probe cannot properly
reinitialize it?

I would have thought that the solution to all this was to return -EINVAL
on any attempt to read or write these files while a driver is bound to
the device. If userspace is going to use the API, then the application
better unbind any driver first.

Or? Am I missing something here?
Post by Alex Williamson
If we want to blacklist this specific device, that's fine, but as others
have pointed out it's really a class problem. Perhaps we report 1 byte
extra for the file length where EOF-1 is an enable byte? Is there
anything else in file ops that we could use to make it slightly more
complicated than open(), read() to access the device? Thanks,
If there really are devices which cannot handle reading at all, and
cannot be reset to a sane state by later driver initialization, then a
blacklist could be added for those devices. This should not be a common
problem.

Bjørn
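The condition Bjørn proposes, refusing access while a driver is bound, is something userspace can already observe: a bound device has a `driver` symlink in its sysfs directory. The sketch below is a userspace analogue only, not the kernel-side -EINVAL check he suggests, and the function names are invented for illustration.

```python
import os


def bound_driver(device_dir):
    """Return the name of the driver bound to a sysfs device, or None."""
    link = os.path.join(device_dir, "driver")
    if not os.path.islink(link):
        return None
    # The link target ends in the driver name, e.g. .../drivers/ahci
    return os.path.basename(os.readlink(link))


def may_touch_resources(device_dir):
    """Hypothetical policy: only allow access when no driver is bound."""
    return bound_driver(device_dir) is None
```

A caller would pass a directory such as one under /sys/bus/pci/devices/ and back off when `may_touch_resources()` returns False.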
Alex Williamson
2013-03-18 17:54:47 UTC
Permalink
Post by Bjørn Mork
1) do you seriously mean that a userspace application (any, not just
udevadm or qemu or whatever) should be able to read and write these
registers while the device is owned by a driver? How is that ever
going to work?
The expectation is that the user doesn't mess with the device through
pci-sysfs while it's running. This is really no different than config
space or MMIO space in that respect. You can use setpci to break your
PCI card while it's used by the driver today. The difference is that
MMIO spaces side-step the issue by only allowing mmap and config space
is known not to have read side-effects.
Post by Bjørn Mork
2) is it really so that a device can be so fundamentally screwed up by
reading some registers, that a later driver probe cannot properly
reinitialize it?
Never underestimate how broken hardware can be, though in this case
reading a device register seems to be causing a system hang/reset.
Post by Bjørn Mork
I would have thought that the solution to all this was to return -EINVAL
on any attempt to read or write these files while a driver is bound to
the device. If userspace is going to use the API, then the application
better unbind any driver first.

Or? Am I missing something here?
That doesn't really solve anything though. Let's pretend the resource
files only work while the device is bound to pci-stub. Now what happens
when you run this udevadm command as admin while it's in use by the
userspace driver? All we've done is limit the scope of the problem.
Post by Bjørn Mork
If there really are devices which cannot handle reading at all, and
cannot be reset to a sane state by later driver initialization, then a
blacklist could be added for those devices. This should not be a common
problem.
Yes, if these are dead registers, let's blacklist and move along. I
suspect though that these registers probably work fine if you access
them according to the device programming model, so blacklisting just
prevents full use through something like KVM device assignment. Thanks,

Alex


Bjørn Mork
2013-03-18 18:25:01 UTC
Permalink
Post by Alex Williamson
The expectation is that the user doesn't mess with the device through
pci-sysfs while it's running. This is really no different than config
space or MMIO space in that respect.
But it is. That's the problem. As a user I expect to be able to run
e.g. "grep . /sys/devices/whatever/*" with no ill effects. This holds
for config space or MMIO space. It does not for any reset-on-read
register.
Post by Alex Williamson
You can use setpci to break your
PCI card while it's used by the driver today. The difference is that
MMIO spaces side-step the issue by only allowing mmap and config space
is known not to have read side-effects.
Yes. And that is why there is no problem exporting those. This
difference is fundamental.
Post by Alex Williamson
Post by Bjørn Mork
2) is it really so that a device can be so fundamentally screwed up by
reading some registers, that a later driver probe cannot properly
reinitialize it?
Never underestimate how broken hardware can be,
True :)
Post by Alex Williamson
though in this case
reading a device register seems to be causing a system hang/reset.
I understand that it does so if the ahci driver is bound to the device
while reading the registers, but does it also hang the system with no
bound driver? How does it do that? By killing the bus?
Post by Alex Williamson
Post by Bjørn Mork
I would have thought that the solution to all this was to return -EINVAL
on any attempt to read or write these files while a driver is bound to
the device. If userspace is going to use the API, then the application
better unbind any driver first.

Or? Am I missing something here?
That doesn't really solve anything though. Let's pretend the resource
files only work while the device is bound to pci-stub. Now what happens
when you run this udevadm command as admin while it's in use by the
userspace driver? All we've done is limit the scope of the problem.
Assuming that the system hangs without driver help and that this
brokenness is widespread. I don't think any of those assumptions hold.
Do they?
Post by Alex Williamson
Post by Bjørn Mork
If there really are devices which cannot handle reading at all, and
cannot be reset to a sane state by later driver initialization, then a
blacklist could be added for those devices. This should not be a common
problem.
Yes, if these are dead registers, let's blacklist and move along. I
suspect though that these registers probably work fine if you access
them according to the device programming model, so blacklisting just
prevents full use through something like KVM device assignment.
Well, if the device is that broken then I think it will require the
kernel to police the device programming. I don't see how you can leave
a bomb like that because it might be useful in a rare and very
theoretical case.

Easier to just blacklist it...


Bjørn
Alex Williamson
2013-03-18 18:59:36 UTC
Permalink
Post by Bjørn Mork
But it is. That's the problem. As a user I expect to be able to run
e.g. "grep . /sys/devices/whatever/*" with no ill effects. This holds
for config space or MMIO space. It does not for any reset-on-read
register.
As a non-admin user you can
Post by Bjørn Mork
Post by Alex Williamson
You can use setpci to break your
PCI card while it's used by the driver today. The difference is that
MMIO spaces side-step the issue by only allowing mmap and config space
is known not to have read side-effects.
Yes. And that is why there is no problem exporting those. This
difference is fundamental.
So how do we side-step the problem with I/O port registers? If we
remove them then KVM needs to run with iopl which is a pretty serious
security hole should QEMU be exploited. We could activate the resource
files only when the device is bound to pci-assign, but that only limits
the scope and might break UIO drivers. We could modify the file to have
an enable sequence, but we can't do this without breaking current
userspace. As I mentioned, the VFIO driver is intended to replace KVM's
use of these files, but we're not ready to rip it out, perhaps not even
ready to declare it deprecated.
Post by Bjørn Mork
Post by Alex Williamson
though in this case
reading a device register seems to be causing a system hang/reset.
I understand that it does so if the ahci driver is bound to the device
while reading the registers, but does it also hang the system with no
bound driver? How does it do that? By killing the bus?
I don't know, Myron?
Post by Bjørn Mork
Post by Alex Williamson
That doesn't really solve anything though. Let's pretend the resource
files only work while the device is bound to pci-stub. Now what happens
when you run this udevadm command as admin while it's in use by the
userspace driver? All we've done is limit the scope of the problem.
Assuming that the system hangs without driver help and that this
brokenness is widespread. I don't think any of those assumptions hold.
Do they?
I thought it was true that for this device a system hang happened
regardless of the host driver, but haven't seen the original bug report.
As for widespread, this is the first I've heard of problems in the 2.5+
years that we've supported these I/O port resource files. The rest is
probably just FUD about random userspace apps trolling through device
registers.
Post by Bjørn Mork
Post by Alex Williamson
Yes, if these are dead registers, let's blacklist and move along. I
suspect though that these registers probably work fine if you access
them according to the device programming model, so blacklisting just
prevents full use through something like KVM device assignment.
Well, if the device is that broken then I think it will require the
kernel to police the device programming. I don't see how you can leave
a bomb like that because it might be useful in a rare and very
theoretical case.

Easier to just blacklist it...
Easier, yes. But it likely just kicks the problem down the road until
the next device. Thanks,

Alex
Myron Stowe
2013-03-19 16:57:49 UTC
Permalink
Post by Alex Williamson
Post by Bjørn Mork
I understand that it does so if the ahci driver is bound to the device
while reading the registers, but does it also hang the system with no
bound driver? How does it do that? By killing the bus?
I don't know, Myron?
Yes - the system hangs when BAR1's (and likely BAR3's) I/O port space is
read.

Here are the details that I've been able to put together from the two
linux-pci threads and various online sources -


From Robert Hancock - "... BAR5 is the MMIO region used by the AHCI
driver. BARs 0-4 are the legacy SFF-compatible ATA ports. Nothing
should be messing with those IO ports while AHCI is enabled. ..." This
likely explains why the system boots and runs fine as long as the
'udevadm ...' command is *not* run (i.e. the driver never accesses the
I/O port BARs).

Using a SATA controller I have access to as an example for the details
(Note: I do not have access to a system with the Marvell 9125 device):
00:1f.2 SATA controller: Intel Corporation 5 Series/3400 Series Chipset 6 port SATA AHCI Controller (rev 06) (prog-if 01 [AHCI 1.0])
Subsystem: Lenovo Device 2168
Region 0: I/O ports at 1860 [size=8]
Region 1: I/O ports at 1814 [size=4]
Region 2: I/O ports at 1818 [size=8]
Region 3: I/O ports at 1810 [size=4]
Region 4: I/O ports at 1840 [size=32]
Region 5: Memory at f2827000 (32-bit, non-prefetchable) [size=2K]
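The I/O-versus-memory split that lspci reports above can also be read from the device's sysfs `resource` file, which holds one "start end flags" line per region in hexadecimal. The sketch below is a rough illustration: the two flag constants come from the kernel's include/linux/ioport.h, but the function name and the sample text are invented here (the sample is abridged to three lines and its flag values are only typical ones, not output captured from this controller).

```python
IORESOURCE_IO = 0x00000100   # include/linux/ioport.h
IORESOURCE_MEM = 0x00000200  # include/linux/ioport.h


def parse_resource(text):
    """Parse sysfs 'resource' text into (line, start, size, kind) tuples."""
    regions = []
    for index, line in enumerate(text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue
        start, end, flags = (int(field, 16) for field in fields)
        if flags == 0:
            continue  # unused region
        if flags & IORESOURCE_IO:
            kind = "io"
        elif flags & IORESOURCE_MEM:
            kind = "mem"
        else:
            kind = "other"
        regions.append((index, start, end - start + 1, kind))
    return regions


# Illustrative sample only; not captured from real hardware.
SAMPLE = """\
0x0000000000001860 0x0000000000001867 0x0000000000040101
0x0000000000001814 0x0000000000001817 0x0000000000040101
0x00000000f2827000 0x00000000f28277ff 0x0000000000040200
"""

for index, start, size, kind in parse_resource(SAMPLE):
    print(f"line {index}: {kind} at {start:#x} [size={size}]")
```

Reading the plain `resource` file is harmless because it only reports addresses and flags; it is the per-region 'resource<N>' files for I/O port regions that reach the device's registers.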

I/O port registers [1][2]:
Primary IDE controller [0x1860-0x1867; 0x1814-0x1817]
BAR0 Base address for the command block registers for ATA Channel X
0x1860 (Read/Write): Data Register
0x1861 (Read): Error Register
0x1861 (Write): Features Register
0x1862 (Read/Write): Sector Count Register
0x1863 (Read/Write): LBA Low Register
0x1864 (Read/Write): LBA Mid Register
0x1865 (Read/Write): LBA High Register
0x1866 (Read/Write): Drive/Head Register
0x1867 (Read): Status Register
0x1867 (Write): Command Register
BAR1* Base address for the control register for ATA Channel X
0x1814 Reserved
0x1815 Reserved
0x1816 (Read): Alternate Status Register
0x1816 (Write): Device Control Register
0x1817 Reserved

* The base must be Dword aligned; a PCI requirement. The Device Control
and Alternate Status Registers are at offset 0x2 from this base.

[1] www.t13.org/documents/UploadedDocuments/project/d1510r1-Host-Adapter.pdf
[2] lateblt.tripod.com/atapi.htm

From Xiangliang - executing 'udevadm ...' causes a 32-bit I/O port read
to BAR1's region. This is shown by the BE (Byte Enable) value of
0x1111. So apparently reads to this region that include any of the
reserved Bytes cause "the chip will go bad."

So, only a Byte access at offset 2 is successful. I have not been able
to get any more details as to the exact cause of the hang. I would have
thought that the PCI transaction would have just timed out, or errored
out, or something but apparently the platform ends up hanging.

It appears that this device did not implement the reserved registers
such that they would return 0 on reads, or something similarly sane.

Since BARs 2 and 3 are not 0, indicating the device only supports one
channel, I expect the same issue will occur when accessing BAR3. Again,
I do not have access to a system with this device to test with.
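The byte-enable reasoning above can be worked through with a little arithmetic. The following is a minimal sketch that assumes bit n of the mask stands for byte n of the 4-byte BAR1 region, and that only offset 2 is defined, as in the register table earlier in this message; the function and constant names are made up for the illustration.

```python
def byte_enables(offset, width):
    """Byte-enable mask for an access of `width` bytes at `offset`
    within one 4-byte region; bit n covers byte n."""
    if width not in (1, 2, 4) or offset < 0 or offset + width > 4:
        raise ValueError("access must fit inside one dword")
    return ((1 << width) - 1) << offset


# Assumed BAR1 layout: only offset 2 is defined; 0, 1 and 3 are reserved.
DEFINED_BAR1 = 0b0100


def touches_reserved(offset, width, defined=DEFINED_BAR1):
    """True if the access enables any byte outside the defined set."""
    return bool(byte_enables(offset, width) & ~defined & 0xF)


print(touches_reserved(0, 4))  # 32-bit read of the whole region
print(touches_reserved(2, 1))  # single byte at offset 2
```

A 32-bit read enables all four bytes, so it necessarily includes the reserved ones, while a one-byte access at offset 2 enables only the defined register. That matches the observation that only a Byte access at offset 2 is successful.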
Post by Alex Williamson
Post by Bjørn Mork
I would have thought that the solution to all this was to return -EINVAL
on any attempt to read or write these files while a driver is bound to
the device. If userspace is going to use the API, then the application
better unbind any driver first.

Or? Am I missing something here?
That doesn't really solve anything though. Let's pretend the resource
files only work while the device is bound to pci-stub. Now what happens
when you run this udevadm command as admin while it's in use by the
userspace driver? All we've done is limit the scope of the problem.
Post by Bjørn Mork
Assuming that the system hangs without driver help and that this
brokenness is widespread. I don't think any of those assumptions hold.
Do they?
Post by Alex Williamson
I thought it was true that for this device a system hang happened
regardless of the host driver, but haven't seen the original bug report.
As for widespread, this is the first I've heard of problems in the 2.5+
years that we've supported these I/O port resource files. The rest is
probably just FUD about random userspace apps trolling through device
registers.
Post by Bjørn Mork
Post by Alex Williamson
If we want to blacklist this specific device, that's fine, but as others
have pointed out it's really a class problem. Perhaps we report 1 byte
extra for the file length where EOF-1 is an enable byte? Is there
anything else in file ops that we could use to make it slightly more
complicated than open(), read() to access the device? Thanks,
If there really are devices which cannot handle reading at all, and
cannot be reset to a sane state by later driver initialization, then a
blacklist could be added for those devices. This should not be a common
problem.
Post by Alex Williamson
Yes, if these are dead registers, let's blacklist and move along. I
suspect though that these registers probably work fine if you access
them according to the device programming model, so blacklisting just
prevents full use through something like KVM device assignment.
Post by Bjørn Mork
Well, if the device is that broken then I think it will require the
kernel to police the device programming. I don't see how you can le=
ave
Post by Alex Williamson
a bomb like that because it might be useful in a rare and very
theoretical case.
=20
Easier to just blacklist it...
=20
Easier, yes. But it likely just kicks the problem down the road unti=
l
Post by Alex Williamson
the next device. Thanks,
=20
Alex
=20
=20
--
To unsubscribe from this list: send the line "unsubscribe linux-hotplug" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Myron Stowe
2013-03-19 17:06:54 UTC
Permalink
Post by Alex Williamson

At least for KVM the kernel fix is the addition of the vfio driver
which gives us a non-sysfs way to do this. If this problem was found a
few years later and we were ready to make the switch I'd support just
removing these resource files. In the meantime we have userspace
that depends on this interface, so I'm open to suggestions how to fix
it.

1) do you seriously mean that a userspace application (any, not just
udevadm or qemu or whatever) should be able to read and write
these registers while the device is owned by a driver? How is that ever
going to work?
The expectation is that the user doesn't mess with the device through
pci-sysfs while it's running. This is really no different than config
space or MMIO space in that respect.

But it is. That's the problem. As a user I expect to be able to run
e.g "grep . /sys/devices/whatever/*" with no ill effects. This holds
for config space or MMIO space. It does not for any reset-on-read
register.

As a non-admin user you can

You can use setpci to break your
PCI card while it's used by the driver today. The difference is that
MMIO spaces side-step the issue by only allowing mmap and config space
is known not to have read side-effects.

Yes. And that is why there is no problem exporting those. This
difference is fundamental.

So how do we side-step the problem with I/O port registers? If we
remove them then KVM needs to run with iopl which is a pretty serious
security hole should QEMU be exploited. We could activate the resource
files only when the device is bound to pci-assign, but that only limits
the scope and might break UIO drivers. We could modify the file to have
an enable sequence, but we can't do this without breaking current
userspace. As I mentioned, the VFIO driver is intended to replace KVM's
use of these files, but we're not ready to rip it out, perhaps not even
ready to declare it deprecated.

2) is it really so that a device can be so fundamentally screwed up
by reading some registers, that a later driver probe cannot properly
reinitialize it?
Never underestimate how broken hardware can be,

True :)

though in this case
reading a device register seems to be causing a system hang/reset.

I understand that it does so if the ahci driver is bound to the device
while reading the registers, but does it also hang the system with no
bound driver? How does it do that? By killing the bus?

I don't know, Myron?

Yes - the system hangs when BAR1's (and likely BAR3's) I/O port space is
read.
Sorry - that wasn't very explicit. Just accessing BAR1's region as
udevadm does is enough to hang the system - even when no driver is
bound.

Here are the details that I've been able to put together from the two
linux-pci threads and various online sources -

From Robert Hancock - "... BAR5 is the MMIO region used by the AHCI
driver. BARs 0-4 are the legacy SFF-compatible ATA ports. Nothing
should be messing with those IO ports while AHCI is enabled. ..." This
likely explains why the system boots and runs fine as long as the
'udevadm ...' command is *not* run (i.e. the driver never accesses the
I/O port BARs).

Using a SATA controller I have access to as an example for the details
(Note: I do not have access to a system with the Marvell 9125 device):

00:1f.2 SATA controller: Intel Corporation 5 Series/3400 Series Chipset 6 port SATA AHCI Controller (rev 06) (prog-if 01 [AHCI 1.0])
Subsystem: Lenovo Device 2168
Region 0: I/O ports at 1860 [size=8]
Region 1: I/O ports at 1814 [size=4]
Region 2: I/O ports at 1818 [size=8]
Region 3: I/O ports at 1810 [size=4]
Region 4: I/O ports at 1840 [size=32]
Region 5: Memory at f2827000 (32-bit, non-prefetchable) [size=2K]

Primary IDE controller [0x1860-0x1867; 0x1814-0x1817]
BAR0 Base address for the command block registers for ATA Channel X
0x1860 (Read/Write): Data Register
0x1861 (Read): Error Register
0x1861 (Write): Features Register
0x1862 (Read/Write): Sector Count Register
0x1863 (Read/Write): LBA Low Register
0x1864 (Read/Write): LBA Mid Register
0x1865 (Read/Write): LBA High Register
0x1866 (Read/Write): Drive/Head Register
0x1867 (Read): Status Register
0x1867 (Write): Command Register
BAR1* Base address for the control register for ATA Channel X
0x1814 Reserved
0x1815 Reserved
0x1816 (Read): Alternate Status Register
0x1816 (Write): Device Control Register
0x1817 Reserved

* The base must be Dword aligned; a PCI requirement. The Device Control
and Alternate Status Registers are at offset 0x2 from this base.

[1] www.t13.org/documents/UploadedDocuments/project/d1510r1-Host-Adapter.pdf
[2] lateblt.tripod.com/atapi.htm

From Xiangliang - executing 'udevadm ...' causes a 32-bit I/O port read
to BAR1's region. This is shown by the BE (Byte Enable) value of
0x1111. So apparently reads to this region that include any of the
reserved Bytes cause "the chip will go bad."

So, only a Byte access at offset 2 is successful. I have not been able
to get any more details as to the exact cause of the hang. I would have
thought that the PCI transaction would have just timed out, or errored
out, or something but apparently the platform ends up hanging.
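
To make the byte-enable argument concrete, here is a minimal Python sketch that is not taken from the thread. It assumes the BAR1 layout listed above (offsets 0, 1 and 3 reserved, offset 2 implemented) and reads the reported BE value of 0x1111 as "all four byte lanes enabled". The helper names byte_enables and touches_reserved are hypothetical.

```python
# Illustrative model only: which byte lanes of the 4-byte BAR1 window an
# I/O read of a given width touches. Offsets are relative to the BAR1 base.
RESERVED_OFFSETS = {0, 1, 3}  # per the layout above; offset 2 is the only implemented byte


def byte_enables(offset: int, width: int) -> int:
    """Return a 4-bit mask; bit n is set when byte lane n is part of the access."""
    if width not in (1, 2, 4):
        raise ValueError("I/O port accesses are 1, 2 or 4 bytes wide")
    if not 0 <= offset <= 4 - width or offset % width:
        raise ValueError("access must be naturally aligned inside the Dword")
    mask = 0
    for lane in range(offset, offset + width):
        mask |= 1 << lane
    return mask


def touches_reserved(offset: int, width: int) -> bool:
    """True when the access includes at least one reserved byte."""
    mask = byte_enables(offset, width)
    return any(mask & (1 << lane) for lane in RESERVED_OFFSETS)


if __name__ == "__main__":
    for offset, width in [(0, 4), (2, 2), (2, 1)]:
        print(f"offset={offset} width={width} "
              f"BE={byte_enables(offset, width):04b} "
              f"reserved={touches_reserved(offset, width)}")
```

Under this model the Dword read enables all four lanes and therefore includes the reserved bytes, while a single-byte read at offset 2 does not.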

It appears that this device did not implement the reserved registers
such that they would return 0 on reads or something more similarly sane.

Since BARs 2 and 3 are not 0, indicating the device only supports one
channel, I expect the same issue will occur when accessing BAR3. Again,
I do not have access to a system with this device to test with.

Robert Brown
2013-03-18 18:02:01 UTC
Permalink
Post by Alex Williamson
Post by Bjørn Mork
At least for KVM the kernel fix is the addition of the vfio driver which
gives us a non-sysfs way to do this. If this problem was found a few
years later and we were ready to make the switch I'd support just
removing these resource files. In the meantime we have userspace that
depends on this interface, so I'm open to suggestions how to fix it.
Post by Bjørn Mork
1) do you seriously mean that a userspace application (any, not just
udevadm or qemu or whatever) should be able to read and write these
registers while the device is owned by a driver? How is that ever
going to work?
The expectation is that the user doesn't mess with the device through
pci-sysfs while it's running. This is really no different than config
space or MMIO space in that respect. You can use setpci to break your
PCI card while it's used by the driver today. The difference is that
MMIO spaces side-step the issue by only allowing mmap and config space
is known not to have read side-effects.
Post by Bjørn Mork
2) is it really so that a device can be so fundamentally screwed up by
reading some registers, that a later driver probe cannot properly
reinitialize it?
Never underestimate how broken hardware can be, though in this case
reading a device register seems to be causing a system hang/reset.
The real problem is that PCI devices can be bus masters, which means
they can screw up *ANYTHING* (almost)!
Myron Stowe
2013-03-17 14:33:05 UTC
Permalink
Post by Alex Williamson
Post by Greg KH
Post by Alex Williamson
Post by Greg KH
Post by Myron Stowe
Post by Greg KH
Post by Myron Stowe
Sysfs includes entries to memory that backs a PCI device's BARs, both I/O
Port space and MMIO. These memory regions correspond to the device's
internal status and control registers used to drive the device.
Accessing these registers from userspace such as "udevadm info
--attribute-walk --path=/sys/devices/..." does cannot be allowed as
such accesses outside of the driver, even just reading, can yield
catastrophic consequences.
Udevadm-info skips parsing a specific set of sysfs entries including
'resource'. This patch extends the set to include the additional
'resource<N>' entries that correspond to a PCI device's BARs.
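
As a rough illustration of the change described above, the skip test can be modelled as a name check. This is a sketch in Python for brevity, not udev's actual C code. Only 'resource' and 'resource<N>' come from the patch description; the names SKIP_EXACT, SKIP_PATTERN and skip_attribute are hypothetical.

```python
import re

# Only 'resource' is named in the patch description; a real skip list may
# contain further names that are not reproduced here.
SKIP_EXACT = {"resource"}

# 'resource<N>' with N a decimal BAR index, e.g. resource0 ... resource5.
SKIP_PATTERN = re.compile(r"resource\d+")


def skip_attribute(name: str) -> bool:
    """Return True when an attribute walk should not read this sysfs entry."""
    return name in SKIP_EXACT or SKIP_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for name in ["vendor", "resource", "resource0", "resource1"]:
        print(name, "skip" if skip_attribute(name) else "read")
```

Matching on the name alone skips every BAR file, MMIO as well as I/O port, which is what the patch description proposes.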
Nice, are you also going to patch bash to prevent a user from reading
these sysfs files as well? :)
And pciutils?
You get my point here, right? The root user just asked to read all of
the data for this device, so why wouldn't you allow it? Just like
'lspci' does. Or bash does.
Yes :P , you raise a very good point, there are a lot of ways a user can
poke around in those BARs. However, there is a difference between
shooting yourself in the foot and getting what you deserve versus
unknowingly executing a common command such as udevadm and having the
system hang.
Post by Greg KH
If this hardware has a problem, then it needs to be fixed in the kernel,
not have random band-aids added to various userspace programs to paper
over the root problem here. Please fix the kernel driver and all should
be fine. No need to change udevadm.
Xiangliang initially proposed a patch within the PCI core. Ignoring the
specific issue with the proposal which I pointed out in the
https://lkml.org/lkml/2013/3/7/242 thread, that just doesn't seem like
the right place to effect a change either as PCI's core isn't concerned
with the contents or access limitations of those regions, those are
issues that the driver concerns itself with.
So things seem to be gravitating towards the driver. I'm fairly
ignorant of this area but as Robert succinctly pointed out in the
originating thread - the AHCI driver only uses the device's MMIO region.
The I/O related regions are for legacy SFF-compatible ATA ports and are
not used to drive the device. This, coupled with the observance that
userspace accesses such as udevadm, and others like you additionally
point out, do not filter through the device's driver, seems to
suggest that changes to the driver will not help here either.
A PCI quirk should handle this properly, right? Why not do that? Worst
case, the quirk could just not expose these sysfs files for this
device, which would solve all userspace program issues, right?
Not exactly. I/O port access through pci-sysfs was added for userspace
programs, specifically qemu-kvm device assignment. We use the I/O port
resource# files to access device owned I/O port registers using file
permissions rather than global permissions such as iopl/ioperm. File
permissions also prevent random users from accessing device registers
through these files, but of course can't stop a privileged app that
chooses to ignore the purpose of these files. A quirk would therefore
remove a file that actually has a useful purpose for one app just so
another app that has no particular reason for dumping the contents can
run unabated. Thanks,
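
For readers unfamiliar with the interface being defended here, the following Python sketch shows what a deliberate, single-register access through a resource<N> file might look like. It assumes that a read on an I/O-port-backed resource file becomes a port access at the given file offset. That a one-byte read yields a one-byte access is my assumption, not something the thread states. The function name read_port_byte is hypothetical.

```python
import os


def read_port_byte(resource_path: str, offset: int) -> int:
    """Read exactly one byte at `offset` from a resource<N> file."""
    fd = os.open(resource_path, os.O_RDONLY)
    try:
        # A positioned one-byte read: no seek, no read-ahead of neighbouring bytes.
        data = os.pread(fd, 1, offset)
    finally:
        os.close(fd)
    if len(data) != 1:
        raise OSError(f"short read at offset {offset} of {resource_path}")
    return data[0]
```

Who may call this is decided by the owner and mode of the one file, rather than by a process-wide privilege such as iopl/ioperm. A tool that dumps the whole file instead issues reads the caller never chose.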
The quirk would only be for this one specific device, which obviously
can't handle this type of access, so why would you want the sysfs files
even present for it at all?
I'm assuming that the device only breaks because udevadm is dumping the
full I/O port register space of the device and that if an actual driver
was interacting with it through this interface that it would work.
Correct:
the AHCI driver only uses the device's MMIO region. The I/O
related regions are for legacy SFF-compatible ATA ports and are
not used to drive the device. This, coupled with the
observance that userspace accesses such as udevadm, and others
like Greg additionally pointed out, do not filter through the
device's driver seems to suggest that changes to the driver will
not help here either.
Post by Alex Williamson
Who
knows how many devices will have read side-effects by udevadm blindly
dumping these files. Thanks,
Alex
Alex Williamson
2013-03-17 22:28:48 UTC
Permalink
Post by Myron Stowe
the AHCI driver only uses the device's MMIO region. The I/O
related regions are for legacy SFF-compatible ATA ports and are
not used to drive the device. This, coupled with the
observance that userspace accesses such as udevadm, and others
like Greg additionally pointed out, do not filter through the
device's driver seems to suggest that changes to the driver will
not help here either.
That may be true of our AHCI driver, but when it's assigned to a guest
we're potentially using a completely different stack and cannot make
that assumption. A guest running in compatibility mode or the option
ROM for the device may still use I/O port regions. Thanks,

Alex


Don Dutile
2013-03-18 14:50:42 UTC
Permalink
Post by Alex Williamson
That may be true of our AHCI driver, but when it's assigned to a guest
we're potentially using a completely different stack and cannot make
that assumption. A guest running in compatibility mode or the option
ROM for the device may still use I/O port regions. Thanks,
Alex
In quick summary:
(1) Reading a device's registers may have side effects
on the device operation, e.g., a register maps to a device's FIFO register.
(2) Having two threads read such device registers can cause unknown results,
i.e., driver & user-app.
(3) It may be valid for a user-app to read device regs, e.g.,
qemu-kvm assigned device

So, can't it be solved by:
(a) if no driver is configured for the device, then it's valid for a user-app
to read the device regs ?
-- although diff. user apps doing so still exposes the problem, and
can't be distinguished, e.g., qemu-kvm + udevadm
-- or can file permissions (set by libvirt driving qemu-kvm
device assignment) block multiple user-app reading ?
i.e., basically, a user-level version of a driver allocating
the device, which in the case of qemu-kvm device-assignment,
is what is actually happening! :)
(b) if driver is configured, need a quirk-registration, or generic, optional,
driver function to check for user-app reading approval.
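
Options (a) and (b) can be written down as a small decision function. This is a userspace Python model of the policy only, not a proposed kernel change. It relies on sysfs exposing a 'driver' symlink in the device directory while a driver is bound. The driver_approves flag stands in for the quirk or optional driver callback, which is hypothetical.

```python
import os


def driver_bound(device_dir: str) -> bool:
    """sysfs shows a 'driver' symlink in the device directory while a driver is bound."""
    return os.path.islink(os.path.join(device_dir, "driver"))


def may_read_registers(device_dir: str, driver_approves: bool = False) -> bool:
    """Model of options (a) and (b) for user-app reads of device registers."""
    # (a) no driver bound: a user-app read is treated as valid
    if not driver_bound(device_dir):
        return True
    # (b) driver bound: defer to the driver's (hypothetical) approval hook
    return driver_approves
```

The model does not address the caveat listed under (a): two user apps reading an unbound device still cannot be told apart.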

ok, bash away...
Alex Williamson
2013-03-18 16:34:32 UTC
Permalink
Post by Don Dutile
Post by Alex Williamson
Post by Myron Stowe
Post by Alex Williamson
Post by Greg KH
Post by Alex Williamson
Post by Greg KH
Post by Myron Stowe
Post by Greg KH
Post by Myron Stowe
Sysfs includes entries to memory that backs a PCI device's BARs, both I/O
Port space and MMIO. This memory regions correspond to the device's
internal status and control registers used to drive the device.
Accessing these registers from userspace such as "udevadm info
--attribute-walk --path=/sys/devices/..." does can not be allowed as
such accesses outside of the driver, even just reading, can yield
catastrophic consequences.
Udevadm-info skips parsing a specific set of sysfs entries including
'resource'. This patch extends the set to include the additional
'resource<N>' entries that correspond to a PCI device's BARs.
Nice, are you also going to patch bash to prevent a user from reading
these sysfs files as well? :)
And pciutils?
You get my point here, right? The root user just asked to read all of
the data for this device, so why wouldn't you allow it? Just like
'lspci' does. Or bash does.
Yes :P , you raise a very good point, there are a lot of way a user can
poke around in those BARs. However, there is a difference between
shooting yourself in the foot and getting what you deserve versus
unknowingly executing a common command such as udevadm and having the
system hang.
Post by Greg KH
If this hardware has a problem, then it needs to be fixed in the kernel,
not have random band-aids added to various userspace programs to paper
over the root problem here. Please fix the kernel driver and all should
be fine. No need to change udevadm.
Xiangliang initially proposed a patch within the PCI core. Ignoring the
specific issue with the proposal which I pointed out in the
https://lkml.org/lkml/2013/3/7/242 thread, that just doesn't seem like
the right place to effect a change either as PCI's core isn't concerned
with the contents or access limitations of those regions, those are
issues that the driver concerns itself with.
So things seem to be gravitating towards the driver. I'm fairly
ignorant of this area but as Robert succinctly pointed out in the
originating thread - the AHCI driver only uses the device's MMIO region.
The I/O related regions are for legacy SFF-compatible ATA ports and are
not used to driver the device. This, coupled with the observance that
userspace accesses such as udevadm, and others like you additionally
point out, do not filter through the device's driver for seems to
suggest that changes to the driver will not help here either.
A PCI quirk should handle this properly, right? Why not do that? Worse
thing, the quirk could just not expose these sysfs files for this
device, which would solve all userspace program issues, right?
Not exactly. I/O port access through pci-sysfs was added for userspace
programs, specifically qemu-kvm device assignment. We use the I/O port
resource# files to access device owned I/O port registers using file
permissions rather than global permissions such as iopl/ioperm. File
permissions also prevent random users from accessing device registers
through these files, but of course can't stop a privileged app that
chooses to ignore the purpose of these files. A quirk would therefore
remove a file that actually has a useful purpose for one app just so
another app that has no particular reason for dumping the contents can
run unabated. Thanks,
The quirk would only be for this one specific device, which obviously
can't handle this type of access, so why would you want the sysfs files
even present for it at all?
I'm assuming that the device only breaks because udevadm is dumping the
full I/O port register space of the device and that if an actual driver
was interacting with it through this interface that it would work.
the AHCI driver only uses the device's MMIO region. The I/O
related regions are for legacy SFF-compatible ATA ports and are
not used to drive the device. This, coupled with the
observation that userspace accesses such as udevadm's, and the
others Greg additionally pointed out, do not filter through the
device's driver, seems to suggest that changes to the driver will
not help here either.
That may be true of our AHCI driver, but when it's assigned to a guest
we're potentially using a completely different stack and cannot make
that assumption. A guest running in compatibility mode or the option
ROM for the device may still use I/O port regions. Thanks,
Alex
(1) Reading a device's registers may have side effects on the device's
operation, e.g., when a register maps to a device's FIFO.
(2) Having two threads read such device registers, i.e., driver and
user-app, can cause unknown results.
(3) It may be valid for a user-app to read device regs, e.g., a
qemu-kvm assigned device.
(a) If no driver is configured for the device, then is it valid for a
user-app to read the device regs?
-- although different user apps doing so still exposes the problem, and
they can't be distinguished, e.g., qemu-kvm + udevadm
-- or can file permissions (set by libvirt driving qemu-kvm
device assignment) block multiple user-apps from reading?
i.e., basically, a user-level version of a driver allocating
the device, which in the case of qemu-kvm device assignment
is what is actually happening! :)
(b) If a driver is configured, we need a quirk registration, or a generic,
optional driver function, to check for user-app read approval.
ok, bash away...
I think concurrency is a secondary issue. The primary issue is whether
read() is somehow so special in sysfs that all files need to be regarded
as o+r. If that's true, then indeed there are concurrency issues.
Thanks,

Alex
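Alex's question about whether every sysfs file must be treated as o+r comes down to the permission bits on the file. A minimal sketch of that check follows; the helper names are hypothetical, and the usage note assumes the resource<N> files are created owner-only, which this thread does not confirm.

```python
import stat


def readable_by_others(mode: int) -> bool:
    """Return True if the permission bits grant read access to 'other' (o+r)."""
    return bool(mode & stat.S_IROTH)


def owner_only(mode: int) -> bool:
    """Return True if neither group nor other has any permission bit set."""
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

On a real system one would pass os.stat(path).st_mode for a resource<N> path. Note that these bits do not restrain a privileged process, which is the case udevadm run as root falls into.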
Myron Stowe
2013-03-17 14:12:22 UTC
Post by Greg KH
Post by Myron Stowe
Post by Greg KH
Post by Myron Stowe
Sysfs includes entries to memory that backs a PCI device's BARs, both I/O
Port space and MMIO. These memory regions correspond to the device's
internal status and control registers used to drive the device.
Accessing these registers from userspace, such as with "udevadm info
--attribute-walk --path=/sys/devices/...", cannot be allowed, as
such accesses outside of the driver, even just reading, can yield
catastrophic consequences.
Udevadm-info skips parsing a specific set of sysfs entries including
'resource'. This patch extends the set to include the additional
'resource<N>' entries that correspond to a PCI device's BARs.
Nice, are you also going to patch bash to prevent a user from reading
these sysfs files as well? :)
And pciutils?
You get my point here, right? The root user just asked to read all of
the data for this device, so why wouldn't you allow it? Just like
'lspci' does. Or bash does.
Yes :P , you raise a very good point, there are a lot of ways a user can
poke around in those BARs. However, there is a difference between
shooting yourself in the foot and getting what you deserve versus
unknowingly executing a common command such as udevadm and having the
system hang.
Post by Greg KH
If this hardware has a problem, then it needs to be fixed in the kernel,
not have random band-aids added to various userspace programs to paper
over the root problem here. Please fix the kernel driver and all should
be fine. No need to change udevadm.
Xiangliang initially proposed a patch within the PCI core. Ignoring the
specific issue with the proposal which I pointed out in the
https://lkml.org/lkml/2013/3/7/242 thread, that just doesn't seem like
the right place to effect a change either as PCI's core isn't concerned
with the contents or access limitations of those regions, those are
issues that the driver concerns itself with.
So things seem to be gravitating towards the driver. I'm fairly
ignorant of this area but as Robert succinctly pointed out in the
originating thread - the AHCI driver only uses the device's MMIO region.
The I/O related regions are for legacy SFF-compatible ATA ports and are
not used to drive the device. This, coupled with the observation that
userspace accesses such as udevadm's, and the others you additionally
point out, do not filter through the device's driver, seems to
suggest that changes to the driver will not help here either.
A PCI quirk should handle this properly, right? Why not do that? Worst
case, the quirk could just not expose these sysfs files for this
device, which would solve all userspace program issues, right?
The quirk you are suggesting would basically have to be a reversion of
commit 8633328 for the reasons that Bjorn pointed out so that we cover
all devices, not just this one particular device:
We could put a quirk in the kernel for this device (obviously the
issue is independent of whether the driver is loaded), but no doubt
other devices with I/O BARs will have access size restrictions, side
effects, or other issues. Adding quirks for them feels like a
never-ending job.

I'm beginning to think that people have not read the analysis, which was
the first mail entry of this thread (I meant for the Subject: to read
"[PATCH 0/1] ...") and which is at https://lkml.org/lkml/2013/3/16/168

It appears [*] that we are exposed to this potential conflict with
*every* PCI device's resource# files; not just this one particular
device (again see the analysis cover email, especially the three
paragraphs starting with "Putting together...").

[*] I carefully use the word "appears" because of the one aspect of this
whole issue that I still do not understand, which I also expressed in the
cover email immediately below the section I just pointed out above.


So what I'd like to understand is why we are focusing on this one
particular instance/device when we *appear* to be at risk with all
devices and their resource# files?

Myron
Post by Greg KH
thanks,
greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-hotplug" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Robert Hancock
2013-03-19 01:54:09 UTC
Post by Greg KH
Post by Myron Stowe
Post by Greg KH
Post by Myron Stowe
Sysfs includes entries to memory that backs a PCI device's BARs, both I/O
Port space and MMIO. These memory regions correspond to the device's
internal status and control registers used to drive the device.
Accessing these registers from userspace, such as with "udevadm info
--attribute-walk --path=/sys/devices/...", cannot be allowed, as
such accesses outside of the driver, even just reading, can yield
catastrophic consequences.
Udevadm-info skips parsing a specific set of sysfs entries including
'resource'. This patch extends the set to include the additional
'resource<N>' entries that correspond to a PCI device's BARs.
Nice, are you also going to patch bash to prevent a user from reading
these sysfs files as well? :)
And pciutils?
You get my point here, right? The root user just asked to read all of
the data for this device, so why wouldn't you allow it? Just like
'lspci' does. Or bash does.
lspci doesn't randomly attempt to access device registers, AFAIK..
Post by Greg KH
Post by Myron Stowe
Yes :P , you raise a very good point, there are a lot of ways a user can
poke around in those BARs. However, there is a difference between
shooting yourself in the foot and getting what you deserve versus
unknowingly executing a common command such as udevadm and having the
system hang.
Post by Greg KH
If this hardware has a problem, then it needs to be fixed in the kernel,
not have random band-aids added to various userspace programs to paper
over the root problem here. Please fix the kernel driver and all should
be fine. No need to change udevadm.
Xiangliang initially proposed a patch within the PCI core. Ignoring the
specific issue with the proposal which I pointed out in the
https://lkml.org/lkml/2013/3/7/242 thread, that just doesn't seem like
the right place to effect a change either as PCI's core isn't concerned
with the contents or access limitations of those regions, those are
issues that the driver concerns itself with.
So things seem to be gravitating towards the driver. I'm fairly
ignorant of this area but as Robert succinctly pointed out in the
originating thread - the AHCI driver only uses the device's MMIO region.
The I/O related regions are for legacy SFF-compatible ATA ports and are
not used to drive the device. This, coupled with the observation that
userspace accesses such as udevadm's, and the others you additionally
point out, do not filter through the device's driver, seems to
suggest that changes to the driver will not help here either.
A PCI quirk should handle this properly, right? Why not do that? Worst
case, the quirk could just not expose these sysfs files for this
device, which would solve all userspace program issues, right?
A PCI quirk implies there is something wrong with this device in
particular. This isn't the case. The device responds properly when it's
accessed as intended. The problem is that udevadm (or other processes,
like a random grep through sysfs for example) is effectively reading
registers willy-nilly. This is absolutely not safe to do on many devices
- and certainly not while a driver is attached to the device and has
claimed the port or MMIO regions that are being accessed. Blocking
access through these files to a device with an active driver that's
claimed the regions would significantly reduce the chances of something
like this causing problems.
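Robert's distinction between port and MMIO regions can be illustrated from userspace. The sketch below assumes the text layout of a PCI device's sysfs 'resource' file (one "start end flags" line per region, in hex) and that bit 0x100 in the flags marks an I/O port region; the function name is hypothetical, and this is not the kernel-side blocking he proposes.

```python
IORESOURCE_IO = 0x00000100  # assumed flag bit for I/O port resources
STD_BAR_COUNT = 6           # BAR0..BAR5 of a standard PCI function


def io_bar_indices(resource_text: str) -> list:
    """Return the BAR indices whose flags mark them as I/O port regions."""
    indices = []
    for index, line in enumerate(resource_text.splitlines()[:STD_BAR_COUNT]):
        fields = line.split()
        if len(fields) != 3:
            continue  # skip malformed or empty lines
        flags = int(fields[2], 16)
        if flags & IORESOURCE_IO:
            indices.append(index)
    return indices
```

The returned indices name the resource<N> entries that back I/O port space. Whether a driver is currently bound could be checked separately by testing for the device directory's 'driver' symlink.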
Greg KH
2013-03-19 02:03:16 UTC
Post by Robert Hancock
Post by Greg KH
Post by Myron Stowe
Post by Greg KH
Post by Myron Stowe
Sysfs includes entries to memory that backs a PCI device's BARs, both I/O
Port space and MMIO. These memory regions correspond to the device's
internal status and control registers used to drive the device.
Accessing these registers from userspace, such as with "udevadm info
--attribute-walk --path=/sys/devices/...", cannot be allowed, as
such accesses outside of the driver, even just reading, can yield
catastrophic consequences.
Udevadm-info skips parsing a specific set of sysfs entries including
'resource'. This patch extends the set to include the additional
'resource<N>' entries that correspond to a PCI device's BARs.
Nice, are you also going to patch bash to prevent a user from reading
these sysfs files as well? :)
And pciutils?
You get my point here, right? The root user just asked to read all of
the data for this device, so why wouldn't you allow it? Just like
'lspci' does. Or bash does.
lspci doesn't randomly attempt to access device registers, AFAIK..
Have you read the man page for the '-xxx' option to lspci? lspci can be
quite intrusive, and I used to have a number of systems that it would
trash very easily if you ran it on them as root.
Post by Robert Hancock
Post by Greg KH
Post by Myron Stowe
Yes :P , you raise a very good point, there are a lot of ways a user can
poke around in those BARs. However, there is a difference between
shooting yourself in the foot and getting what you deserve versus
unknowingly executing a common command such as udevadm and having the
system hang.
Post by Greg KH
If this hardware has a problem, then it needs to be fixed in the kernel,
not have random band-aids added to various userspace programs to paper
over the root problem here. Please fix the kernel driver and all should
be fine. No need to change udevadm.
Xiangliang initially proposed a patch within the PCI core. Ignoring the
specific issue with the proposal which I pointed out in the
https://lkml.org/lkml/2013/3/7/242 thread, that just doesn't seem like
the right place to effect a change either as PCI's core isn't concerned
with the contents or access limitations of those regions, those are
issues that the driver concerns itself with.
So things seem to be gravitating towards the driver. I'm fairly
ignorant of this area but as Robert succinctly pointed out in the
originating thread - the AHCI driver only uses the device's MMIO region.
The I/O related regions are for legacy SFF-compatible ATA ports and are
not used to drive the device. This, coupled with the observation that
userspace accesses such as udevadm's, and the others you additionally
point out, do not filter through the device's driver, seems to
suggest that changes to the driver will not help here either.
A PCI quirk should handle this properly, right? Why not do that? Worst
case, the quirk could just not expose these sysfs files for this
device, which would solve all userspace program issues, right?
A PCI quirk implies there is something wrong with this device in
particular. This isn't the case. The device responds properly when
it's accessed as intended. The problem is that udevadm (or other
processes, like a random grep through sysfs for example) is
effectively reading registers willy-nilly. This is absolutely not
safe to do on many devices - and certainly not while a driver is
attached to the device and has claimed the port or MMIO regions that
are being accessed.
Then we need to fix that!

In the kernel!

Don't try to gloss over the problem by changing one random userspace
program, you will never catch them all. Fix the root problem here
people, that's all I'm asking for.
Post by Robert Hancock
Blocking access through these files to a device with an active driver
that's claimed the regions would significantly reduce the chances of
something like this causing problems.
Great, that's one possible solution, the other is just not creating the
files at all for known problem devices, right?

My main point here is, you aren't going to fix this in userspace, fix it
in the kernel.

greg k-h
Robert Hancock
2013-03-19 02:09:22 UTC
Post by Greg KH
Post by Robert Hancock
Post by Greg KH
Post by Myron Stowe
Post by Greg KH
Post by Myron Stowe
Sysfs includes entries to memory that backs a PCI device's BARs, both I/O
Port space and MMIO. These memory regions correspond to the device's
internal status and control registers used to drive the device.
Accessing these registers from userspace, such as with "udevadm info
--attribute-walk --path=/sys/devices/...", cannot be allowed, as
such accesses outside of the driver, even just reading, can yield
catastrophic consequences.
Udevadm-info skips parsing a specific set of sysfs entries including
'resource'. This patch extends the set to include the additional
'resource<N>' entries that correspond to a PCI device's BARs.
Nice, are you also going to patch bash to prevent a user from reading
these sysfs files as well? :)
And pciutils?
You get my point here, right? The root user just asked to read all of
the data for this device, so why wouldn't you allow it? Just like
'lspci' does. Or bash does.
lspci doesn't randomly attempt to access device registers, AFAIK..
Have you read the man page for the '-xxx' option to lspci? lspci can be
quite intrusive, and I used to have a number of systems that it would
trash very easily if you ran it on them as root.
Post by Robert Hancock
Post by Greg KH
Post by Myron Stowe
Yes :P , you raise a very good point, there are a lot of ways a user can
poke around in those BARs. However, there is a difference between
shooting yourself in the foot and getting what you deserve versus
unknowingly executing a common command such as udevadm and having the
system hang.
Post by Greg KH
If this hardware has a problem, then it needs to be fixed in the kernel,
not have random band-aids added to various userspace programs to paper
over the root problem here. Please fix the kernel driver and all should
be fine. No need to change udevadm.
Xiangliang initially proposed a patch within the PCI core. Ignoring the
specific issue with the proposal which I pointed out in the
https://lkml.org/lkml/2013/3/7/242 thread, that just doesn't seem like
the right place to effect a change either as PCI's core isn't concerned
with the contents or access limitations of those regions, those are
issues that the driver concerns itself with.
So things seem to be gravitating towards the driver. I'm fairly
ignorant of this area but as Robert succinctly pointed out in the
originating thread - the AHCI driver only uses the device's MMIO region.
The I/O related regions are for legacy SFF-compatible ATA ports and are
not used to drive the device. This, coupled with the observation that
userspace accesses such as udevadm's, and the others you additionally
point out, do not filter through the device's driver, seems to
suggest that changes to the driver will not help here either.
A PCI quirk should handle this properly, right? Why not do that? Worst
case, the quirk could just not expose these sysfs files for this
device, which would solve all userspace program issues, right?
A PCI quirk implies there is something wrong with this device in
particular. This isn't the case. The device responds properly when
it's accessed as intended. The problem is that udevadm (or other
processes, like a random grep through sysfs for example) is
effectively reading registers willy-nilly. This is absolutely not
safe to do on many devices - and certainly not while a driver is
attached to the device and has claimed the port or MMIO regions that
are being accessed.
Then we need to fix that!
In the kernel!
Don't try to gloss over the problem by changing one random userspace
program, you will never catch them all. Fix the root problem here
people, that's all I'm asking for.
Post by Robert Hancock
Blocking access through these files to a device with an active driver
that's claimed the regions would significantly reduce the chances of
something like this causing problems.
Great, that's one possible solution, the other is just not creating the
files at all for known problem devices, right?
I don't think one can reasonably enumerate all problem devices. There
are probably countless devices which can potentially break if their
resources (especially IO ports) are read in unexpected ways. Aside
from devices like this one, which apparently don't like certain IO
ports being read with certain access widths, there's every device in
existence with read-to-reset type registers. The fix to this needs to
apply to all devices.
Post by Greg KH
My main point here is, you aren't going to fix this in userspace, fix it
in the kernel.
The kernel can help the situation by blocking access to devices with
an active driver, but it can't fix all cases. Suppose the device has
no driver loaded yet, how is the kernel supposed to tell the
difference between software with a legitimate need to access these
files for virtualization device assignment, etc. and something like
udevadm or a random grep command that's reading the files without any
idea what it's doing? udevadm does need to be fixed to avoid accessing
these files because it's unnecessary and dangerous.
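The udevadm-side change under discussion, skipping 'resource' and the per-BAR 'resource<N>' entries during an attribute walk, amounts to a name filter. The following is a minimal sketch of such a filter, not the actual patch (udevadm itself is written in C); the function name is hypothetical.

```python
import re

# Matches per-BAR entries such as "resource0" or "resource5".
_BAR_ENTRY = re.compile(r"resource\d+")


def should_skip_attribute(name: str) -> bool:
    """Return True for sysfs entries an attribute walk should not read."""
    return name == "resource" or _BAR_ENTRY.fullmatch(name) is not None
```

A walker would call should_skip_attribute() on each directory entry name before opening it, so the device's registers are never touched by the walk.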
Greg KH
2013-03-19 02:35:25 UTC
Post by Robert Hancock
Post by Greg KH
Great, that's one possible solution, the other is just not creating the
files at all for known problem devices, right?
I don't think one can reasonably enumerate all problem devices. There
are probably countless devices which can potentially break if their
resources (especially IO ports) are read in unexpected ways. Aside
from devices like this one, which apparently don't like certain IO
ports being read with certain access widths, there's every device in
existence with read-to-reset type registers. The fix to this needs to
apply to all devices.
Post by Greg KH
My main point here is, you aren't going to fix this in userspace, fix it
in the kernel.
The kernel can help the situation by blocking access to devices with
an active driver, but it can't fix all cases. Suppose the device has
no driver loaded yet, how is the kernel supposed to tell the
difference between software with a legitimate need to access these
files for virtualization device assignment, etc. and something like
udevadm or a random grep command that's reading the files without any
idea what it's doing? udevadm does need to be fixed to avoid accessing
these files because it's unnecessary and dangerous.
Are you going to also fix grep? bash? cat?

Come on, be realistic. If these files are so dangerous then they need
to just be removed entirely from the kernel. You aren't going to be
able to patch grep for this.

greg k-h
Robert Hancock
2013-03-19 03:08:37 UTC
Post by Greg KH
Post by Robert Hancock
Post by Greg KH
Great, that's one possible solution, the other is just not creating the
files at all for known problem devices, right?
I don't think one can reasonably enumerate all problem devices. There
are probably countless devices which can potentially break if their
resources (especially IO ports) are read in unexpected ways. Aside
from devices like this one, which apparently don't like certain IO
ports being read with certain access widths, there's every device in
existence with read-to-reset type registers. The fix to this needs to
apply to all devices.
Post by Greg KH
My main point here is, you aren't going to fix this in userspace, fix it
in the kernel.
The kernel can help the situation by blocking access to devices with
an active driver, but it can't fix all cases. Suppose the device has
no driver loaded yet, how is the kernel supposed to tell the
difference between software with a legitimate need to access these
files for virtualization device assignment, etc. and something like
udevadm or a random grep command that's reading the files without any
idea what it's doing? udevadm does need to be fixed to avoid accessing
these files because it's unnecessary and dangerous.
Are you going to also fix grep? bash? cat?
Come on, be realistic. If these files are so dangerous then they need
to just be removed entirely from the kernel. You aren't going to be
able to patch grep for this.
Well, clearly not. Although accessing this file with grep, etc. is
really just another way root can shoot themselves in the foot, it
would be nice if this functionality could be provided in a way that
didn't leave this kind of exposed land mine.