Discussion:
[PATCH] pciehp: Acknowledge the spurious "cmd completed" event.
Rajat Jain
2013-11-05 22:33:28 UTC
Permalink
In case of a spurious "cmd completed", pcie_write_cmd() does not
clear it, but yet expects more "cmd completed" events to be generated.
This does not happen because the previous (spurious) event has not
been acknowledged. Fix that.

Signed-off-by: Rajat Jain <***@juniper.net>
Signed-off-by: Guenter Roeck <***@juniper.net>
---
This is how I saw it in action: my controller does not implement any
hot-plug elements (LED, power ctrl, EMI etc) but still supports Command
completed bit.
- During initialization,
pcie_disable_notification()
-> pcie_write_cmd()
-> writes to Slot control register
-> which causes PCI_EXP_SLTSTA_CC to get set, which is not
cleared, because IRQ is not generated (we just disabled
notifications).
- After some time,
pcie_enable_notification()
-> pcie_write_cmd()
-> finds PCI_EXP_SLTSTA_CC is set, assumes it is spurious.
-> Does not clear it, yet expects more command completed
events to be generated (never happens).

drivers/pci/hotplug/pciehp_hpc.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 5b8d749..ba8e06f 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -185,6 +185,7 @@ static int pcie_write_cmd(struct controller *ctrl, u16 cmd, u16 mask)
}

if (slot_status & PCI_EXP_SLTSTA_CC) {
+ pciehp_writew(ctrl, PCI_EXP_SLTSTA, PCI_EXP_SLTSTA_CC);
if (!ctrl->no_cmd_complete) {
/*
* After 1 sec and CMD_COMPLETED still not set, just
--
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-hotplug" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas
2013-11-06 00:25:05 UTC
Permalink
Post by Rajat Jain
In case of a spurious "cmd completed", pcie_write_cmd() does not
clear it, but yet expects more "cmd completed" events to be generated.
This does not happen because the previous (spurious) event has not
been acknowledged. Fix that.
---
This is how I saw it in action: my controller does not implement any
hot-plug elements (LED, power ctrl, EMI etc) but still supports Command
completed bit.
- During initialization,
pcie_disable_notification()
-> pcie_write_cmd()
-> writes to Slot control register
-> which causes PCI_EXP_SLTSTA_CC to get set, which is not
cleared, because IRQ is not generated (we just disabled
notifications).
- After some time,
pcie_enable_notification()
-> pcie_write_cmd()
-> finds PCI_EXP_SLTSTA_CC is set, assumes it is spurious.
-> Does not clear it, yet expects more command completed
events to be generated (never happens).
I'm not sure this "cmd completed" is actually spurious. Spec section
7.8.10 is very clear that any write to Slot Control must cause a
hot-plug command to be generated (if the port is hot-plug capable).

Can you collect "lspci -vv" output for your controller? I assume
you're hitting this case in pcie_init() (added by 5808639bfa98
("pciehp: fix slow probing")):

/*
* Controller doesn't notify of command completion if the "No
* Command Completed Support" bit is set in Slot Capability
* register or the controller supports none of power
* controller, attention led, power led and EMI.
*/
if (NO_CMD_CMPL(ctrl) ||
!(POWER_CTRL(ctrl) | ATTN_LED(ctrl) | PWR_LED(ctrl) | EMI(ctrl)))
ctrl->no_cmd_complete = 1;

and we're setting "no_cmd_complete = 1" for your controller, which
keeps us from waiting for completion in pcie_write_cmd().

I'm dubious about the assertion that a controller without power
control, attention LED, power LED, or EMI can't support command
completion. I don't see anything in the spec to that effect.

Since you're seeing PCI_EXP_SLTSTA_CC=1, your controller *should*
support Command Completion notification and PCI_EXP_SLTCAP_NCCS should
be 0 (per Table 7-20), so I wonder what happens on your system if you
change pcie_init() so it leaves "ctrl->no_cmd_complete = 0" instead?
Does it work correctly then?

I know we can't just drop the "!(POWER_CTRL(ctrl) | ...)" tests
because we don't want to reintroduce the problem fixed by
5808639bfa98, but I wonder if we can find a better fix that addresses
both problems.

Bjorn
Post by Rajat Jain
drivers/pci/hotplug/pciehp_hpc.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 5b8d749..ba8e06f 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -185,6 +185,7 @@ static int pcie_write_cmd(struct controller *ctrl, u16 cmd, u16 mask)
}
if (slot_status & PCI_EXP_SLTSTA_CC) {
+ pciehp_writew(ctrl, PCI_EXP_SLTSTA, PCI_EXP_SLTSTA_CC);
if (!ctrl->no_cmd_complete) {
/*
* After 1 sec and CMD_COMPLETED still not set, just
--
1.7.9.5
Rajat Jain
2013-11-06 02:38:51 UTC
Permalink
Hello Bjorn,
Post by Bjorn Helgaas
Post by Rajat Jain
In case of a spurious "cmd completed", pcie_write_cmd() does not
clear it, but yet expects more "cmd completed" events to be generated.
This does not happen because the previous (spurious) event has not
been acknowledged. Fix that.
---
This is how I saw it in action: my controller does not implement any
hot-plug elements (LED, power ctrl, EMI etc) but still supports Command
completed bit.
- During initialization,
pcie_disable_notification()
-> pcie_write_cmd()
-> writes to Slot control register
-> which causes PCI_EXP_SLTSTA_CC to get set, which is not
cleared, because IRQ is not generated (we just disabled
notifications).
- After some time,
pcie_enable_notification()
-> pcie_write_cmd()
-> finds PCI_EXP_SLTSTA_CC is set, assumes it is spurious.
-> Does not clear it, yet expects more command completed
events to be generated (never happens).
I'm not sure this "cmd completed" is actually spurious. Spec section
7.8.10 is very clear that any write to Slot Control must cause a
hot-plug command to be generated (if the port is hot-plug capable).
I agree, and I think I am witnessing a "genuine" command completion
(caused by disabling of notifications).
Post by Bjorn Helgaas
Can you collect "lspci -vv" output for your controller?
Sure:

PCI bridge: Integrated Device Technology, Inc. Device 807a (rev 02)
(prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Bus: primary=02, secondary=50, subordinate=5f, sec-latency=0
I/O behind bridge: 00003000-00003fff
Memory behind bridge: 8c000000-8fffffff
Prefetchable memory behind bridge: 00000000b0600000-00000000b07fffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Express (v2) Downstream Port (Slot+), MSI 00
DevCap: MaxPayload 2048 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag+ RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
LnkCap: Port #3, Speed 5GT/s, Width x4, ASPM L0s L1, Latency
L0 <4us, L1 <4us
ClockPM- Surprise+ LLActRep+ BwNot+
LnkCtl: ASPM Disabled; Disabled- Retrain- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk-
DLActive- BWMgmt- ABWMgmt-
SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise-
Slot #3, PowerLimit 0.000W; Interlock- NoCompl-
SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet+ CmdCplt+ HPIrq+ LinkChg+
Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt+ PresDet- Interlock-
Changed: MRL- PresDet- LinkState-
DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-,
OBFF Not Supported ARIFwd+
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-,
OBFF Disabled ARIFwd-
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-,
Selectable De-emphasis: -6dB
Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB,
EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-,
LinkEqualizationRequest-
Capabilities: [c0] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000ff041740 Data: 0003
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt-
RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt-
RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [200 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=4
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=04 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
Status: NegoPending- InProgress-
Port Arbitration Table <?>
Capabilities: [320 v1] Access Control Services
ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+
EgressCtrl+ DirectTrans+
ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd-
EgressCtrl- DirectTrans-
Capabilities: [330 v1] #12
Kernel driver in use: pcieport

In addition, here is what the pciehp driver spews out:

Hotplug Controller:
Seg/Bus/Dev/Func/IRQ : 0000:02:03.0 IRQ 21
Vendor ID : 0x111d
Device ID : 0x807a
Subsystem ID : 0x0000
Subsystem Vendor ID : 0x0000
PCIe Cap offset : 0x40
PCI resource [7] : [io 0x13000-0x13fff]
PCI resource [8] : [mem 0xc0c000000-0xc0fffffff]
PCI resource [9] : [mem 0xc30600000-0xc307fffff 64bit pref]
Slot Capabilities : 0x00180040
Physical Slot Number : 3
Attention Button : no
Power Controller : no
MRL Sensor : no
Attention Indicator : no
Power Indicator : no
Hot-Plug Surprise : no
EMI Present : no
Command Completed : yes
Slot Status : 0x0010
Slot Control : 0x0000
Link Active Reporting supported
HPC vendor_id 111d device_id 807a ss_vid 0 ss_did 0
Registering domain:bus:dev=0000:50:00 sun=3
Unexpected CMD_COMPLETED. Need to wait for command completed event.
Command not completed in 1000 msec

The last two output lines are part of the problem. Each controller
takes 1 second to initialized as the message shows.
Post by Bjorn Helgaas
I assume
you're hitting this case in pcie_init() (added by 5808639bfa98
/*
* Controller doesn't notify of command completion if the "No
* Command Completed Support" bit is set in Slot Capability
* register or the controller supports none of power
* controller, attention led, power led and EMI.
*/
if (NO_CMD_CMPL(ctrl) ||
!(POWER_CTRL(ctrl) | ATTN_LED(ctrl) | PWR_LED(ctrl) | EMI(ctrl)))
ctrl->no_cmd_complete = 1;
and we're setting "no_cmd_complete = 1" for your controller, which
keeps us from waiting for completion in pcie_write_cmd().
I'm dubious about the assertion that a controller without power
control, attention LED, power LED, or EMI can't support command
completion. I don't see anything in the spec to that effect.
I agree, specially since I am seeing this contrary behavior (to that
assertion) in action :-)
Post by Bjorn Helgaas
Since you're seeing PCI_EXP_SLTSTA_CC=1, your controller *should*
support Command Completion notification and PCI_EXP_SLTCAP_NCCS should
be 0 (per Table 7-20), so I wonder what happens on your system if you
change pcie_init() so it leaves "ctrl->no_cmd_complete = 0" instead?
Does it work correctly then?
Yes, it work well after that (Both the error output lines go away as
well). And the 1 second delay per controller is also gone.
Post by Bjorn Helgaas
I know we can't just drop the "!(POWER_CTRL(ctrl) | ...)" tests
because we don't want to reintroduce the problem fixed by
5808639bfa98, but I wonder if we can find a better fix that addresses
both problems.
Please let me know if any one has any suggestions? IMHO, this patch
should yet not cause the original problem to reappear because we are
clearing the bit *only* when we know that it is already set (and no
way to clear it because at least in my controller no subsequent
interrupts can be generated). But I do not have enough understanding,
and will be glad to get any pointers.

Thanks,

- Rajat
Post by Bjorn Helgaas
Bjorn
Post by Rajat Jain
drivers/pci/hotplug/pciehp_hpc.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 5b8d749..ba8e06f 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -185,6 +185,7 @@ static int pcie_write_cmd(struct controller *ctrl, u16 cmd, u16 mask)
}
if (slot_status & PCI_EXP_SLTSTA_CC) {
+ pciehp_writew(ctrl, PCI_EXP_SLTSTA, PCI_EXP_SLTSTA_CC);
if (!ctrl->no_cmd_complete) {
/*
* After 1 sec and CMD_COMPLETED still not set, just
--
1.7.9.5
--
To unsubscribe from this list: send the line "unsubscribe linux-hotplug" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Rajat Jain
2013-11-07 21:53:44 UTC
Permalink
Hi Bjorn / list,

I think there are two independent problems that need to be addressed.

Problem #1: In any condition where (e.g. spurious interrupt) once the
execution reaches inside the if condition below, the PCI_EXP_SLTSTA_CC
bit is already set, and needs to be cleared. This does not happen
today and hence this patch was aimed at solving that. Please note that
this will not cause any problems to the systems that were fixed by
5808639bfa98, because this is equally applicable to any case.
Post by Bjorn Helgaas
Post by Rajat Jain
drivers/pci/hotplug/pciehp_hpc.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 5b8d749..ba8e06f 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -185,6 +185,7 @@ static int pcie_write_cmd(struct controller *ctrl, u16 cmd, u16 mask)
}
if (slot_status & PCI_EXP_SLTSTA_CC) {
+ pciehp_writew(ctrl, PCI_EXP_SLTSTA, PCI_EXP_SLTSTA_CC);
if (!ctrl->no_cmd_complete) {
/*
* After 1 sec and CMD_COMPLETED still not set, just
--
1.7.9.5
Problem #2: Looks like we have now 2 classes of systems:

a) The 5808639bfa98 was addressed towards systems that advertise the
"Command completion notification" capability, but do not notify of
command completion in case MRL, Power Ctrl, Attn LED and Pwr LED bits
are not written in SLOT_CTRL register (not implemented in
capabilities). Thus no command completion shall be generated for
pcie_disable_notifications() for e.g.

b) I have a system that where none of the above described HP elements
are present implemented, but the controller supports command
completion, and sends it out for every write of the slot control
register. Thus notification for command complete is generated for
pcie_disable_notifiction().

I don't believe there is a way to differentiate between these 2
classes of systems at init time, unless we try to generate a
notification and do or do not get one.

The PCIe specification section 7.8.10 seems to lean towards category
(b) of systems, but today this class of systems shall get penalized by
delay of 1 second (per controller) during probe. I can try and address
that, but admittedly I could not think of a better solution than
moving this code from pcie_init() to pcie_write_cmd().
Post by Bjorn Helgaas
if (NO_CMD_CMPL(ctrl) ||
!(POWER_CTRL(ctrl) | ATTN_LED(ctrl) | PWR_LED(ctrl) | EMI(ctrl)))
ctrl->no_cmd_complete = 1;
Please advise.

Also, I'd appreciate if you could please let me know if there are any
comments on the patch for problem #1 above.

Thanks,

Rajat
--
To unsubscribe from this list: send the line "unsubscribe linux-hotplug" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas
2013-11-08 01:23:52 UTC
Permalink
Post by Rajat Jain
Hi Bjorn / list,
I think there are two independent problems that need to be addressed.
Problem #1: In any condition where (e.g. spurious interrupt) once the
execution reaches inside the if condition below, the PCI_EXP_SLTSTA_CC
bit is already set, and needs to be cleared. This does not happen
today and hence this patch was aimed at solving that. Please note that
this will not cause any problems to the systems that were fixed by
5808639bfa98, because this is equally applicable to any case.
Your patch might be the right thing to do, but something still niggles
at me, and I can't quite put my finger on it yet.
Post by Rajat Jain
Post by Rajat Jain
drivers/pci/hotplug/pciehp_hpc.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 5b8d749..ba8e06f 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -185,6 +185,7 @@ static int pcie_write_cmd(struct controller *ctrl, u16 cmd, u16 mask)
}
if (slot_status & PCI_EXP_SLTSTA_CC) {
+ pciehp_writew(ctrl, PCI_EXP_SLTSTA, PCI_EXP_SLTSTA_CC);
if (!ctrl->no_cmd_complete) {
/*
* After 1 sec and CMD_COMPLETED still not set, just
--
1.7.9.5
a) The 5808639bfa98 was addressed towards systems that advertise the
"Command completion notification" capability, but do not notify of
command completion in case MRL, Power Ctrl, Attn LED and Pwr LED bits
are not written in SLOT_CTRL register (not implemented in
capabilities). Thus no command completion shall be generated for
pcie_disable_notifications() for e.g.
This seems pretty clearly out of spec. I read the 82801H spec for the
part used in the system in bz 10751, and the documentation seems to
comply with the spec. Possibly the part doesn't work as advertised,
but I think it's more likely there's some subtlety we're still missing.

I'm a little bit surprised that we'd be using pciehp on that Thinkpad R61
instead of acpiphp. The kernel in bz 10751 is so ancient that it might
not be negotiating correctly for control of pciehp.
Post by Rajat Jain
b) I have a system that where none of the above described HP elements
are present implemented, but the controller supports command
completion, and sends it out for every write of the slot control
register. Thus notification for command complete is generated for
pcie_disable_notifiction().
I don't believe there is a way to differentiate between these 2
classes of systems at init time, unless we try to generate a
notification and do or do not get one.
The PCIe specification section 7.8.10 seems to lean towards category
(b) of systems, but today this class of systems shall get penalized by
delay of 1 second (per controller) during probe.
Where does this delay happen on your system? I tried to work through
the path but I don't see it yet. Here's what I expect to happen on your
system:

pciehp_probe
pcie_init
readl(SLTCAP)
if (NO_CMD_CMPL || !(POWER_CTRL | ATTN_LED | PWR_LED | EMI)) # true
ctrl->no_cmd_complete = 1 # set (condition above is true)
writew(SLTSTA, 0x1f) # clear ABP PFD MRLSC PDC CC
pcie_disable_notification
pcie_write_cmd(0, PDCE | ABPE | MRLSCE | PFDE | HPIE | CCIE | DLLSCE)
readw(SLTSTA) # CC == 0 (was cleared above)
writew(SLTCTL)
...
# no waiting here because no_cmd_complete == 1
...
hardware sets SLTSTA.CC
...
pcie_init_notification
pcie_enable_notification
pcie_write_cmd
readw(SLTSTA) # CC == 1 (was set by HW above)
if (SLTSTA.CC) # true
if (!NO_CMD_CMPL)) # true
dbg("Unexpected CMD_COMPLETED. ...")
ctrl->no_cmd_complete = 0
writew(SLTCTL)
...
# this time we wait because no_cmd_complete == 0
...
pcie_wait_cmd(..., poll == 1) # now no_cmd_complete == 0
pcie_poll_cmd
readw(SLTSTA) # CC == 1 on first read
if (SLTSTA.CC)
writew(SLTSTA, CC)
return 1

I would think you'd see the "Unexpected CMD_COMPLETED. Need to wait for
command completed event" message (if enabled), but I don't see where the
one second timeout would happen. It looks like we would call
pcie_poll_cmd(), but it would return immediately because SLTSTA.CC is
already set.

Bjorn
Rajat Jain
2013-11-08 17:30:38 UTC
Permalink
Hi,
Post by Bjorn Helgaas
Post by Rajat Jain
Hi Bjorn / list,
I think there are two independent problems that need to be addressed.
Problem #1: In any condition where (e.g. spurious interrupt) once the
execution reaches inside the if condition below, the PCI_EXP_SLTSTA_CC
bit is already set, and needs to be cleared. This does not happen
today and hence this patch was aimed at solving that. Please note that
this will not cause any problems to the systems that were fixed by
5808639bfa98, because this is equally applicable to any case.
Your patch might be the right thing to do, but something still niggles
at me, and I can't quite put my finger on it yet.
Post by Rajat Jain
Post by Rajat Jain
drivers/pci/hotplug/pciehp_hpc.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 5b8d749..ba8e06f 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -185,6 +185,7 @@ static int pcie_write_cmd(struct controller *ctrl, u16 cmd, u16 mask)
}
if (slot_status & PCI_EXP_SLTSTA_CC) {
+ pciehp_writew(ctrl, PCI_EXP_SLTSTA, PCI_EXP_SLTSTA_CC);
if (!ctrl->no_cmd_complete) {
/*
* After 1 sec and CMD_COMPLETED still not set, just
--
1.7.9.5
a) The 5808639bfa98 was addressed towards systems that advertise the
"Command completion notification" capability, but do not notify of
command completion in case MRL, Power Ctrl, Attn LED and Pwr LED bits
are not written in SLOT_CTRL register (not implemented in
capabilities). Thus no command completion shall be generated for
pcie_disable_notifications() for e.g.
This seems pretty clearly out of spec. I read the 82801H spec for the
part used in the system in bz 10751, and the documentation seems to
comply with the spec. Possibly the part doesn't work as advertised,
but I think it's more likely there's some subtlety we're still missing.
I'm a little bit surprised that we'd be using pciehp on that Thinkpad R61
instead of acpiphp. The kernel in bz 10751 is so ancient that it might
not be negotiating correctly for control of pciehp.
Post by Rajat Jain
b) I have a system that where none of the above described HP elements
are present implemented, but the controller supports command
completion, and sends it out for every write of the slot control
register. Thus notification for command complete is generated for
pcie_disable_notifiction().
I don't believe there is a way to differentiate between these 2
classes of systems at init time, unless we try to generate a
notification and do or do not get one.
The PCIe specification section 7.8.10 seems to lean towards category
(b) of systems, but today this class of systems shall get penalized by
delay of 1 second (per controller) during probe.
Where does this delay happen on your system? I tried to work through
the path but I don't see it yet. Here's what I expect to happen on your
pciehp_probe
pcie_init
readl(SLTCAP)
if (NO_CMD_CMPL || !(POWER_CTRL | ATTN_LED | PWR_LED | EMI)) # true
ctrl->no_cmd_complete = 1 # set (condition above is true)
writew(SLTSTA, 0x1f) # clear ABP PFD MRLSC PDC CC
pcie_disable_notification
pcie_write_cmd(0, PDCE | ABPE | MRLSCE | PFDE | HPIE | CCIE | DLLSCE)
readw(SLTSTA) # CC == 0 (was cleared above)
writew(SLTCTL)
...
# no waiting here because no_cmd_complete == 1
...
hardware sets SLTSTA.CC
...
pcie_init_notification
pcie_enable_notification
pcie_write_cmd
readw(SLTSTA) # CC == 1 (was set by HW above)
if (SLTSTA.CC) # true
if (!NO_CMD_CMPL)) # true
dbg("Unexpected CMD_COMPLETED. ...")
ctrl->no_cmd_complete = 0
writew(SLTCTL)
...
# this time we wait because no_cmd_complete == 0
...
pcie_wait_cmd(..., poll == 1) # now no_cmd_complete == 0
Actually pcie_wait_cmd() shall be called with poll=0. That is because
the following conditions are both false (we just enabled both the
interrupts while enabling notifications):

if (!(slot_ctrl & PCI_EXP_SLTCTL_HPIE) ||
!(slot_ctrl & PCI_EXP_SLTCTL_CCIE))
poll = 1;

Thus it does not poll, but waits for interrupt. Which does not happen
because the CC bit in the slot status register was not cleared. As a
result, we get the following messages on the console for each
controller (and 1 second delay):
===========================
Unexpected CMD_COMPLETED. Need to wait for command completed event.
Command not completed in 1000 msec
===========================
With my patch, we'd eliminate the 1 second delay and the second line
of output above. If we also want to get rid of the first line
"Unexpected CMD_COMPLETED..." also, that would probably have to be
solved by changing the behavior to assume that CC shall be generated
for every SLOTCTRL register write.

Please suggest.

Thanks,

Rajat
Post by Bjorn Helgaas
pcie_poll_cmd
readw(SLTSTA) # CC == 1 on first read
if (SLTSTA.CC)
writew(SLTSTA, CC)
return 1
I would think you'd see the "Unexpected CMD_COMPLETED. Need to wait for
command completed event" message (if enabled),
but I don't see where the
one second timeout would happen. It looks like we would call
pcie_poll_cmd(), but it would return immediately because SLTSTA.CC is
already set.
--
To unsubscribe from this list: send the line "unsubscribe linux-hotplug" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas
2013-11-08 23:15:46 UTC
Permalink
Post by Rajat Jain
Post by Bjorn Helgaas
Where does this delay happen on your system? I tried to work through
the path but I don't see it yet. Here's what I expect to happen on your
pciehp_probe
pcie_init
readl(SLTCAP)
if (NO_CMD_CMPL || !(POWER_CTRL | ATTN_LED | PWR_LED | EMI)) # true
ctrl->no_cmd_complete = 1 # set (condition above is true)
writew(SLTSTA, 0x1f) # clear ABP PFD MRLSC PDC CC
pcie_disable_notification
pcie_write_cmd(0, PDCE | ABPE | MRLSCE | PFDE | HPIE | CCIE | DLLSCE)
readw(SLTSTA) # CC == 0 (was cleared above)
writew(SLTCTL)
...
# no waiting here because no_cmd_complete == 1
...
hardware sets SLTSTA.CC
...
pcie_init_notification
pcie_enable_notification
pcie_write_cmd
readw(SLTSTA) # CC == 1 (was set by HW above)
if (SLTSTA.CC) # true
if (!NO_CMD_CMPL)) # true
dbg("Unexpected CMD_COMPLETED. ...")
ctrl->no_cmd_complete = 0
writew(SLTCTL)
...
# this time we wait because no_cmd_complete == 0
...
pcie_wait_cmd(..., poll == 1) # now no_cmd_complete == 0
Actually pcie_wait_cmd() shall be called with poll=0. That is because
the following conditions are both false (we just enabled both the
if (!(slot_ctrl & PCI_EXP_SLTCTL_HPIE) ||
!(slot_ctrl & PCI_EXP_SLTCTL_CCIE))
poll = 1;
Thus it does not poll, but waits for interrupt. Which does not happen
because the CC bit in the slot status register was not cleared.
Right, of course. Thanks for pointing out my error.
Post by Rajat Jain
As a
result, we get the following messages on the console for each
===========================
Unexpected CMD_COMPLETED. Need to wait for command completed event.
Command not completed in 1000 msec
===========================
With my patch, we'd eliminate the 1 second delay and the second line
of output above. If we also want to get rid of the first line
"Unexpected CMD_COMPLETED..." also, that would probably have to be
solved by changing the behavior to assume that CC shall be generated
for every SLOTCTRL register write.
I *do* want to get rid of the "Unexpected CMD_COMPLETED" complaint.
We shouldn't complain about hardware that is working perfectly fine.
I don't know the best way to do that yet. I have found a box with the
same hardware that was fixed by 5808639bfa98, so I hope to play with
it and figure out something that will work nicely for both scenarios.

BTW, can you open a report at bugzilla.kernel.org and attach your
"lspci -vvxxx" and dmesg output? When we eventually merge a fix, I'd
like to have the details archived somewhere.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-hotplug" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Rajat Jain
2013-11-11 21:26:06 UTC
Permalink
Hello,
Post by Bjorn Helgaas
Post by Rajat Jain
With my patch, we'd eliminate the 1 second delay and the second line
of output above. If we also want to get rid of the first line
"Unexpected CMD_COMPLETED..." also, that would probably have to be
solved by changing the behavior to assume that CC shall be generated
for every SLOTCTRL register write.
I *do* want to get rid of the "Unexpected CMD_COMPLETED" complaint.
We shouldn't complain about hardware that is working perfectly fine.
I don't know the best way to do that yet. I have found a box with the
same hardware that was fixed by 5808639bfa98, so I hope to play with
it and figure out something that will work nicely for both scenarios.
Please keep posted :-)
Post by Bjorn Helgaas
BTW, can you open a report at bugzilla.kernel.org and attach your
"lspci -vvxxx" and dmesg output? When we eventually merge a fix, I'd
like to have the details archived somewhere.
Done:

https://bugzilla.kernel.org/show_bug.cgi?id=64821

Please let me know if you need any help in trying something out - I'd
be more than keen to help - write code or test.

BTW, I figured another way that solves the current problem at hand
(for both kind of controllers). Today the events are cleared just
before call to pcie_disable_notification, if we clear them afterwards
also, the current problem is taken care of for all controllers. Of
course this does not look the right way to solve the problem though
(since the ctrl->no_cmd_completed shall still be 1, which is not
right).

@@ -948,6 +948,10 @@ struct controller *pcie_init(struct pcie_device *dev)

/* Clear all remaining event bits in Slot Status register */
if (pciehp_writew(ctrl, PCI_EXP_SLTSTA, 0x1f))
goto abort_ctrl;

/* Disable sotfware notification */
pcie_disable_notification(ctrl);

+ /* Clear all remaining event bits in Slot Status register */
+ if (pciehp_writew(ctrl, PCI_EXP_SLTSTA, 0x1f))
+ goto abort_ctrl;
+
ctrl_info(ctrl, "HPC vendor_id %x device_id %x ss_vid %x ss_did %x\n",
pdev->vendor, pdev->device, pdev->subsystem_vendor,
pdev->subsystem_device);


Thanks,

Rajat
--
To unsubscribe from this list: send the line "unsubscribe linux-hotplug" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas
2013-11-23 00:51:02 UTC
Permalink
Post by Rajat Jain
Hello,
Post by Bjorn Helgaas
Post by Rajat Jain
With my patch, we'd eliminate the 1 second delay and the second line
of output above. If we also want to get rid of the first line
"Unexpected CMD_COMPLETED..." also, that would probably have to be
solved by changing the behavior to assume that CC shall be generated
for every SLOTCTRL register write.
I *do* want to get rid of the "Unexpected CMD_COMPLETED" complaint.
We shouldn't complain about hardware that is working perfectly fine.
I don't know the best way to do that yet. I have found a box with the
same hardware that was fixed by 5808639bfa98, so I hope to play with
it and figure out something that will work nicely for both scenarios.
Please keep posted :-)
Post by Bjorn Helgaas
BTW, can you open a report at bugzilla.kernel.org and attach your
"lspci -vvxxx" and dmesg output? When we eventually merge a fix, I'd
like to have the details archived somewhere.
https://bugzilla.kernel.org/show_bug.cgi?id=64821
Please let me know if you need any help in trying something out - I'd
be more than keen to help - write code or test.
What do you think of the patch below? I'm afraid we'll trip over
a few other old parts similar to the 82801H, but I'd rather do that
than penalize the parts that work correctly.


PCI: pciehp: Support command completion even with no hotplug hardware

From: Bjorn Helgaas <***@google.com>

Commit 5808639bfa98 ("pciehp: fix slow probing") fixed a slow probing
problem on hardware that doesn't conform to the spec, but caused a
similar problem for hardware that *does* conform to the spec.

Per PCIe 3.0, sec 7.8.10 and 6.7.3.2, any write to Slot Control generates a
hot-plug command. Ports that can accept new commands with no delay can set
the "No Command Completed Support" bit. Otherwise the port must indicate
its completion of the command and readiness to accept a new command with a
"command completed event."

5808639bfa98 assumes ports that lack a power controller, power indicator,
attention indicator, and interlock will not generate completion events,
even if they neglect to set "No Command Completed Support." But on ports
that lack those elements and *do* support command completion notification,
it causes:

Unexpected CMD_COMPLETED. Need to wait for command completed event.
Command not completed in 1000 msec

and forces us to wait for a 1 second timeout.

This patch makes the 5808639bfa98 workaround into a quirk that's applied
only to devices known to be broken, currently just Intel 82801H ports.
There are probably other similarly-broken devices that may now probe
slowly, but I don't know how to catch them all without penalizing the
ones that play by the rules.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=64821
Reported-by: Rajat Jain <***@juniper.net>
Signed-off-by: Bjorn Helgaas <***@google.com>
---
drivers/pci/hotplug/pciehp_hpc.c | 39 +++++++++++++++++++++++++++++---------
1 file changed, 30 insertions(+), 9 deletions(-)

diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 3eea3fdd4b0b..2fd2bd59e07f 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -881,6 +881,34 @@ static inline void dbg_ctrl(struct controller *ctrl)
ctrl_info(ctrl, "Slot Control : 0x%04x\n", reg16);
}

+static int pciehp_no_command_complete(struct controller *ctrl)
+{
+ struct pcie_device *dev = ctrl->pcie;
+ struct pci_dev *pdev = dev->port;
+ u16 vendor, device;
+
+ if (NO_CMD_CMPL(ctrl))
+ return 1;
+
+ /*
+ * Controller should notify on command completion unless the "No
+ * Command Completed Support" bit is set. But some hardware does
+ * not. See https://bugzilla.kernel.org/show_bug.cgi?id=10751
+ */
+ if (!(POWER_CTRL(ctrl) | ATTN_LED(ctrl) | PWR_LED(ctrl) | EMI(ctrl))) {
+ vendor = pdev->vendor;
+ device = pdev->device;
+ if (vendor == PCI_VENDOR_ID_INTEL &&
+ device >= 0x283f && device <= 0x2849) {
+ dev_info(&dev->device, "device [%04x:%04x] does not notify on hotplug command completion\n",
+ vendor, device);
+ return 1;
+ }
+ }
+
+ return 0;
+}
+
struct controller *pcie_init(struct pcie_device *dev)
{
struct controller *ctrl;
@@ -902,15 +930,8 @@ struct controller *pcie_init(struct pcie_device *dev)
mutex_init(&ctrl->ctrl_lock);
init_waitqueue_head(&ctrl->queue);
dbg_ctrl(ctrl);
- /*
- * Controller doesn't notify of command completion if the "No
- * Command Completed Support" bit is set in Slot Capability
- * register or the controller supports none of power
- * controller, attention led, power led and EMI.
- */
- if (NO_CMD_CMPL(ctrl) ||
- !(POWER_CTRL(ctrl) | ATTN_LED(ctrl) | PWR_LED(ctrl) | EMI(ctrl)))
- ctrl->no_cmd_complete = 1;
+
+ ctrl->no_cmd_complete = pciehp_no_command_complete(ctrl);

/* Check if Data Link Layer Link Active Reporting is implemented */
if (pciehp_readl(ctrl, PCI_EXP_LNKCAP, &link_cap)) {
--
To unsubscribe from this list: send the line "unsubscribe linux-hotplug" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Guenter Roeck
2013-11-23 01:59:53 UTC
Permalink
Post by Bjorn Helgaas
Post by Rajat Jain
Hello,
Post by Bjorn Helgaas
Post by Rajat Jain
With my patch, we'd eliminate the 1 second delay and the second line
of output above. If we also want to get rid of the first line
"Unexpected CMD_COMPLETED..." also, that would probably have to be
solved by changing the behavior to assume that CC shall be generated
for every SLOTCTRL register write.
I *do* want to get rid of the "Unexpected CMD_COMPLETED" complaint.
We shouldn't complain about hardware that is working perfectly fine.
I don't know the best way to do that yet. I have found a box with the
same hardware that was fixed by 5808639bfa98, so I hope to play with
it and figure out something that will work nicely for both scenarios.
Please keep posted :-)
Post by Bjorn Helgaas
BTW, can you open a report at bugzilla.kernel.org and attach your
"lspci -vvxxx" and dmesg output? When we eventually merge a fix, I'd
like to have the details archived somewhere.
https://bugzilla.kernel.org/show_bug.cgi?id=64821
Please let me know if you need any help in trying something out - I'd
be more than keen to help - write code or test.
What do you think of the patch below? I'm afraid we'll trip over
a few other old parts similar to the 82801H, but I'd rather do that
than penalize the parts that work correctly.
Works nicely as far as I can see. Tested on P2020 and P5040 based systems
with IDT 89HPES48H12G2.

Tested-by: Guenter Roeck <***@juniper.net>

Guenter
Post by Bjorn Helgaas
PCI: pciehp: Support command completion even with no hotplug hardware
Commit 5808639bfa98 ("pciehp: fix slow probing") fixed a slow probing
problem on hardware that doesn't conform to the spec, but caused a
similar problem for hardware that *does* conform to the spec.
Per PCIe 3.0, sec 7.8.10 and 6.7.3.2, any write to Slot Control generates a
hot-plug command. Ports that can accept new commands with no delay can set
the "No Command Completed Support" bit. Otherwise the port must indicate
its completion of the command and readiness to accept a new command with a
"command completed event."
5808639bfa98 assumes ports that lack a power controller, power indicator,
attention indicator, and interlock will not generate completion events,
even if they neglect to set "No Command Completed Support." But on ports
that lack those elements and *do* support command completion notification,
Unexpected CMD_COMPLETED. Need to wait for command completed event.
Command not completed in 1000 msec
and forces us to wait for a 1 second timeout.
This patch makes the 5808639bfa98 workaround into a quirk that's applied
only to devices known to be broken, currently just Intel 82801H ports.
There are probably other similarly-broken devices that may now probe
slowly, but I don't know how to catch them all without penalizing the
ones that play by the rules.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=64821
---
drivers/pci/hotplug/pciehp_hpc.c | 39 +++++++++++++++++++++++++++++---------
1 file changed, 30 insertions(+), 9 deletions(-)
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 3eea3fdd4b0b..2fd2bd59e07f 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -881,6 +881,34 @@ static inline void dbg_ctrl(struct controller *ctrl)
ctrl_info(ctrl, "Slot Control : 0x%04x\n", reg16);
}
+static int pciehp_no_command_complete(struct controller *ctrl)
+{
+ struct pcie_device *dev = ctrl->pcie;
+ struct pci_dev *pdev = dev->port;
+ u16 vendor, device;
+
+ if (NO_CMD_CMPL(ctrl))
+ return 1;
+
+ /*
+ * Controller should notify on command completion unless the "No
+ * Command Completed Support" bit is set. But some hardware does
+ * not. See https://bugzilla.kernel.org/show_bug.cgi?id=10751
+ */
+ if (!(POWER_CTRL(ctrl) | ATTN_LED(ctrl) | PWR_LED(ctrl) | EMI(ctrl))) {
+ vendor = pdev->vendor;
+ device = pdev->device;
+ if (vendor == PCI_VENDOR_ID_INTEL &&
+ device >= 0x283f && device <= 0x2849) {
+ dev_info(&dev->device, "device [%04x:%04x] does not notify on hotplug command completion\n",
+ vendor, device);
+ return 1;
+ }
+ }
+
+ return 0;
+}
+
struct controller *pcie_init(struct pcie_device *dev)
{
struct controller *ctrl;
@@ -902,15 +930,8 @@ struct controller *pcie_init(struct pcie_device *dev)
mutex_init(&ctrl->ctrl_lock);
init_waitqueue_head(&ctrl->queue);
dbg_ctrl(ctrl);
- /*
- * Controller doesn't notify of command completion if the "No
- * Command Completed Support" bit is set in Slot Capability
- * register or the controller supports none of power
- * controller, attention led, power led and EMI.
- */
- if (NO_CMD_CMPL(ctrl) ||
- !(POWER_CTRL(ctrl) | ATTN_LED(ctrl) | PWR_LED(ctrl) | EMI(ctrl)))
- ctrl->no_cmd_complete = 1;
+
+ ctrl->no_cmd_complete = pciehp_no_command_complete(ctrl);
/* Check if Data Link Layer Link Active Reporting is implemented */
if (pciehp_readl(ctrl, PCI_EXP_LNKCAP, &link_cap)) {
--
To unsubscribe from this list: send the line "unsubscribe linux-hotplug" in
More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-hotplug" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Rajat Jain
2013-11-23 14:56:22 UTC
Permalink
Hello Bjorn,
Post by Rajat Jain
https://bugzilla.kernel.org/show_bug.cgi?id=64821
Please let me know if you need any help in trying something out - I'd
be more than keen to help - write code or test.
What do you think of the patch below? I'm afraid we'll trip over a few
other old parts similar to the 82801H, but I'd rather do that than
penalize the parts that work correctly.
Tested and works fine. Looks good (on a side note, we are introducing device specific check in a common code, but I do not think there is any other cleaner way). I think this patch should be applied as it solves the problem of slow probing.

On a different note, I feel there is still a need to apply my original patch. There is still an open problem in case of spurious interrupts (or in any case where the condition "if (slot_status & PCI_EXP_SLTSTA_CC)" becomes true in pcie_write_cmd()). That is because once that happens, we never clear that interrupt, and no further hotplug interrupts shall be received unless we do that.

Thanks,

Rajat
PCI: pciehp: Support command completion even with no hotplug hardware
Commit 5808639bfa98 ("pciehp: fix slow probing") fixed a slow probing
problem on hardware that doesn't conform to the spec, but caused a
similar problem for hardware that *does* conform to the spec.
Per PCIe 3.0, sec 7.8.10 and 6.7.3.2, any write to Slot Control
generates a hot-plug command. Ports that can accept new commands with
no delay can set the "No Command Completed Support" bit. Otherwise the
port must indicate its completion of the command and readiness to accept
a new command with a "command completed event."
5808639bfa98 assumes ports that lack a power controller, power
indicator, attention indicator, and interlock will not generate
completion events, even if they neglect to set "No Command Completed
Support." But on ports that lack those elements and *do* support command
Unexpected CMD_COMPLETED. Need to wait for command completed event.
Command not completed in 1000 msec
and forces us to wait for a 1 second timeout.
This patch makes the 5808639bfa98 workaround into a quirk that's applied
only to devices known to be broken, currently just Intel 82801H ports.
There are probably other similarly-broken devices that may now probe
slowly, but I don't know how to catch them all without penalizing the
ones that play by the rules.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=64821
---
drivers/pci/hotplug/pciehp_hpc.c | 39 +++++++++++++++++++++++++++++--
-------
1 file changed, 30 insertions(+), 9 deletions(-)
diff --git a/drivers/pci/hotplug/pciehp_hpc.c
b/drivers/pci/hotplug/pciehp_hpc.c
index 3eea3fdd4b0b..2fd2bd59e07f 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -881,6 +881,34 @@ static inline void dbg_ctrl(struct controller *ctrl)
ctrl_info(ctrl, "Slot Control : 0x%04x\n", reg16);
}
+static int pciehp_no_command_complete(struct controller *ctrl) {
+ struct pcie_device *dev = ctrl->pcie;
+ struct pci_dev *pdev = dev->port;
+ u16 vendor, device;
+
+ if (NO_CMD_CMPL(ctrl))
+ return 1;
+
+ /*
+ * Controller should notify on command completion unless the "No
+ * Command Completed Support" bit is set. But some hardware does
+ * not. See https://bugzilla.kernel.org/show_bug.cgi?id=10751
+ */
+ if (!(POWER_CTRL(ctrl) | ATTN_LED(ctrl) | PWR_LED(ctrl) |
EMI(ctrl))) {
+ vendor = pdev->vendor;
+ device = pdev->device;
+ if (vendor == PCI_VENDOR_ID_INTEL &&
+ device >= 0x283f && device <= 0x2849) {
+ dev_info(&dev->device, "device [%04x:%04x] does not
notify on hotplug command completion\n",
+ vendor, device);
+ return 1;
+ }
+ }
+
+ return 0;
+}
+
struct controller *pcie_init(struct pcie_device *dev) {
struct controller *ctrl;
@@ -902,15 +930,8 @@ struct controller *pcie_init(struct pcie_device *dev)
mutex_init(&ctrl->ctrl_lock);
init_waitqueue_head(&ctrl->queue);
dbg_ctrl(ctrl);
- /*
- * Controller doesn't notify of command completion if the "No
- * Command Completed Support" bit is set in Slot Capability
- * register or the controller supports none of power
- * controller, attention led, power led and EMI.
- */
- if (NO_CMD_CMPL(ctrl) ||
- !(POWER_CTRL(ctrl) | ATTN_LED(ctrl) | PWR_LED(ctrl) |
EMI(ctrl)))
- ctrl->no_cmd_complete = 1;
+
+ ctrl->no_cmd_complete = pciehp_no_command_complete(ctrl);
/* Check if Data Link Layer Link Active Reporting is
implemented */
if (pciehp_readl(ctrl, PCI_EXP_LNKCAP, &link_cap)) {
--
To unsubscribe from this list: send the line "unsubscribe linux-hotplug" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas
2013-11-23 19:32:55 UTC
Permalink
Post by Rajat Jain
Hello Bjorn,
Post by Rajat Jain
https://bugzilla.kernel.org/show_bug.cgi?id=64821
Please let me know if you need any help in trying something out - I'd
be more than keen to help - write code or test.
What do you think of the patch below? I'm afraid we'll trip over a few
other old parts similar to the 82801H, but I'd rather do that than
penalize the parts that work correctly.
Tested and works fine. Looks good (on a side note, we are introducing device specific check in a common code, but I do not think there is any other cleaner way). I think this patch should be applied as it solves the problem of slow probing.
Thanks for testing. I don't like the device-specific code either, but
it does seem like a device-specific defect, and I didn't have any
better ideas.
Post by Rajat Jain
On a different note, I feel there is still a need to apply my original patch. There is still an open problem in case of spurious interrupts (or in any case where the condition "if (slot_status & PCI_EXP_SLTSTA_CC)" becomes true in pcie_write_cmd()). That is because once that happens, we never clear that interrupt, and no further hotplug interrupts shall be received unless we do that.
I agree this is an issue and we should address it somehow. My
hesitation is just that I'd prefer to do some more aggressive
restructuring rather than apply a point fix. For example:

- We currently look at PCI_EXP_SLTSTA_CC in pcie_isr(),
pcie_poll_cmd(), and pcie_write_cmd(). I think it would be better to
look at it only in pcie_isr().

- I don't think pcie_poll_cmd() should exist at all; we should poll by
calling pcie_isr() instead.

- We need pcie_write_cmd(), but I think the way it waits is backwards.
Currently we issue the command, then wait for it to complete. I
think we should issue the command, note the current time, and return
without waiting. The *next* time we need to issue a command, we can
wait for completion of the previous one (or timeout) if necessary.

But maybe we need the point fix in the interim, especially if anybody
can actually produce the scenario you mention.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-hotplug" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Rajat Jain
2013-11-25 19:03:11 UTC
Permalink
Hello,
Post by Rajat Jain
Post by Rajat Jain
On a different note, I feel there is still a need to apply my original
patch. There is still an open problem in case of spurious interrupts (or
in any case where the condition "if (slot_status & PCI_EXP_SLTSTA_CC)"
becomes true in pcie_write_cmd()). That is because once that happens, we
never clear that interrupt, and no further hotplug interrupts shall be
received unless we do that.
I agree this is an issue and we should address it somehow. My
hesitation is just that I'd prefer to do some more aggressive
OK, I'll attempt to fix it that way when I get time.
Post by Rajat Jain
- We currently look at PCI_EXP_SLTSTA_CC in pcie_isr(), pcie_poll_cmd(),
and pcie_write_cmd(). I think it would be better to look at it only in
pcie_isr().
- I don't think pcie_poll_cmd() should exist at all; we should poll by
calling pcie_isr() instead.
- We need pcie_write_cmd(), but I think the way it waits is backwards.
Currently we issue the command, then wait for it to complete. I think
we should issue the command, note the current time, and return without
waiting. The *next* time we need to issue a command, we can wait for
completion of the previous one (or timeout) if necessary.
But maybe we need the point fix in the interim, especially if anybody
can actually produce the scenario you mention.
Ok.

Thanks,

Rajat

--
To unsubscribe from this list: send the line "unsubscribe linux-hotplug" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas
2014-02-12 00:34:58 UTC
Permalink
Post by Rajat Jain
Hello,
Post by Rajat Jain
Post by Rajat Jain
On a different note, I feel there is still a need to apply my original
patch. There is still an open problem in case of spurious interrupts (or
in any case where the condition "if (slot_status & PCI_EXP_SLTSTA_CC)"
becomes true in pcie_write_cmd()). That is because once that happens, we
never clear that interrupt, and no further hotplug interrupts shall be
received unless we do that.
I agree this is an issue and we should address it somehow. My
hesitation is just that I'd prefer to do some more aggressive
OK, I'll attempt to fix it that way when I get time.
Post by Rajat Jain
- We currently look at PCI_EXP_SLTSTA_CC in pcie_isr(), pcie_poll_cmd(),
and pcie_write_cmd(). I think it would be better to look at it only in
pcie_isr().
- I don't think pcie_poll_cmd() should exist at all; we should poll by
calling pcie_isr() instead.
- We need pcie_write_cmd(), but I think the way it waits is backwards.
Currently we issue the command, then wait for it to complete. I think
we should issue the command, note the current time, and return without
waiting. The *next* time we need to issue a command, we can wait for
completion of the previous one (or timeout) if necessary.
But maybe we need the point fix in the interim, especially if anybody
can actually produce the scenario you mention.
Ok.
This patch is still in patchwork, but I've lost track of where we are.
Did you resolve this in the series that I just applied, or is it still
an outstanding issue?

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-hotplug" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Rajat Jain
2014-02-12 01:08:50 UTC
Permalink
Hello Bjorn,
-----Original Message-----
Sent: Tuesday, February 11, 2014 4:35 PM
To: Rajat Jain
Kaneshige; Yijing Wang; Greg KH; Tom Nguyen; Kristen Accardi; Rajat
Jain; Guenter Roeck
Subject: Re: [PATCH] pciehp: Acknowledge the spurious "cmd completed"
event.
Post by Rajat Jain
Hello,
Post by Rajat Jain
Post by Rajat Jain
On a different note, I feel there is still a need to apply my original
patch. There is still an open problem in case of spurious interrupts
(or in any case where the condition "if (slot_status &
PCI_EXP_SLTSTA_CC)"
Post by Rajat Jain
Post by Rajat Jain
becomes true in pcie_write_cmd()). That is because once that
happens, we never clear that interrupt, and no further hotplug
interrupts shall be received unless we do that.
I agree this is an issue and we should address it somehow. My
hesitation is just that I'd prefer to do some more aggressive
OK, I'll attempt to fix it that way when I get time.
Post by Rajat Jain
- We currently look at PCI_EXP_SLTSTA_CC in pcie_isr(),
pcie_poll_cmd(), and pcie_write_cmd(). I think it would be better
to look at it only in pcie_isr().
- I don't think pcie_poll_cmd() should exist at all; we should poll
by calling pcie_isr() instead.
- We need pcie_write_cmd(), but I think the way it waits is
backwards.
Post by Rajat Jain
Post by Rajat Jain
Currently we issue the command, then wait for it to complete. I
think we should issue the command, note the current time, and return
without waiting. The *next* time we need to issue a command, we can
wait for completion of the previous one (or timeout) if necessary.
But maybe we need the point fix in the interim, especially if
anybody can actually produce the scenario you mention.
Ok.
This patch is still in patchwork, but I've lost track of where we are.
Did you resolve this in the series that I just applied, or is it still
an outstanding issue?
No, I did not solve it. It is still an outstanding issue. So far I am using your patch to overcome this:

http://www.spinics.net/lists/hotplug/msg05830.html

I'll just attempt to conclude the status on this issue so that you can make the decision on the course of action. IMHO there are 2 independent issues that we discussed in this thread:

1) PCIe compliant HW (that generates cmd completed interrupts at every write of Slot_ctrl register) being penalized with 1 second delay during the boot up. Your patch solves this.

2) If there is a genuine spurious interrupt, it does not get acknowledged. I had originally posted a patch for THIS problem.
http://www.spinics.net/lists/hotplug/msg05815.html

You had indicated that you would rather want a bigger restructuring of the driver to solve (2).

My observation: MY problem (in my setup) is not seen if I use either of the patches (yours or mine).

My opinion: I think my patch solves (2) but might not solve (1) for all corner cases. Also your patch solves (1) but may not solve (2) for all corner cases -Thus we should probably solve both of these problems individually.

Thanks,

Rajat
Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-hotplug"
info at http://vger.kernel.org/majordomo-info.html
��칻�&�~�&���+-��ݶ��w��˛���m�b��a��e�����ܨ}���Ơz�&j:+v�������zZ+
Rajat Jain
2014-02-20 07:42:05 UTC
Permalink
Hello Bjorn,
Post by Rajat Jain
Post by Rajat Jain
Post by Rajat Jain
Hello,
Post by Rajat Jain
Post by Rajat Jain
On a different note, I feel there is still a need to apply my original
patch. There is still an open problem in case of spurious interrupts
(or in any case where the condition "if (slot_status &
PCI_EXP_SLTSTA_CC)"
Post by Rajat Jain
Post by Rajat Jain
becomes true in pcie_write_cmd()). That is because once that
happens, we never clear that interrupt, and no further hotplug
interrupts shall be received unless we do that.
I agree this is an issue and we should address it somehow. My
hesitation is just that I'd prefer to do some more aggressive
OK, I'll attempt to fix it that way when I get time.
Post by Rajat Jain
- We currently look at PCI_EXP_SLTSTA_CC in pcie_isr(),
pcie_poll_cmd(), and pcie_write_cmd(). I think it would be better
to look at it only in pcie_isr().
- I don't think pcie_poll_cmd() should exist at all; we should poll
by calling pcie_isr() instead.
- We need pcie_write_cmd(), but I think the way it waits is
backwards.
Post by Rajat Jain
Post by Rajat Jain
Currently we issue the command, then wait for it to complete. I
think we should issue the command, note the current time, and return
without waiting. The *next* time we need to issue a command, we can
wait for completion of the previous one (or timeout) if necessary.
But maybe we need the point fix in the interim, especially if
anybody can actually produce the scenario you mention.
Ok.
This patch is still in patchwork, but I've lost track of where we are.
Did you resolve this in the series that I just applied, or is it still
an outstanding issue?
http://www.spinics.net/lists/hotplug/msg05830.html
1) PCIe compliant HW (that generates cmd completed interrupts at every write of Slot_ctrl register) being penalized with 1 second delay during the boot up. Your patch solves this.
2) If there is a genuine spurious interrupt, it does not get acknowledged. I had originally posted a patch for THIS problem.
http://www.spinics.net/lists/hotplug/msg05815.html
You had indicated that you would rather want a bigger restructuring of the driver to solve (2).
My observation: MY problem (in my setup) is not seen if I use either of the patches (yours or mine).
My opinion: I think my patch solves (2) but might not solve (1) for all corner cases. Also your patch solves (1) but may not solve (2) for all corner cases -Thus we should probably solve both of these problems individually.
Just wondering if you decided on how to solve this problem?

Are you planning this for 3.15?

Thanks,

Rajat
Bjorn Helgaas
2014-02-20 22:20:40 UTC
Permalink
Post by Rajat Jain
Hello Bjorn,
Post by Rajat Jain
Post by Rajat Jain
Post by Rajat Jain
Hello,
Post by Rajat Jain
Post by Rajat Jain
On a different note, I feel there is still a need to apply my original
patch. There is still an open problem in case of spurious interrupts
(or in any case where the condition "if (slot_status &
PCI_EXP_SLTSTA_CC)"
Post by Rajat Jain
Post by Rajat Jain
becomes true in pcie_write_cmd()). That is because once that
happens, we never clear that interrupt, and no further hotplug
interrupts shall be received unless we do that.
I agree this is an issue and we should address it somehow. My
hesitation is just that I'd prefer to do some more aggressive
OK, I'll attempt to fix it that way when I get time.
Post by Rajat Jain
- We currently look at PCI_EXP_SLTSTA_CC in pcie_isr(),
pcie_poll_cmd(), and pcie_write_cmd(). I think it would be better
to look at it only in pcie_isr().
- I don't think pcie_poll_cmd() should exist at all; we should poll
by calling pcie_isr() instead.
- We need pcie_write_cmd(), but I think the way it waits is
backwards.
Post by Rajat Jain
Post by Rajat Jain
Currently we issue the command, then wait for it to complete. I
think we should issue the command, note the current time, and return
without waiting. The *next* time we need to issue a command, we can
wait for completion of the previous one (or timeout) if necessary.
But maybe we need the point fix in the interim, especially if
anybody can actually produce the scenario you mention.
Ok.
This patch is still in patchwork, but I've lost track of where we are.
Did you resolve this in the series that I just applied, or is it still
an outstanding issue?
http://www.spinics.net/lists/hotplug/msg05830.html
1) PCIe compliant HW (that generates cmd completed interrupts at every write of Slot_ctrl register) being penalized with 1 second delay during the boot up. Your patch solves this.
2) If there is a genuine spurious interrupt, it does not get acknowledged. I had originally posted a patch for THIS problem.
http://www.spinics.net/lists/hotplug/msg05815.html
You had indicated that you would rather want a bigger restructuring of the driver to solve (2).
My observation: MY problem (in my setup) is not seen if I use either of the patches (yours or mine).
My opinion: I think my patch solves (2) but might not solve (1) for all corner cases. Also your patch solves (1) but may not solve (2) for all corner cases -Thus we should probably solve both of these problems individually.
Just wondering if you decided on how to solve this problem?
Are you planning this for 3.15?
Sorry, I haven't had a chance to work on this, so I don't think *I*
will get anything done for v3.15. To make forward progress, maybe we
should merge your original patch? Would you mind posting it again so
it gets into patchwork again?

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-hotplug" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Rajat Jain
2014-02-21 01:43:40 UTC
Permalink
Post by Bjorn Helgaas
Post by Rajat Jain
Hello Bjorn,
Post by Rajat Jain
Post by Rajat Jain
Post by Rajat Jain
Hello,
Post by Rajat Jain
Post by Rajat Jain
On a different note, I feel there is still a need to apply my original
patch. There is still an open problem in case of spurious interrupts
(or in any case where the condition "if (slot_status &
PCI_EXP_SLTSTA_CC)"
Post by Rajat Jain
Post by Rajat Jain
becomes true in pcie_write_cmd()). That is because once that
happens, we never clear that interrupt, and no further hotplug
interrupts shall be received unless we do that.
I agree this is an issue and we should address it somehow. My
hesitation is just that I'd prefer to do some more aggressive
OK, I'll attempt to fix it that way when I get time.
Post by Rajat Jain
- We currently look at PCI_EXP_SLTSTA_CC in pcie_isr(),
pcie_poll_cmd(), and pcie_write_cmd(). I think it would be better
to look at it only in pcie_isr().
- I don't think pcie_poll_cmd() should exist at all; we should poll
by calling pcie_isr() instead.
- We need pcie_write_cmd(), but I think the way it waits is
backwards.
Post by Rajat Jain
Post by Rajat Jain
Currently we issue the command, then wait for it to complete. I
think we should issue the command, note the current time, and return
without waiting. The *next* time we need to issue a command, we can
wait for completion of the previous one (or timeout) if necessary.
But maybe we need the point fix in the interim, especially if
anybody can actually produce the scenario you mention.
Ok.
This patch is still in patchwork, but I've lost track of where we are.
Did you resolve this in the series that I just applied, or is it still
an outstanding issue?
http://www.spinics.net/lists/hotplug/msg05830.html
1) PCIe compliant HW (that generates cmd completed interrupts at every write of Slot_ctrl register) being penalized with 1 second delay during the boot up. Your patch solves this.
2) If there is a genuine spurious interrupt, it does not get acknowledged. I had originally posted a patch for THIS problem.
http://www.spinics.net/lists/hotplug/msg05815.html
You had indicated that you would rather want a bigger restructuring of the driver to solve (2).
My observation: MY problem (in my setup) is not seen if I use either of the patches (yours or mine).
My opinion: I think my patch solves (2) but might not solve (1) for all corner cases. Also your patch solves (1) but may not solve (2) for all corner cases -Thus we should probably solve both of these problems individually.
Just wondering if you decided on how to solve this problem?
Are you planning this for 3.15?
Sorry, I haven't had a chance to work on this, so I don't think *I*
will get anything done for v3.15. To make forward progress, maybe we
should merge your original patch? Would you mind posting it again so
it gets into patchwork again?
Just sent it again, Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-hotplug" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Loading...