Skip to content

PCI: dwc/qcom: Rework dw_pcie_wait_for_link() error handling and add D3cold support#644

Open
ziyuezhang-123 wants to merge 10 commits into
qualcomm-linux:qcom-6.18.yfrom
ziyuezhang-123:d3_cold_2
Open

PCI: dwc/qcom: Rework dw_pcie_wait_for_link() error handling and add D3cold support#644
ziyuezhang-123 wants to merge 10 commits into
qualcomm-linux:qcom-6.18.yfrom
ziyuezhang-123:d3_cold_2

Conversation

@ziyuezhang-123
Copy link
Copy Markdown

@ziyuezhang-123 ziyuezhang-123 commented Jun 2, 2026

Port two upstream patch series to d3_cold_2:

  1. Rework dw_pcie_wait_for_link() error handling (v4,
    https://lore.kernel.org/all/20260120-pci-dwc-suspend-rework-v4-0-2f32d5082549@oss.qualcomm.com/):
    distinguish -ENODEV/-EIO/-ETIMEDOUT for different link failure scenarios,
    improve error logging with LTSSM state, and only fail dw_pcie_host_init()
    on -ETIMEDOUT.

  2. Add Qualcomm PCIe D3cold support (v5,
    https://lore.kernel.org/all/20260429-d3cold-v5-0-89e9735b9df6@oss.qualcomm.com/):
    add pci_host_common_d3cold_possible() eligibility helper, safe LTSSM
    read via PARF register, PHY power-down in deinit path, and full
    interconnect/OPP teardown on D3cold entry.

Two fixes applied during porting: correct ops->get_ltssm guard condition
in qcom_pcie_get_ltssm() (upstream rebase bug), and remove unused
err_deinit label in dw_pcie_resume_noirq() (fixes -Wunused-label warning).

CRs-Fixed: 4551808

Mani-Sadhasivam and others added 9 commits June 2, 2026 16:00
…vice is not found

The dw_pcie_wait_for_link() function waits up to 1 second for the PCIe link
to come up and returns -ETIMEDOUT for all failures without distinguishing
cases where no device is present on the bus. But the callers may want to
just skip the failure if the device is not found on the bus and handle
failure for other reasons.

So after timeout, if the LTSSM is in Detect.Quiet or Detect.Active state,
return -ENODEV to indicate the callers that the device is not found on the
bus and return -ETIMEDOUT otherwise.

Also add kernel doc to document the parameter and return values.

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Tested-by: Richard Zhu <hongxing.zhu@nxp.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Link: https://patch.msgid.link/20260120-pci-dwc-suspend-rework-v4-1-2f32d5082549@oss.qualcomm.com
Signed-off-by: Ziyue Zhang <ziyue.zhang@oss.qualcomm.com>
…e is not active

There are cases where the PCIe device would be physically connected to the
bus, but the device firmware might not be active. So the LTSSM will
get stuck in POLL.{Active/Compliance} states.

This behavior is common with endpoint devices controlled by the PCI
Endpoint framework, where the device will wait for the user to start its
operation through configfs.

For those cases, print the relevant log and return -EIO to indicate that
the device is present, but not active. This will allow the callers to skip
the failure as the device might become active in the future.

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Link: https://patch.msgid.link/20260120-pci-dwc-suspend-rework-v4-2-2f32d5082549@oss.qualcomm.com
Signed-off-by: Ziyue Zhang <ziyue.zhang@oss.qualcomm.com>
…ignware.c

Rename ltssm_status_string() to dw_pcie_ltssm_status_string() and move it
to the common file pcie-designware.c so that this function could be used
outside of pcie-designware-debugfs.c file.

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Tested-by: Richard Zhu <hongxing.zhu@nxp.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Link: https://patch.msgid.link/20260120-pci-dwc-suspend-rework-v4-3-2f32d5082549@oss.qualcomm.com
Signed-off-by: Ziyue Zhang <ziyue.zhang@oss.qualcomm.com>
For the cases where the link cannot come up later i.e., when LTSSM is not
in Detect.{Quiet/Active} or Poll.{Active/Compliance} states,
dw_pcie_wait_for_link() should log an error.

So promote dev_info() to dev_err(), reword the error log to make it clear
and also print the LTSSM state to aid debugging.

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Tested-by: Richard Zhu <hongxing.zhu@nxp.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Link: https://patch.msgid.link/20260120-pci-dwc-suspend-rework-v4-4-2f32d5082549@oss.qualcomm.com
Signed-off-by: Ziyue Zhang <ziyue.zhang@oss.qualcomm.com>
…() returns -ETIMEDOUT

The dw_pcie_wait_for_link() API now distinguishes link failures more
precisely:

-ENODEV: Device not found on the bus.
-EIO: Device found but inactive.
-ETIMEDOUT: Link failed to come up.

Out of these three errors, only -ETIMEDOUT represents a definitive link
failure since it signals that something is wrong with the link. For the
other two errors, there is a possibility that the link might come up later.
So fail dw_pcie_host_init() if -ETIMEDOUT is returned and skip the failure
otherwise.

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Link: https://patch.msgid.link/20260120-pci-dwc-suspend-rework-v4-5-2f32d5082549@oss.qualcomm.com
Signed-off-by: Ziyue Zhang <ziyue.zhang@oss.qualcomm.com>
…d eligibility

Add a common helper, pci_host_common_d3cold_possible(), to determine
whether PCIe devices under host bridge can safely transition to D3cold.

This helper is intended to be used by PCI host controller drivers to
decide whether they may safely put the host bridge into D3cold based on
the power state and wakeup capabilities of downstream endpoints.

The helper walks all devices on the all bridge buses and only allows
the devices to enter D3cold if all PCIe endpoints are already in
PCI_D3hot. This ensures that we do not power off the host bridge while
any active endpoint still requires the link to remain powered.

For devices that may wake the system, the helper additionally requires
that the device supports PME wake from D3cold (via WAKE#). Devices that
do not have wakeup enabled are not restricted by this check and do not
block the devices under host bridge from entering D3cold.

Devices without a bound driver and with PCI not enabled via sysfs are
treated as inactive and therefore do not prevent the devices under host
bridge from entering D3cold. This allows controllers to power down more
aggressively when there are no actively managed endpoints.

Some devices (e.g. M.2 without auxiliary power) lose PME detection when
main power is removed. Even if such devices advertise PME-from-D3cold
capability, entering D3cold may break wakeup. So, return PME-from-D3cold
capability via an output parameter so PCIe controller drivers can apply
platform-specific handling to preserve wakeup functionality.

Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
Link: https://lore.kernel.org/all/20260429-d3cold-v5-1-89e9735b9df6@oss.qualcomm.com/
Signed-off-by: Ziyue Zhang <ziyue.zhang@oss.qualcomm.com>
For older targets like sc7280, we see reading DBI after sending PME
turn off message is causing NOC error.

To avoid unsafe DBI accesses, introduce qcom_pcie_get_ltssm() to retrieve
the LTSSM state. For newer platforms, the LTSSM state is read from the
PARF_LTSSM register, while older platforms continue to retrieve it from
ELBI_SYS_STTS.

This helper is used in place of direct DBI-based link state checks in
the D3cold path after sending PME turn-off message, ensuring the LTSSM
state can be queried safely even after DBI access is no longer valid.

Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
Link: https://lore.kernel.org/all/20260429-d3cold-v5-2-89e9735b9df6@oss.qualcomm.com/
Signed-off-by: Ziyue Zhang <ziyue.zhang@oss.qualcomm.com>
…g rails/clocks

Some Qcom PCIe controller variants bring the PHY out of test power-down
(PHY_TEST_PWR_DOWN) during init. When the link is later transitioned
towards D3cold and the driver disables PCIe clocks and/or regulators
without explicitly re-asserting PHY_TEST_PWR_DOWN, the PHY can remain
partially powered, leading to avoidable power leakage.

Update the init-path comments to reflect that PARF_PHY_CTRL is used to
power the PHY on. Also, for controller revisions that enable PHY power
in init (2.3.2, 2.3.3, 2.4.0, 2.7.0 and 2.9.0), explicitly power the PHY
down via PARF_PHY_CTRL in the deinit path before disabling clocks or
regulators.

This ensures the PHY is put into a defined low-power state prior to
removing its supplies, preventing leakage when entering D3cold.

Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
Link: https://lore.kernel.org/all/20260429-d3cold-v5-3-89e9735b9df6@oss.qualcomm.com/
Signed-off-by: Ziyue Zhang <ziyue.zhang@oss.qualcomm.com>
Previously, the driver skipped putting the link into L2/device state in
D3cold whenever L1 ASPM was enabled, since some devices (e.g. NVMe) expect
low resume latency and may not tolerate deeper power states. However, such
devices typically remain in D0 and are already covered by the new helper's
requirement that all endpoints be in D3hot before the devices under host
bridge may enter D3cold.

So, replace the local L1/L1SS-based check in dw_pcie_suspend_noirq() with
the shared pci_host_common_d3cold_possible() helper to decide whether the
devices under host bridge can safely transition to D3cold.

In addition, propagate PME-from-D3cold capability information from the
helper and record it in skip_pwrctrl_off. Some devices (e.g. M.2 cards
without auxiliary power) may lose PME detection when main power is
removed, even if they advertise PME-from-D3cold support. This allows
controller power-off to be skipped when required to preserve wakeup
functionality.

Update the suspended flag in dw_pcie_resume_noirq() only after the PCIe
link resumes successfully, to avoid marking the controller active when
link resume fails.

Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
Link: https://lore.kernel.org/all/20260429-d3cold-v5-4-89e9735b9df6@oss.qualcomm.com/
Signed-off-by: Ziyue Zhang <ziyue.zhang@oss.qualcomm.com>
@ziyuezhang-123 ziyuezhang-123 requested review from a team, aiquny, idlethread and quic-kaushalk June 2, 2026 08:26
@qswat-orbit-external
Copy link
Copy Markdown

Merge Check Failed: No CR Numbers Found

Error: No Change Request numbers were found.

Please add Change Request numbers to your pull request description in the format CRs-Fixed: 12345 or link GitHub issues that are associated with Change Requests.

@qswat-orbit-external
Copy link
Copy Markdown

Merge Check Failed: CR Not Eligible for Merge

CR 4551808 is not eligible for merge.

The parent software image for kernel.qli.2.0 is not development complete.

Entity: kernel.qli.2.0
CR: 4551808
Reason: CR_CANNOT_MERGE

Please ensure the CR passes both CCT (ComponentChangeTasks) and ICT (Integration Change Tasks) validations.

@ziyuezhang-123
Copy link
Copy Markdown
Author

Merge Check Failed: CR Not Eligible for Merge

CR 4551808 is not eligible for merge.

The parent software image for kernel.qli.2.0 is not development complete.

Entity: kernel.qli.2.0 CR: 4551808 Reason: CR_CANNOT_MERGE

Please ensure the CR passes both CCT (ComponentChangeTasks) and ICT (Integration Change Tasks) validations.

dev complete now

Add support for transitioning PCIe endpoints under host bridge into
D3cold by integrating with the DWC core suspend/resume helpers.

Implement PME_TurnOff message generation via ELBI_SYS_CTRL and hook it
into the DWC host operations so the controller follows the standard
PME_TurnOff-based power-down sequence before entering D3cold.

When the device is suspended into D3cold, fully tear down interconnect
bandwidth, OPP votes. If D3cold is not entered, retain existing behavior
by keeping the required interconnect and OPP votes.

Use dw_pcie::skip_pwrctrl_off to avoid powering off devices during suspend
to preserve wakeup capability of the devices and also not to power on the
devices in the init path.

Drop the qcom_pcie::suspended flag and rely on the existing
dw_pcie::suspended state, which now drives both the power-management
flow and the interconnect/OPP handling.

Link: https://lore.kernel.org/all/20260429-d3cold-v5-5-89e9735b9df6@oss.qualcomm.com/
Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
Signed-off-by: Ziyue Zhang <ziyue.zhang@oss.qualcomm.com>
@qswat-orbit-external
Copy link
Copy Markdown

Merge Check Failed: CR Not Eligible for Merge

CR 4551808 is not eligible for merge.

The parent software image for kernel.qli.2.0 is not development complete.

Entity: kernel.qli.2.0
CR: 4551808
Reason: CR_CANNOT_MERGE

Please ensure the CR passes both CCT (ComponentChangeTasks) and ICT (Integration Change Tasks) validations.

@qlijarvis
Copy link
Copy Markdown

PR #644 — validate-patch

PR: #644

Verdict Issues Detailed Report
⚠️ 10 Full report

Final Summary

  1. Lore link present: Yes - all 10 commits have proper lore.kernel.org or patch.msgid.link URLs
  2. Lore link matches PR commits: Cannot verify - network restrictions prevent fetching upstream patches for comparison
  3. Upstream patch status: Cannot verify - lore.kernel.org unreachable; however, 9/10 commits already in qcom-next suggests upstream acceptance
  4. PR present in qcom-next: Partial - 9/10 commits found (patches 1-7, 9-10); patch 8 not found
Verdict: ⚠️ — click to expand

🔍 Patch Validation

PR: #644 - PCI: dwc/qcom D3cold support series (10 commits)
Upstream commits: Two patch series from lore.kernel.org
Verdict: ⚠️ PARTIAL

Commit Message

Check Status Note
Subject matches upstream ⚠️ Cannot verify - network restricted, lore.kernel.org unreachable
Body preserves rationale ⚠️ Cannot verify - unable to fetch upstream patches
Fixes tag present/correct N/A No Fixes tags in these commits
Authorship preserved FROMLIST: original authors in Signed-off-by, submitter in From:
Backport note (if applicable) N/A FROMLIST commits, not backports

Diff

File Status Notes
All files ⚠️ Cannot compare - lore.kernel.org unreachable due to network restrictions

Issues

Network Limitation:

  • Cannot fetch upstream patches from lore.kernel.org or patch.msgid.link due to network restrictions
  • Unable to perform detailed diff comparison between PR and upstream source
  • Unable to verify upstream acceptance status via lore thread analysis

Authorship - Correct for FROMLIST:

  • All 10 commits properly use FROMLIST: prefix
  • Original authors (Manivannan Sadhasivam for patches 1-5, Krishna Chaitanya Chundru for patches 6-10) present in Signed-off-by
  • Submitter (Ziyue Zhang) correctly in From: field and adds own Signed-off-by
  • This is the expected and correct pattern for FROMLIST commits

qcom-next Presence:

  • 9 out of 10 commits found in qcom-next branch
  • Patch 8 ("PCI: qcom: Power down PHY via PARF_PHY_CTRL") not found in qcom-next
  • All other patches already integrated

Verdict

Cannot fully validate due to network restrictions preventing lore.kernel.org access. However, based on available evidence:

Positive indicators:

  • All commits have proper FROMLIST: prefix and lore links
  • Authorship handling is correct for FROMLIST commits
  • 9/10 commits already present in qcom-next
  • Commit messages appear well-formed with proper tags

Concerns:

  • Patch 8 not found in qcom-next (may be newer or squashed into another commit)
  • Cannot verify diff faithfulness to upstream without lore access
  • Cannot confirm upstream acceptance status

Recommendation: Request manual verification of patch 8's qcom-next status. If patch 8 is confirmed to be in qcom-next (possibly under a different commit message), or if it's a newer addition not yet synced, the PR appears acceptable for merge based on proper FROMLIST formatting and authorship handling.

Final Summary

  1. Lore link present: Yes - all 10 commits have proper lore.kernel.org or patch.msgid.link URLs
  2. Lore link matches PR commits: Cannot verify - network restrictions prevent fetching upstream patches for comparison
  3. Upstream patch status: Cannot verify - lore.kernel.org unreachable; however, 9/10 commits already in qcom-next suggests upstream acceptance
  4. PR present in qcom-next: Partial - 9/10 commits found (patches 1-7, 9-10); patch 8 not found

@qlijarvis
Copy link
Copy Markdown

PR #644 — checker-log-analyzer

PR: #644
Checker run: https://github.com/qualcomm-linux/kernel-config/actions/runs/26808637299

Checker Result Summary
Checker Result Summary
checkpatch All 10 commits pass
dt-binding-check ⏭️ No DT binding changes
dtb-check ⏭️ No DTS changes
sparse-check No sparse warnings
check-uapi-headers No UAPI changes
check-patch-compliance 5 commits have content mismatch with upstream
tag-check All commits have FROMLIST: prefix

Detailed report: Full report

Checker analysis — click to expand

🤖 CI Checker Analysis (checker-log-analyzer)

PR: #644 - PCI: dwc/qcom: D3cold support series
Source: https://github.com/qualcomm-linux/kernel-config/actions/runs/26808637299
Target branch: qcom-6.18.y

Checker Result Summary
checkpatch All 10 commits pass
dt-binding-check ⏭️ No DT binding changes
dtb-check ⏭️ No DTS changes
sparse-check No sparse warnings
check-uapi-headers No UAPI changes
check-patch-compliance 5 commits have content mismatch with upstream
tag-check All commits have FROMLIST: prefix

❌ check-patch-compliance

Root cause: Five commits (6-10) report "Change is different from the one mentioned in Link" — the PR patches differ from the upstream lore patches they reference.

Failure details:

Checking commit: FROMLIST: PCI: host-common: Add helper to determine host bridge D3cold eligibility
Change is different from the one mentioned in Link

Checking commit: FROMLIST: PCI: qcom: Add .get_ltssm() helper
Change is different from the one mentioned in Link

Checking commit: FROMLIST: PCI: qcom: Power down PHY via PARF_PHY_CTRL before disabling rails/clocks
Change is different from the one mentioned in Link

Checking commit: FROMLIST: PCI: dwc: Use common D3cold eligibility helper in suspend path
Change is different from the one mentioned in Link

Checking commit: FROMLIST: PCI: qcom: Add D3cold support
Change is different from the one mentioned in Link

Affected commits and links:

  • Commit 6: 506269a475a2https://lore.kernel.org/all/20260429-d3cold-v5-1-89e9735b9df6@oss.qualcomm.com/
  • Commit 7: 8f9d61f44c54https://lore.kernel.org/all/20260429-d3cold-v5-2-89e9735b9df6@oss.qualcomm.com/
  • Commit 8: a1935d1856dfhttps://lore.kernel.org/all/20260429-d3cold-v5-3-89e9735b9df6@oss.qualcomm.com/
  • Commit 9: 0a3d45f4e42chttps://lore.kernel.org/all/20260429-d3cold-v5-4-89e9735b9df6@oss.qualcomm.com/
  • Commit 10: 2eb8abb271dbhttps://lore.kernel.org/all/20260429-d3cold-v5-5-89e9735b9df6@oss.qualcomm.com/

Analysis:

The content mismatch could be due to:

  1. Context-only differences — line numbers shifted due to vendor tree differences (not a real issue)
  2. Legitimate adaptations — vendor-specific modifications needed for qcom-6.18.y tree
  3. Missing/extra hunks — incomplete backport or unintended changes

Fix:

Verify each failing commit manually:

# For each commit, fetch the upstream patch and compare
b4 am --single-message -C -l -3 https://lore.kernel.org/all/20260429-d3cold-v5-1-89e9735b9df6@oss.qualcomm.com/ -o /tmp/upstream-1
git format-patch -1 506269a475a2 --stdout > /tmp/pr-1.patch

# Compare only the +/- lines (ignore context shifts)
diff <(grep -E '^[+-][^+-]' /tmp/pr-1.patch | grep -v '^---' | grep -v '^+++') \
     <(grep -E '^[+-][^+-]' /tmp/upstream-1/*.mbx | grep -v '^---' | grep -v '^+++')

Repeat for commits 7-10 with their respective lore links.

Possible outcomes:

  • If differences are context-only (line number shifts) → acceptable, no fix needed
  • If differences are legitimate adaptations → document them in commit message with explanation
  • If differences are missing hunks → add the missing changes
  • If differences are extra hunks → remove or attribute separately

Reproduce locally:

cd /path/to/kernel
git checkout qcom-6.18.y
git fetch origin pull/644/head:pr-644
git checkout pr-644

# Run check-patch-compliance
../kernel-checkers/check-patch-compliance.sh \
  --kernel-src . \
  --base b7cce9a3884a \
  --head 2eb8abb271db

Verdict

5 blockers to investigate — The content mismatch failures need manual verification to determine if they are:

  • False positives (context-only differences) → merge as-is
  • Legitimate adaptations → document in commit messages
  • Real issues (missing/extra hunks) → fix before merge

The first 5 commits (pci-dwc-suspend-rework-v4 series) passed check-patch-compliance cleanly, suggesting they match upstream exactly. The last 5 commits (d3cold-v5 series) need review to understand why they differ from the April 29 upstream patches.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants