FROMLIST: misc: fastrpc: fix context leak and hang on signal-interrupted invoke#637

Open
quic-anane wants to merge 1 commit into qualcomm-linux:qcom-6.18.y from quic-anane:intr_ctx

Conversation

@quic-anane

fastrpc invokes work by sending an RPC message to the DSP and blocking in wait_for_completion_interruptible() until the DSP responds. If a signal arrives during this wait, the syscall returns -ERESTARTSYS and the invoke context, which holds the in-flight DMA buffers and completion state, is left stranded in fl->pending.

On the next syscall attempt (either auto-restarted by the kernel via SA_RESTART or manually retried by user-space after EINTR), a fresh context is allocated and the RPC message is re-sent to the DSP. This has two consequences:

  • The original context leaks in fl->pending until the file is closed.
  • The DSP receives a duplicate invocation. If the DSP was mid-way through processing the first request and had issued a reverse RPC call back to the host, the retry sends a new forward request instead of the expected reverse-RPC response. The DSP thread waiting for that response is never woken, causing a hang.
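
For the manual-retry case, the user-space side typically looks like the loop below. This is a minimal sketch, not the fastrpc user library's actual code: `retry_on_eintr`, `invoke_fn`, and `interrupted_n_times` are hypothetical names, and the callback stands in for the real invoke ioctl. Each pass through the loop is a fresh syscall, which is why an unfixed driver would allocate a new context and re-send the message on every retry.

```c
#include <errno.h>

/* Hypothetical callback type standing in for the real invoke ioctl.
 * It follows the syscall convention: -1 with errno set on failure. */
typedef int (*invoke_fn)(void *arg);

/* Re-issue the call for as long as it fails with EINTR.
 * Every iteration is a brand-new syscall from the kernel's point of view. */
static int retry_on_eintr(invoke_fn fn, void *arg)
{
	int ret;

	do {
		ret = fn(arg);
	} while (ret == -1 && errno == EINTR);

	return ret;
}

/* Demo stand-in for an invoke that is interrupted by a signal
 * *remaining times before it finally succeeds. */
static int interrupted_n_times(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		errno = EINTR;
		return -1;
	}
	return 0;
}
```

Calling `retry_on_eintr(interrupted_n_times, &n)` with `n` set to 2 issues three calls in total; in the scenario described above, each of those would have reached the DSP as a separate forward request.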

Fix this by saving the interrupted context to a new fl->interrupted list on -ERESTARTSYS. When the same thread retries the invoke with a matching sc, restore the context and jump directly to the wait, skipping context allocation and message re-send.
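
The save/restore idea can be modelled in plain user-space C as follows. This is only a sketch under stated assumptions: the structures, field names, and the `save_interrupted`/`restore_interrupted` helpers are simplified stand-ins invented for illustration and do not reproduce the patch's actual code. Locking is omitted; a driver would need to hold a lock around both operations.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the driver's structures. The field names are
 * assumptions for illustration, not the patch's actual layout. */
struct ctx {
	struct ctx *next;
	int tid;       /* thread that issued the invoke */
	uint32_t sc;   /* scalars value of the interrupted invoke */
};

struct user {
	struct ctx *interrupted;   /* models the fl->interrupted list */
};

/* Wait was interrupted by a signal: park the context instead of
 * leaving it stranded. */
static void save_interrupted(struct user *fl, struct ctx *c)
{
	c->next = fl->interrupted;
	fl->interrupted = c;
}

/* Start of a retried invoke: unlink and return the parked context for
 * this thread and sc, or NULL if there is none (fresh invoke). */
static struct ctx *restore_interrupted(struct user *fl, int tid, uint32_t sc)
{
	struct ctx **pp;

	for (pp = &fl->interrupted; *pp; pp = &(*pp)->next) {
		if ((*pp)->tid == tid && (*pp)->sc == sc) {
			struct ctx *c = *pp;

			*pp = c->next;
			c->next = NULL;
			return c;
		}
	}
	return NULL;
}
```

When `restore_interrupted` returns a context, the caller goes straight back to waiting on it; only a NULL result leads to allocating a new context and sending a message.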

Also drain fl->interrupted on process exit and complete any sleeping contexts with -EPIPE when the rpmsg channel is removed.
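
The two cleanup paths can be pictured with the small user-space model below. It is a hedged sketch, not the driver code: `struct ctx`, its `done` flag (standing in for a kernel completion), and the `fail_waiters`/`drain_interrupted` helpers are hypothetical names chosen for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for an invoke context (assumed layout). */
struct ctx {
	struct ctx *next;
	int retval;   /* result reported to the waiting thread */
	int done;     /* models the completion having fired */
};

/* Channel removal: wake every context that is still waiting and make
 * it fail with -EPIPE. Returns the number of contexts completed. */
static int fail_waiters(struct ctx *head)
{
	int n = 0;

	for (struct ctx *c = head; c; c = c->next) {
		if (!c->done) {
			c->retval = -EPIPE;
			c->done = 1;
			n++;
		}
	}
	return n;
}

/* Process exit: release every parked context so nothing outlives the
 * file. Returns the number of contexts freed. */
static int drain_interrupted(struct ctx **head)
{
	int n = 0;

	while (*head) {
		struct ctx *c = *head;

		*head = c->next;
		free(c);
		n++;
	}
	return n;
}
```

Without the drain step, a context parked after a signal and never retried would stay allocated after the owning process is gone.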

Link: https://lore.kernel.org/all/43a7laqb7mnrvleunnmbxwhvzr6w3au4ofjri4r4ap7clsx6mc@jxqlr4a2lw56/
Fixes: 387f625 ("misc: fastrpc: handle interrupted contexts")
Cc: stable@kernel.org

CRs-Fixed: 4411765

…ted invoke

fastrpc invokes work by sending an RPC message to the DSP and blocking
in wait_for_completion_interruptible() until the DSP responds. If a
signal arrives during this wait, the syscall returns -ERESTARTSYS and
the invoke context, which holds the in-flight DMA buffers and
completion state, is left stranded in fl->pending.

On the next syscall attempt (either auto-restarted by the kernel via
SA_RESTART or manually retried by user-space after EINTR), a fresh
context is allocated and the RPC message is re-sent to the DSP. This
has two consequences:

  - The original context leaks in fl->pending until the file is closed.
  - The DSP receives a duplicate invocation. If the DSP was mid-way
    through processing the first request and had issued a reverse RPC
    call back to the host, the retry sends a new forward request
    instead of the expected reverse-RPC response. The DSP thread
    waiting for that response is never woken, causing a hang.

Fix this by saving the interrupted context to a new fl->interrupted
list on -ERESTARTSYS. When the same thread retries the invoke with a
matching sc, restore the context and jump directly to the wait,
skipping context allocation and message re-send.

Also drain fl->interrupted on process exit and complete any sleeping
contexts with -EPIPE when the rpmsg channel is removed.

Link: https://lore.kernel.org/all/43a7laqb7mnrvleunnmbxwhvzr6w3au4ofjri4r4ap7clsx6mc@jxqlr4a2lw56/
Fixes: 387f625 ("misc: fastrpc: handle interrupted contexts")
Cc: stable@kernel.org
Signed-off-by: Anandu Krishnan E <anandu.e@oss.qualcomm.com>
@qlijarvis

PR #637 — validate-patch

PR: #637

| Verdict | Issues | Detailed Report |
|---|---|---|
| ⚠️ | 2 | Full report |

Final Summary

  1. Lore link present: Yes - https://lore.kernel.org/all/43a7laqb7mnrvleunnmbxwhvzr6w3au4ofjri4r4ap7clsx6mc@jxqlr4a2lw56/
  2. Lore link matches PR commits: Cannot verify - network access restricted; lore.kernel.org unreachable
  3. Upstream patch status: Cannot verify - network access restricted; unable to fetch thread or check mainline/stable trees
  4. PR present in qcom-next: No - searched qcom-next by subject, keyword, and diff fingerprint; no match found (fingerprint: f06b3de420f168d3e45cd0b12ee9ba0f)
Verdict: ⚠️

🔍 Patch Validation

PR: #637 - FROMLIST: misc: fastrpc: fix context leak and hang on signal-interrupted invoke
Upstream commit: https://lore.kernel.org/all/43a7laqb7mnrvleunnmbxwhvzr6w3au4ofjri4r4ap7clsx6mc@jxqlr4a2lw56/
Verdict: ⚠️ PARTIAL (network-restricted validation)

Commit Message

| Check | Status | Note |
|---|---|---|
| Subject matches upstream | ⏭️ | Cannot verify - network access restricted |
| Body preserves rationale | | Comprehensive problem description and fix rationale present |
| Fixes tag present/correct | | Fixes: 387f625585d1 ("misc: fastrpc: handle interrupted contexts") |
| Authorship preserved | | FROMLIST: prefix - submitter in From: is correct; original author's Signed-off-by: present |
| Backport note (if applicable) | N/A | Not a backport - FROMLIST: indicates pending upstream patch |

Diff

| File | Status | Notes |
|---|---|---|
| drivers/misc/fastrpc.c | ⏭️ | Cannot compare to lore source - network restricted; patch structure appears well-formed with 13 hunks (+53/-16 lines) |

Issues

  • Network restriction: Cannot fetch upstream lore patch to verify diff faithfulness due to network access restrictions. The lore link is present and well-formed.
  • FROMLIST authorship: Correctly follows FROMLIST: convention - submitter (Anandu Krishnan E) is in From: field and has added their Signed-off-by:. For full validation, verify that the original lore author's Signed-off-by: is present if the lore author differs from the submitter.

Verdict

Limited validation completed - The commit message structure is correct for a FROMLIST: patch with proper Fixes: tag, Cc: stable@kernel.org, lore link, and sign-off. The diff structure appears well-formed with logical changes to handle interrupted contexts (adds fl->interrupted list, save/restore functions, cleanup in exit paths). However, full validation requires network access to fetch the upstream lore patch and verify:

  1. Diff content matches the lore source exactly
  2. No unauthorized modifications were introduced
  3. Upstream patch acceptance status

@qlijarvis

PR #637 — checker-log-analyzer

PR: #637
Checker run: https://github.com/qualcomm-linux/kernel-config/actions/runs/26745496306

Checker Result Summary

| Checker | Result | Summary |
|---|---|---|
| checkpatch | ❌ | 1 CHECK: function declaration line ending with ( |
| dt-binding-check | ⏭️ | No DT binding changes |
| dtb-check | ⏭️ | No devicetree changes |
| sparse-check | | Passed |
| check-uapi-headers | | No UAPI changes |
| check-patch-compliance | ❌ | b4 fetch failed for lore link |
| tag-check | | Subject has valid FROMLIST: prefix |

Detailed report: Full report

Checker analysis

🤖 CI Checker Analysis (checker-log-analyzer)

PR: #637 - FROMLIST: misc: fastrpc: fix context leak and hang on signal-interrupted invoke
Source: https://github.com/qualcomm-linux/kernel-config/actions/runs/26745496306

❌ checkpatch

Root cause: Function declaration line ends with opening parenthesis, violating kernel coding style.

Failure details:

CHECK: Lines should not end with a '('
#85: FILE: drivers/misc/fastrpc.c:606:
+static struct fastrpc_invoke_ctx *fastrpc_context_restore_interrupted(

9b459291a2a80e1452ae8d6879cae46099ac8931 total: 0 errors, 0 warnings, 1 checks, 153 lines checked

Fix: Break the line after the return type so that the function name, its opening parenthesis, and the first parameter share a line, or fit the entire function signature on one line if possible. This is a CHECK (not an ERROR or WARNING), so it is a style suggestion rather than a blocker.

Example fix:

static struct fastrpc_invoke_ctx *
fastrpc_context_restore_interrupted(struct fastrpc_user *fl, u64 sc)

Reproduce locally:

./scripts/checkpatch.pl --strict --ignore FILE_PATH_CHANGES --git a1f6d71a69b09d36f9af071759877e6d33b572fa..9b459291a2a80e1452ae8d6879cae46099ac8931

❌ check-patch-compliance

Root cause: b4 tool failed to fetch the upstream patch from the provided lore.kernel.org link.

Failure details:

Checking commit: FROMLIST: misc: fastrpc: fix context leak and hang on signal-interrupted invoke
Something seems wrong with the provided link. Please verify it
Try below command to run locally-
b4 am --single-message -C -l -3 https://lore.kernel.org/all/43a7laqb7mnrvleunnmbxwhvzr6w3au4ofjri4r4ap7clsx6mc@jxqlr4a2lw56/

Analysis: The lore link format appears unusual with a very long message-ID (43a7laqb7mnrvleunnmbxwhvzr6w3au4ofjri4r4ap7clsx6mc@jxqlr4a2lw56). This could be:

  1. A transient network issue in the CI environment
  2. An incorrectly formatted message-ID
  3. A message that hasn't been indexed by lore yet

Fix:

  1. Verify the lore link is accessible in a browser: https://lore.kernel.org/all/43a7laqb7mnrvleunnmbxwhvzr6w3au4ofjri4r4ap7clsx6mc@jxqlr4a2lw56/
  2. If the link is valid but b4 fails, this may be a transient CI issue — re-trigger the workflow
  3. If the message-ID is incorrect, update the Link: tag in the commit message with the correct lore URL
  4. Verify locally with: b4 am --single-message -C -l -3 <link>

Reproduce locally:

b4 am --single-message -C -l -3 https://lore.kernel.org/all/43a7laqb7mnrvleunnmbxwhvzr6w3au4ofjri4r4ap7clsx6mc@jxqlr4a2lw56/

Verdict

2 issues to address before merge:

  1. checkpatch (minor): Fix the function declaration style by breaking the line after the return type, or fit the signature on one line. This is a CHECK (style suggestion), not a hard blocker.

  2. check-patch-compliance (blocker): Verify the lore link is correct and accessible. If the link is valid, this may be a transient CI network issue — re-trigger the workflow. If the message-ID is wrong, update the commit message with the correct lore URL.

Recommendation: Fix the checkpatch style issue and verify/correct the lore link, then re-run CI. The actual code changes appear sound (no sparse, UAPI, or DT issues detected).

@qcomlnxci

Test Matrix

| Test Case | lemans-evk | monaco-evk | qcs615-ride | qcs6490-rb3gen2 | qcs8300-ride | qcs9100-ride-r3 | x1e80100-crd |
|---|---|---|---|---|---|---|---|
| BT_FW_KMD_Service | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| BT_ON_OFF | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| BT_SCAN | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| CPUFreq_Validation | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| CPU_affinity | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| DSP_AudioPD | ✅ Pass | ✅ Pass | ⚠️ skip | ✅ Pass | ✅ Pass | ⚠️ skip | ◻️ |
| Ethernet | ⚠️ skip | ✅ Pass | ⚠️ skip | ⚠️ skip | ⚠️ skip | ⚠️ skip | ◻️ |
| Freq_Scaling | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| GIC | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| IPA | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| Interrupts | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| OpenCV | ✅ Pass | ⚠️ skip | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| PCIe | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| Probe_Failure_Check | ❌ Fail | ❌ Fail | ❌ Fail | ❌ Fail | ❌ Fail | ❌ Fail | ◻️ |
| RMNET | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| UFS_Validation | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| USBHost | ❌ Fail | ✅ Pass | ❌ Fail | ❌ Fail | ❌ Fail | ❌ Fail | ◻️ |
| WiFi_Firmware_Driver | ❌ Fail | ⚠️ skip | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| WiFi_OnOff | ✅ Pass | ❌ Fail | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| adsp_remoteproc | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ❌ Fail | ◻️ |
| cdsp_remoteproc | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ❌ Fail | ◻️ |
| gpdsp_remoteproc | ✅ Pass | ✅ Pass | ⚠️ skip | ⚠️ skip | ✅ Pass | ❌ Fail | ◻️ |
| hotplug | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| irq | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| kaslr | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| pinctrl | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| qcom_hwrng | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| remoteproc | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ❌ Fail | ◻️ |
| rngtest | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| shmbridge | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| smmu | ❌ Fail | ✅ Pass | ❌ Fail | ✅ Pass | ✅ Pass | ❌ Fail | ◻️ |
| watchdog | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
| wpss_remoteproc | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ✅ Pass | ◻️ |
