test(ci): stabilize random test failures#11789
Conversation
Code Review SummaryStatus: 1 Issues Found | Recommendation: Address before merge Overview
Fix these issues in Kilo Cloud Issue Details (click to expand)WARNING
Files Reviewed (2 files)
Previous Review Summaries (5 snapshots, latest commit 67d90be)Current summary above is authoritative. Previous snapshots are kept for context only. Previous review (commit 67d90be)Status: 1 Issues Found | Recommendation: Address before merge Overview
Fix these issues in Kilo Cloud Issue Details (click to expand)WARNING
Files Reviewed (4 files)
Previous review (commit 1796189)Status: No Issues Found | Recommendation: Merge Files Reviewed (1 files)
Previous review (commit 48f5783)Status: No Issues Found | Recommendation: Merge Files Reviewed (1 files)
Previous review (commit c311986)Status: No Issues Found | Recommendation: Merge Files Reviewed (4 files)
Previous review (commit 41c6b5e)Status: 1 Issues Found | Recommendation: Address before merge Overview
Fix these issues in Kilo Cloud Issue Details (click to expand)WARNING
Files Reviewed (4 files)
Reviewed by gpt-5.4-2026-03-05 · Input: 72.6K · Output: 9K · Cached: 355.8K Review guidance: REVIEW.md from base branch |
| @@ -11,6 +11,7 @@ jobs: | |||
| outputs: | |||
| matched: ${{ steps.filter.outputs.matched }} | |||
There was a problem hiding this comment.
LLM-authored note: this gate separates PRs that can safely receive generated baseline commits from read-only PRs. It addresses the visual-regression Dependabot failures such as runs 27819074268 and 27841769389, where checkout failed with Input required and not supplied: token, and it also avoids the kilo-code-bot finding that github.token cannot push baseline commits on Dependabot pull_request runs. The fix is to expose one can_autocommit decision and have both visual jobs use it consistently.
| - name: Check for baseline changes (fork PRs) | ||
| if: needs.check-paths.outputs.is_fork == 'true' | ||
| # kilocode_change start | ||
| - name: Check for baseline changes (read-only PRs) |
There was a problem hiding this comment.
LLM-authored note: this read-only baseline check replaces the old fork-only branch for the kilo-ui visual job. It fixes visual-regression failures like run 27841769389 and later baseline runs where a PR could not receive generated screenshot commits. The correct behavior is to fail with an actionable message when baselines change but can_autocommit is false, because trying to push from a read-only token just fails later and hides the real visual-baseline update requirement.
| - name: Check for baseline changes (fork PRs) | ||
| if: needs.check-paths.outputs.is_fork == 'true' | ||
| # kilocode_change start | ||
| - name: Check for baseline changes (read-only PRs) |
There was a problem hiding this comment.
LLM-authored note: this mirrors the read-only baseline behavior for the VS Code webview visual job. It fixes visual-regression failures like run 27819074268 and the repeated Dependabot visual runs where baseline screenshots could be generated but could not be committed back. Checking can_autocommit here is the fix because the workflow must distinguish safe internal baseline-push runs from read-only PR contexts before deciding whether changed snapshots are a failure or an auto-commit.
| val ready = CountDownLatch(1) | ||
| executor.submit { acceptLoop(srv, ready) } | ||
| // LLM note: tests connect immediately after start(), so publish accept-loop readiness instead of racing CI scheduling. | ||
| check(ready.await(5, TimeUnit.SECONDS)) { "Mock CLI accept loop did not start" } |
There was a problem hiding this comment.
LLM-authored note: this fixes the JetBrains backend flake observed in the failing-test inventory on run 27970687997, where tests that connect immediately after MockCliServer.start() could race the executor starting the accept loop. Waiting on this latch makes start() mean both the port is allocated and the accept loop is scheduled, so callers no longer hit intermittent connection-refused/startup timing failures under CI load.
| assertTrue(provider.mentionAt(text, 6)?.resolved == true) | ||
| assertTrue(provider.mentionAt(text, 20)?.resolved == false) | ||
| // LLM note: the callback can fire before cross-thread mention state is observable to mouse-event assertions. | ||
| waitFor { |
There was a problem hiding this comment.
LLM-authored note: this fixes the JetBrains frontend mention-navigation flake from the failed JetBrains test runs in the inventory, including run 27970687997. The validation callback only proves the async validation path ran; it does not guarantee the resolved mention state is immediately visible to the mouse-event assertions. Waiting for the observable mentionAt state makes the test assert the real UI precondition instead of racing cross-thread state propagation.
| loader = null | ||
| eventWatcher = null | ||
| jobs | ||
| }.forEach { job -> |
There was a problem hiding this comment.
LLM-authored note: this fixes the new JetBrains failure from run 28428932405 (KiloBackendAppServiceTest.restart during Loading cancels stale load and reaches Ready). The run timed out in Loading(config=false, notifications=false, profile=LOADED) after restart because the previous app-load job was only cancelled, not joined, before opening the next CLI connection. Waiting for cancelAndJoin() here makes restart/reinstall serialize old load teardown before the new connection starts, so stale HTTP retries and SSE watchers cannot race the new Loading state.
| private suspend fun clear() { | ||
| synchronized(loadLock) { | ||
| val jobs = listOfNotNull(loader, eventWatcher) | ||
| loader = null |
There was a problem hiding this comment.
WARNING: Clearing the job slots before the old jobs finish cancelling can reintroduce the stale-load race
loader and eventWatcher are nulled before cancelAndJoin() finishes. That reopens the window for a concurrent load() or startWatchingGlobalSseEvents() call to install replacement jobs while this restart is still waiting on the old ones. reset() then runs without cancelling those replacement jobs, so the new restart can still overlap with stale work from the previous connection.
Reply with @kilocode-bot fix it to have Kilo Code address this issue.
| private suspend fun loaded(ws: KiloBackendWorkspace) { | ||
| withTimeout(15_000) { | ||
| ws.state.first { it is KiloWorkspaceState.Ready } | ||
| } |
There was a problem hiding this comment.
LLM-authored note: this fixes the JetBrains failure from run 28430385307 (KiloBackendWorkspaceTest.different directories return different workspaces). The assertion itself passed, but teardown timed out at line 52 because workspaces.get() starts background workspace loading and this identity test exited while those HTTP loads were still in flight. Waiting for the created workspace to reach Ready makes the test settle the implementation work it started before the scope-cancelling teardown runs.
What
ci-failing-test-runs-last-10-days.mdwith the scoped failed test job inventory from the last 10 days.github.tokenwhenBOT_PATis unavailable, which avoids Dependabot/internal PR checkout failures.Why
Recent failing Actions runs showed recurring JetBrains async test flakes and visual-regression checkout failures when
secrets.BOT_PATis not populated. Most other failures in the inventory were branch-specific regressions, expected baseline updates, infrastructure flakes, or already fixed onmain.Validation
./gradlew :frontend:test --tests ai.kilocode.client.session.ui.prompt.MentionNavigatorTest./gradlew :backend:test --tests ai.kilocode.backend.workspace.KiloBackendWorkspaceTestbun run script/check-workflows.tsNote: the first backend test attempt failed before tests ran with
preload not found "@opentui/solid/preload"; runningbun installfixed the local dependency setup, and the targeted backend test then passed.