test(ci): stabilize random test failures by markijbema · Pull Request #11789 · Kilo-Org/kilocode

markijbema · 2026-06-29T10:08:08Z

What

- Add `ci-failing-test-runs-last-10-days.md` with the scoped failed test job inventory from the last 10 days.
- Stabilize JetBrains mention navigation tests by waiting for actual mention resolution state after validation callbacks fire.
- Stabilize the JetBrains mock CLI server by waiting until its accept loop has started before returning a port to tests.
- Let visual regression checkout/commit steps fall back to `github.token` when `BOT_PAT` is unavailable, which avoids Dependabot/internal PR checkout failures.

Why

Recent failing Actions runs showed recurring JetBrains async test flakes and visual-regression checkout failures when secrets.BOT_PAT is not populated. Most other failures in the inventory were branch-specific regressions, expected baseline updates, infrastructure flakes, or already fixed on main.

Validation

./gradlew :frontend:test --tests ai.kilocode.client.session.ui.prompt.MentionNavigatorTest
./gradlew :backend:test --tests ai.kilocode.backend.workspace.KiloBackendWorkspaceTest
bun run script/check-workflows.ts

Note: the first backend test attempt failed before tests ran with preload not found "@opentui/solid/preload"; running bun install fixed the local dependency setup, and the targeted backend test then passed.

kilo-code-bot · 2026-06-29T10:36:08Z

Code Review Summary

Status: 1 Issues Found | Recommendation: Address before merge

Overview

Severity	Count
CRITICAL	0
WARNING	1
SUGGESTION	0

WARNING

File	Line	Issue
`packages/kilo-jetbrains/backend/src/main/kotlin/ai/kilocode/backend/app/KiloBackendAppService.kt`	747	Clearing `loader` and `eventWatcher` before `cancelAndJoin()` completes can still let a replacement load or watcher start during the join window, so restart may still overlap stale work from the previous connection.

Files Reviewed (2 files)

packages/kilo-jetbrains/backend/src/main/kotlin/ai/kilocode/backend/app/KiloBackendAppService.kt - 1 issue
packages/kilo-jetbrains/backend/src/test/kotlin/ai/kilocode/backend/workspace/KiloBackendWorkspaceTest.kt - 0 issues

Previous Review Summaries (5 snapshots, latest commit 67d90be)

Current summary above is authoritative. Previous snapshots are kept for context only.

Previous review (commit `67d90be`)

Status: 1 Issues Found | Recommendation: Address before merge

Overview

Severity	Count
CRITICAL	0
WARNING	1
SUGGESTION	0

WARNING

File	Line	Issue
`packages/kilo-jetbrains/backend/src/main/kotlin/ai/kilocode/backend/app/KiloBackendAppService.kt`	747	Clearing `loader` and `eventWatcher` before `cancelAndJoin()` completes can let a replacement loader or watcher start during the join window, so restart can still overlap with stale work from the previous connection.

Files Reviewed (4 files)

.github/workflows/visual-regression.yml - 0 issues
packages/kilo-jetbrains/backend/src/main/kotlin/ai/kilocode/backend/app/KiloBackendAppService.kt - 1 issue
packages/kilo-jetbrains/backend/src/test/kotlin/ai/kilocode/backend/testing/MockCliServer.kt - 0 issues
packages/kilo-jetbrains/frontend/src/test/kotlin/ai/kilocode/client/session/ui/prompt/MentionNavigatorTest.kt - 0 issues

Previous review (commit `1796189`)

Status: No Issues Found | Recommendation: Merge

Files Reviewed (1 files)

ci-failing-test-runs-last-10-days.md

Previous review (commit `48f5783`)

Status: No Issues Found | Recommendation: Merge

Files Reviewed (1 files)

.github/workflows/visual-regression.yml

Previous review (commit `c311986`)

Status: No Issues Found | Recommendation: Merge

Files Reviewed (4 files)

.github/workflows/visual-regression.yml
ci-failing-test-runs-last-10-days.md
packages/kilo-jetbrains/backend/src/test/kotlin/ai/kilocode/backend/testing/MockCliServer.kt
packages/kilo-jetbrains/frontend/src/test/kotlin/ai/kilocode/client/session/ui/prompt/MentionNavigatorTest.kt

Previous review (commit `41c6b5e`)

Status: 1 Issues Found | Recommendation: Address before merge

Overview

Severity	Count
CRITICAL	0
WARNING	1
SUGGESTION	0

WARNING

File	Line	Issue
`.github/workflows/visual-regression.yml`	179	`github.token` is read-only on Dependabot `pull_request` runs, so the auto-commit fallback still fails whenever baselines need to be pushed.

Files Reviewed (4 files)

.github/workflows/visual-regression.yml - 1 issue
ci-failing-test-runs-last-10-days.md - 0 issues
packages/kilo-jetbrains/backend/src/test/kotlin/ai/kilocode/backend/testing/MockCliServer.kt - 0 issues
packages/kilo-jetbrains/frontend/src/test/kotlin/ai/kilocode/client/session/ui/prompt/MentionNavigatorTest.kt - 0 issues

Reviewed by gpt-5.4-2026-03-05 · Input: 72.6K · Output: 9K · Cached: 355.8K

Review guidance: REVIEW.md from base branch main

markijbema · 2026-06-29T11:29:47Z

@@ -11,6 +11,7 @@ jobs:
    outputs:
      matched: ${{ steps.filter.outputs.matched }}


LLM-authored note: this gate separates PRs that can safely receive generated baseline commits from read-only PRs. It addresses the visual-regression Dependabot failures such as runs 27819074268 and 27841769389, where checkout failed with "Input required and not supplied: token". It also resolves the kilo-code-bot finding that github.token cannot push baseline commits on Dependabot pull_request runs. The fix is to expose a single can_autocommit decision that both visual jobs use consistently.

markijbema · 2026-06-29T11:31:21Z

-      - name: Check for baseline changes (fork PRs)
-        if: needs.check-paths.outputs.is_fork == 'true'
+      # kilocode_change start
+      - name: Check for baseline changes (read-only PRs)


LLM-authored note: this read-only baseline check replaces the old fork-only branch for the kilo-ui visual job. It fixes visual-regression failures like run 27841769389 and later baseline runs where a PR could not receive generated screenshot commits. The correct behavior is to fail with an actionable message when baselines change but can_autocommit is false, because trying to push from a read-only token just fails later and hides the real visual-baseline update requirement.

markijbema · 2026-06-29T11:31:21Z

-      - name: Check for baseline changes (fork PRs)
-        if: needs.check-paths.outputs.is_fork == 'true'
+      # kilocode_change start
+      - name: Check for baseline changes (read-only PRs)


LLM-authored note: this mirrors the read-only baseline behavior for the VS Code webview visual job. It fixes visual-regression failures like run 27819074268 and the repeated Dependabot visual runs where baseline screenshots could be generated but could not be committed back. Checking can_autocommit here is the fix because the workflow must distinguish safe internal baseline-push runs from read-only PR contexts before deciding whether changed snapshots are a failure or an auto-commit.
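
The three outcomes described in these notes can be summarized as a small decision function. This is only a sketch in Kotlin for illustration: the real logic lives in the workflow's YAML conditions, and `Outcome`, `decide`, and the failure message are hypothetical names, not taken from the PR.

```kotlin
// Hypothetical model of the baseline decision; the actual workflow expresses this in YAML.
sealed interface Outcome {
    object Pass : Outcome
    object AutoCommit : Outcome
    data class Fail(val message: String) : Outcome
}

fun decide(baselinesChanged: Boolean, canAutocommit: Boolean): Outcome = when {
    // Nothing to commit, so the token's permissions do not matter.
    !baselinesChanged -> Outcome.Pass
    // Writable context: the generated screenshots can be pushed back to the PR.
    canAutocommit -> Outcome.AutoCommit
    // Read-only context: fail early with an actionable message instead of a failed push.
    else -> Outcome.Fail("Baselines changed, but this PR cannot receive generated commits.")
}

fun main() {
    val outcome = decide(baselinesChanged = true, canAutocommit = false)
    println(outcome is Outcome.Fail)
}
```

Keeping the decision in one place is what lets both visual jobs treat changed snapshots the same way.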

markijbema · 2026-06-29T11:31:21Z

+        val ready = CountDownLatch(1)
+        executor.submit { acceptLoop(srv, ready) }
+        // LLM note: tests connect immediately after start(), so publish accept-loop readiness instead of racing CI scheduling.
+        check(ready.await(5, TimeUnit.SECONDS)) { "Mock CLI accept loop did not start" }


LLM-authored note: this fixes the JetBrains backend flake observed in the failing-test inventory on run 27970687997, where tests that connect immediately after MockCliServer.start() could race the executor starting the accept loop. Waiting on this latch means start() returns only after the port is allocated and the accept loop has started, so callers no longer hit intermittent connection-refused or startup-timing failures under CI load.
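
A self-contained sketch of the latch handshake, assuming a plain `ServerSocket` server. `ReadyServer` is a hypothetical stand-in for `MockCliServer`; only the `CountDownLatch` pattern mirrors the diff above, and where the real `acceptLoop` counts the latch down is an assumption.

```kotlin
import java.io.IOException
import java.net.InetAddress
import java.net.ServerSocket
import java.net.Socket
import java.util.concurrent.CountDownLatch
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Hypothetical stand-in for MockCliServer.
class ReadyServer : AutoCloseable {
    private val executor = Executors.newSingleThreadExecutor()
    private var srv: ServerSocket? = null

    fun start(): Int {
        val socket = ServerSocket(0, 50, InetAddress.getLoopbackAddress())
        srv = socket
        val ready = CountDownLatch(1)
        executor.submit { acceptLoop(socket, ready) }
        // Block until the accept-loop thread reports that it is running.
        check(ready.await(5, TimeUnit.SECONDS)) { "accept loop did not start" }
        return socket.localPort
    }

    private fun acceptLoop(socket: ServerSocket, ready: CountDownLatch) {
        ready.countDown() // publish readiness just before the first blocking accept()
        try {
            while (!socket.isClosed) {
                socket.accept().use { it.getOutputStream().write("ok\n".toByteArray()) }
            }
        } catch (e: IOException) {
            // Expected when close() shuts the server socket down.
        }
    }

    override fun close() {
        srv?.close()
        executor.shutdownNow()
    }
}

fun main() {
    ReadyServer().use { server ->
        val port = server.start()
        Socket(InetAddress.getLoopbackAddress(), port).use { client ->
            println(client.getInputStream().bufferedReader().readLine())
        }
    }
}
```

The timeout on `await` keeps a stuck executor from hanging the test run; it fails fast with a message instead.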

markijbema · 2026-06-29T11:31:22Z

-        assertTrue(provider.mentionAt(text, 6)?.resolved == true)
-        assertTrue(provider.mentionAt(text, 20)?.resolved == false)
+        // LLM note: the callback can fire before cross-thread mention state is observable to mouse-event assertions.
+        waitFor {


LLM-authored note: this fixes the JetBrains frontend mention-navigation flake from the failed JetBrains test runs in the inventory, including run 27970687997. The validation callback only proves the async validation path ran; it does not guarantee the resolved mention state is immediately visible to the mouse-event assertions. Waiting for the observable mentionAt state makes the test assert the real UI precondition instead of racing cross-thread state propagation.
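
For readers unfamiliar with the pattern, a polling helper of this kind might look like the sketch below. The signature and defaults are assumptions; the `waitFor` used in the actual test suite may differ.

```kotlin
import java.util.concurrent.atomic.AtomicBoolean
import kotlin.concurrent.thread

// Hypothetical polling helper: re-evaluates the condition until it holds or the deadline passes.
fun waitFor(timeoutMs: Long = 5_000, intervalMs: Long = 10, condition: () -> Boolean) {
    val deadline = System.nanoTime() + timeoutMs * 1_000_000
    while (!condition()) {
        check(System.nanoTime() < deadline) { "condition not met within $timeoutMs ms" }
        Thread.sleep(intervalMs)
    }
}

fun main() {
    val resolved = AtomicBoolean(false)
    // Simulates state that becomes visible on another thread some time after a callback fired.
    thread {
        Thread.sleep(50)
        resolved.set(true)
    }
    waitFor { resolved.get() }
    println("resolved = ${resolved.get()}")
}
```

Polling the observable state, rather than asserting once after the callback, is what removes the dependence on thread scheduling.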

markijbema · 2026-06-30T08:18:32Z

+            loader = null
+            eventWatcher = null
+            jobs
+        }.forEach { job ->


LLM-authored note: this fixes the new JetBrains failure from run 28428932405 (KiloBackendAppServiceTest.restart during Loading cancels stale load and reaches Ready). The run timed out in Loading(config=false, notifications=false, profile=LOADED) after restart because the previous app-load job was only cancelled, not joined, before opening the next CLI connection. Waiting for cancelAndJoin() here makes restart/reinstall serialize old load teardown before the new connection starts, so stale HTTP retries and SSE watchers cannot race the new Loading state.

kilo-code-bot · 2026-06-30T08:23:53Z

+    private suspend fun clear() {
+        synchronized(loadLock) {
+            val jobs = listOfNotNull(loader, eventWatcher)
+            loader = null


WARNING: Clearing the job slots before the old jobs finish cancelling can reintroduce the stale-load race

loader and eventWatcher are nulled before cancelAndJoin() finishes. That reopens the window for a concurrent load() or startWatchingGlobalSseEvents() call to install replacement jobs while this restart is still waiting on the old ones. reset() then runs without cancelling those replacement jobs, so the new restart can still overlap with stale work from the previous connection.
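
One way to close the window described here is to keep the slot occupied until the old job has finished, holding a coroutine `Mutex` across the cancel and join. The sketch below assumes `kotlinx.coroutines` is available; `JobSlot`, `restart`, and `stop` are hypothetical names, and this is not necessarily how the PR resolved the warning.

```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Job
import kotlinx.coroutines.cancelAndJoin
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock
import kotlinx.coroutines.yield

// Hypothetical holder; `loader` echoes the field name in the diff, the rest is illustrative.
class JobSlot(private val scope: CoroutineScope) {
    private val lock = Mutex()
    private var loader: Job? = null

    // Installs a replacement only after the previous job has fully stopped.
    suspend fun restart(work: suspend () -> Unit) {
        lock.withLock {
            loader?.cancelAndJoin() // the slot stays occupied for the whole join
            loader = scope.launch { work() }
        }
    }

    suspend fun stop() {
        lock.withLock {
            loader?.cancelAndJoin()
            loader = null // cleared only after the join has completed
        }
    }
}

fun main() = runBlocking {
    val events = mutableListOf<String>()
    val slot = JobSlot(this)
    slot.restart { try { delay(10_000) } finally { events += "old stopped" } }
    yield() // let the first job start before replacing it
    slot.restart { events += "new started" }
    yield() // let the replacement run
    slot.stop()
    println(events)
}
```

A caveat with this design: because the mutex is held while joining, a job that calls `restart` or `stop` on its own slot would deadlock, so the guarded work must not re-enter the holder.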

Reply with @kilocode-bot fix it to have Kilo Code address this issue.

markijbema · 2026-06-30T08:41:54Z

+    private suspend fun loaded(ws: KiloBackendWorkspace) {
+        withTimeout(15_000) {
+            ws.state.first { it is KiloWorkspaceState.Ready }
+        }


LLM-authored note: this fixes the JetBrains failure from run 28430385307 (KiloBackendWorkspaceTest.different directories return different workspaces). The assertion itself passed, but teardown timed out at line 52 because workspaces.get() starts background workspace loading and this identity test exited while those HTTP loads were still in flight. Waiting for the created workspace to reach Ready makes the test settle the implementation work it started before the scope-cancelling teardown runs.
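
A runnable sketch of the same settle-before-teardown idea, assuming `kotlinx.coroutines` is available. `State` and `awaitReady` are hypothetical stand-ins for `KiloWorkspaceState` and the `loaded` helper in the diff.

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.flow.first
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withTimeout

// Hypothetical state type standing in for KiloWorkspaceState.
sealed interface State {
    object Loading : State
    object Ready : State
}

// Suspends until the flow emits Ready, or throws TimeoutCancellationException.
suspend fun awaitReady(state: StateFlow<State>, timeoutMs: Long = 15_000) {
    withTimeout(timeoutMs) {
        state.first { it is State.Ready }
    }
}

fun main() = runBlocking {
    val state = MutableStateFlow<State>(State.Loading)
    // Simulates background loading that finishes after the test body would have returned.
    val load = launch {
        delay(50)
        state.value = State.Ready
    }
    awaitReady(state)
    load.join()
    println(state.value is State.Ready)
}
```

Because a `StateFlow` replays its current value to new collectors, `first` returns immediately if the state is already `Ready`, so the wait adds no delay to tests that have already settled.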

markijbema added 5 commits June 29, 2026 12:06

chore(ci): document recent failed test runs

9dff34c

test(jetbrains): await mention resolution state

75a0e59

test(jetbrains): wait for mock CLI accept loop

c979c8b

fix(ci): fall back to github token for visual checkout

6c4256c

fix(ci): annotate visual token fallback

ab26315

markijbema marked this pull request as ready for review June 29, 2026 10:31

Merge remote-tracking branch 'origin/main' into mark/fix-random-test-failures

41c6b5e

kilo-code-bot Bot reviewed Jun 29, 2026

Comment thread .github/workflows/visual-regression.yml Outdated

markijbema added 2 commits June 29, 2026 12:44

fix(ci): avoid visual baseline pushes without bot token

c311986

fix(ci): cover visual workflow markers

48f5783

markijbema commented Jun 29, 2026

markijbema added 3 commits June 29, 2026 13:31

chore(ci): remove failure inventory from pr

1796189

Merge branch 'main' into mark/fix-random-test-failures

00e3234

fix(jetbrains): join app load cancellation before restart

67d90be

markijbema commented Jun 30, 2026

kilo-code-bot Bot reviewed Jun 30, 2026

test(jetbrains): settle workspace loads before teardown

341526f

markijbema commented Jun 30, 2026
