Skip to content

test(ci): stabilize random test failures#11789

Open
markijbema wants to merge 12 commits into
mainfrom
mark/fix-random-test-failures
Open

test(ci): stabilize random test failures#11789
markijbema wants to merge 12 commits into
mainfrom
mark/fix-random-test-failures

Conversation

@markijbema

Copy link
Copy Markdown
Contributor

What

  • Add ci-failing-test-runs-last-10-days.md with the scoped failed test job inventory from the last 10 days.
  • Stabilize JetBrains mention navigation tests by waiting for actual mention resolution state after validation callbacks fire.
  • Stabilize the JetBrains mock CLI server by waiting until its accept loop has started before returning a port to tests.
  • Let visual regression checkout/commit steps fall back to github.token when BOT_PAT is unavailable, which avoids Dependabot/internal PR checkout failures.

Why

Recent failing Actions runs showed recurring JetBrains async test flakes and visual-regression checkout failures when secrets.BOT_PAT is not populated. Most other failures in the inventory were branch-specific regressions, expected baseline updates, infrastructure flakes, or already fixed on main.

Validation

  • ./gradlew :frontend:test --tests ai.kilocode.client.session.ui.prompt.MentionNavigatorTest
  • ./gradlew :backend:test --tests ai.kilocode.backend.workspace.KiloBackendWorkspaceTest
  • bun run script/check-workflows.ts

Note: the first backend test attempt failed before tests ran with preload not found "@opentui/solid/preload"; running bun install fixed the local dependency setup, and the targeted backend test then passed.

@markijbema markijbema marked this pull request as ready for review June 29, 2026 10:31
Comment thread .github/workflows/visual-regression.yml Outdated
@kilo-code-bot

kilo-code-bot Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Code Review Summary

Status: 1 Issues Found | Recommendation: Address before merge

Overview

Severity Count
CRITICAL 0
WARNING 1
SUGGESTION 0

Fix these issues in Kilo Cloud

Issue Details (click to expand)

WARNING

File Line Issue
packages/kilo-jetbrains/backend/src/main/kotlin/ai/kilocode/backend/app/KiloBackendAppService.kt 747 Clearing loader and eventWatcher before cancelAndJoin() completes can still let a replacement load or watcher start during the join window, so restart may still overlap stale work from the previous connection.
Files Reviewed (2 files)
  • packages/kilo-jetbrains/backend/src/main/kotlin/ai/kilocode/backend/app/KiloBackendAppService.kt - 1 issue
  • packages/kilo-jetbrains/backend/src/test/kotlin/ai/kilocode/backend/workspace/KiloBackendWorkspaceTest.kt - 0 issues
Previous Review Summaries (5 snapshots, latest commit 67d90be)

Current summary above is authoritative. Previous snapshots are kept for context only.

Previous review (commit 67d90be)

Status: 1 Issues Found | Recommendation: Address before merge

Overview

Severity Count
CRITICAL 0
WARNING 1
SUGGESTION 0

Fix these issues in Kilo Cloud

Issue Details (click to expand)

WARNING

File Line Issue
packages/kilo-jetbrains/backend/src/main/kotlin/ai/kilocode/backend/app/KiloBackendAppService.kt 747 Clearing loader and eventWatcher before cancelAndJoin() completes can let a replacement loader or watcher start during the join window, so restart can still overlap with stale work from the previous connection.
Files Reviewed (4 files)
  • .github/workflows/visual-regression.yml - 0 issues
  • packages/kilo-jetbrains/backend/src/main/kotlin/ai/kilocode/backend/app/KiloBackendAppService.kt - 1 issue
  • packages/kilo-jetbrains/backend/src/test/kotlin/ai/kilocode/backend/testing/MockCliServer.kt - 0 issues
  • packages/kilo-jetbrains/frontend/src/test/kotlin/ai/kilocode/client/session/ui/prompt/MentionNavigatorTest.kt - 0 issues

Previous review (commit 1796189)

Status: No Issues Found | Recommendation: Merge

Files Reviewed (1 files)
  • ci-failing-test-runs-last-10-days.md

Previous review (commit 48f5783)

Status: No Issues Found | Recommendation: Merge

Files Reviewed (1 files)
  • .github/workflows/visual-regression.yml

Previous review (commit c311986)

Status: No Issues Found | Recommendation: Merge

Files Reviewed (4 files)
  • .github/workflows/visual-regression.yml
  • ci-failing-test-runs-last-10-days.md
  • packages/kilo-jetbrains/backend/src/test/kotlin/ai/kilocode/backend/testing/MockCliServer.kt
  • packages/kilo-jetbrains/frontend/src/test/kotlin/ai/kilocode/client/session/ui/prompt/MentionNavigatorTest.kt

Previous review (commit 41c6b5e)

Status: 1 Issues Found | Recommendation: Address before merge

Overview

Severity Count
CRITICAL 0
WARNING 1
SUGGESTION 0

Fix these issues in Kilo Cloud

Issue Details (click to expand)

WARNING

File Line Issue
.github/workflows/visual-regression.yml 179 github.token is read-only on Dependabot pull_request runs, so the auto-commit fallback still fails whenever baselines need to be pushed.
Files Reviewed (4 files)
  • .github/workflows/visual-regression.yml - 1 issue
  • ci-failing-test-runs-last-10-days.md - 0 issues
  • packages/kilo-jetbrains/backend/src/test/kotlin/ai/kilocode/backend/testing/MockCliServer.kt - 0 issues
  • packages/kilo-jetbrains/frontend/src/test/kotlin/ai/kilocode/client/session/ui/prompt/MentionNavigatorTest.kt - 0 issues

Reviewed by gpt-5.4-2026-03-05 · Input: 72.6K · Output: 9K · Cached: 355.8K

Review guidance: REVIEW.md from base branch main

@@ -11,6 +11,7 @@ jobs:
outputs:
matched: ${{ steps.filter.outputs.matched }}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLM-authored note: this gate separates PRs that can safely receive generated baseline commits from read-only PRs. It addresses the visual-regression Dependabot failures such as runs 27819074268 and 27841769389, where checkout failed with Input required and not supplied: token, and it also avoids the kilo-code-bot finding that github.token cannot push baseline commits on Dependabot pull_request runs. The fix is to expose one can_autocommit decision and have both visual jobs use it consistently.

- name: Check for baseline changes (fork PRs)
if: needs.check-paths.outputs.is_fork == 'true'
# kilocode_change start
- name: Check for baseline changes (read-only PRs)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLM-authored note: this read-only baseline check replaces the old fork-only branch for the kilo-ui visual job. It fixes visual-regression failures like run 27841769389 and later baseline runs where a PR could not receive generated screenshot commits. The correct behavior is to fail with an actionable message when baselines change but can_autocommit is false, because trying to push from a read-only token just fails later and hides the real visual-baseline update requirement.

- name: Check for baseline changes (fork PRs)
if: needs.check-paths.outputs.is_fork == 'true'
# kilocode_change start
- name: Check for baseline changes (read-only PRs)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLM-authored note: this mirrors the read-only baseline behavior for the VS Code webview visual job. It fixes visual-regression failures like run 27819074268 and the repeated Dependabot visual runs where baseline screenshots could be generated but could not be committed back. Checking can_autocommit here is the fix because the workflow must distinguish safe internal baseline-push runs from read-only PR contexts before deciding whether changed snapshots are a failure or an auto-commit.

val ready = CountDownLatch(1)
executor.submit { acceptLoop(srv, ready) }
// LLM note: tests connect immediately after start(), so publish accept-loop readiness instead of racing CI scheduling.
check(ready.await(5, TimeUnit.SECONDS)) { "Mock CLI accept loop did not start" }

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLM-authored note: this fixes the JetBrains backend flake observed in the failing-test inventory on run 27970687997, where tests that connect immediately after MockCliServer.start() could race the executor starting the accept loop. Waiting on this latch makes start() mean both the port is allocated and the accept loop is scheduled, so callers no longer hit intermittent connection-refused/startup timing failures under CI load.

assertTrue(provider.mentionAt(text, 6)?.resolved == true)
assertTrue(provider.mentionAt(text, 20)?.resolved == false)
// LLM note: the callback can fire before cross-thread mention state is observable to mouse-event assertions.
waitFor {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLM-authored note: this fixes the JetBrains frontend mention-navigation flake from the failed JetBrains test runs in the inventory, including run 27970687997. The validation callback only proves the async validation path ran; it does not guarantee the resolved mention state is immediately visible to the mouse-event assertions. Waiting for the observable mentionAt state makes the test assert the real UI precondition instead of racing cross-thread state propagation.

loader = null
eventWatcher = null
jobs
}.forEach { job ->

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLM-authored note: this fixes the new JetBrains failure from run 28428932405 (KiloBackendAppServiceTest.restart during Loading cancels stale load and reaches Ready). The run timed out in Loading(config=false, notifications=false, profile=LOADED) after restart because the previous app-load job was only cancelled, not joined, before opening the next CLI connection. Waiting for cancelAndJoin() here makes restart/reinstall serialize old load teardown before the new connection starts, so stale HTTP retries and SSE watchers cannot race the new Loading state.

private suspend fun clear() {
synchronized(loadLock) {
val jobs = listOfNotNull(loader, eventWatcher)
loader = null

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Clearing the job slots before the old jobs finish cancelling can reintroduce the stale-load race

loader and eventWatcher are nulled before cancelAndJoin() finishes. That reopens the window for a concurrent load() or startWatchingGlobalSseEvents() call to install replacement jobs while this restart is still waiting on the old ones. reset() then runs without cancelling those replacement jobs, so the new restart can still overlap with stale work from the previous connection.


Reply with @kilocode-bot fix it to have Kilo Code address this issue.

private suspend fun loaded(ws: KiloBackendWorkspace) {
withTimeout(15_000) {
ws.state.first { it is KiloWorkspaceState.Ready }
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLM-authored note: this fixes the JetBrains failure from run 28430385307 (KiloBackendWorkspaceTest.different directories return different workspaces). The assertion itself passed, but teardown timed out at line 52 because workspaces.get() starts background workspace loading and this identity test exited while those HTTP loads were still in flight. Waiting for the created workspace to reach Ready makes the test settle the implementation work it started before the scope-cancelling teardown runs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant