Skip to content

feat(skill): BM25-based smart skill retrieval for large catalogues#726

Open
claudianus wants to merge 16 commits into
MoonshotAI:mainfrom
claudianus:main
Open

feat(skill): BM25-based smart skill retrieval for large catalogues#726
claudianus wants to merge 16 commits into
MoonshotAI:mainfrom
claudianus:main

Conversation

@claudianus

@claudianus claudianus commented Jun 13, 2026

Copy link
Copy Markdown

Related Issue

Resolve #725

Problem

When a user has a large number of skills installed (e.g. 1,000+), the current implementation injects the entire skill listing into the system prompt — including name, description (250 chars), whenToUse, and file path for every skill.

With 1,530 real skills this amounts to ~118,000 tokens in the system prompt, which:

  1. Consumes ~46% of a 256K context window
  2. Loads all SKILL.md file bodies into memory at startup (~88 MB JS heap)
  3. Forces the LLM to scan a massive listing to find the right skill — poor accuracy due to lost-in-the-middle effects

What Changed

A 3-tier auto-detection system based on invocable skill count. No feature flag needed — the tier is selected automatically at session start.

Skill count Listing format Strategy
≤ 80 Legacy full listing (name + desc + path) Prompt-cache optimal for small catalogues
81–300 Compact (name + 80-char desc) Skill search available
> 300 Names only Skill search required

New files

  • skill/search.ts — BM25 search index with synonym expansion. Zero external dependencies. ~24ms to build for 1,500 skills, <1ms per query.

Modified files

  • skill/registry.ts — Auto-detect tier based on skill count. searchSkills() method with lazy index rebuild. renderSkillPrompt() lazy-loads SKILL.md content from disk only when a skill is activated. Listing thresholds are now configurable via SkillRegistryOptions.
  • skill/parser.tsparseSkillMetaFromFile() reads only YAML frontmatter (~20 lines) instead of the full SKILL.md body. Uses LAZY_CONTENT_SENTINEL to distinguish "not loaded" from "genuinely empty body". Also captures a small body snippet for search and derives better flat-skill descriptions from the first sentence.
  • skill-tool.ts / .mdSkill tool extended with action: "search" alongside existing action: "load". Backward-compatible — skill field remains required.
  • profile/default/system.md — Two-step skill usage: search first, then load.
  • session/index.ts — Minor cleanup (removed flag wiring).
  • flags/registry.ts — No new flags needed.

Tests

  • New tests merged into existing registry.test.ts (search, tiers, lazy rebuild, configurable thresholds)
  • New search.test.ts with dedicated SkillSearchIndex unit tests (Unicode, synonyms, threshold, body snippet, aliases/tags)
  • New integration-proof.test.ts validates what the LLM actually sees with real 1,530 skills
  • Existing scanner.test.ts updated for lazy loading via renderSkillPrompt
  • Updated parser.test.ts for body-snippet and first-sentence description fallback coverage
  • Skill-search tests: 52/52 pass; full @moonshot-ai/agent-core suite: 2,622 pass / 3 unrelated failures / 1 todo (the 3 failures are flaky skill-tool-manager timeout under full-suite load and 2 unrelated MCP startup tests)

Follow-up improvements added after review

  • Unicode-aware tokenization\p{L}-based tokenization so non-English skills (Korean, Portuguese, etc.) are searchable.
  • minScore thresholdSkillSearchIndex.search() accepts an optional minimum score to filter out low-relevance matches.
  • Body snippet indexing — First 1KB after frontmatter is indexed with a lower weight, improving recall without loading full bodies.
  • Aliases/tags as weighted terms — Skill metadata aliases and tags are now indexed with high/medium weight.
  • Configurable listing thresholdscompactListingThreshold and namesOnlyListingThreshold can be passed to SessionSkillRegistry.
  • Better flat-skill descriptions — Falls back to the first sentence of the body instead of the first raw line.

Performance (measured with 1,530 real skills from antigravity-awesome-skills)

Metric Before After Reduction
System prompt 118,120 tokens 8,436 tokens 93%
Startup memory ~88 MB ~4 MB 95%
Search latency N/A (LLM scans listing) 0.0–0.2ms (BM25)
Lazy content load N/A (eager at startup) 0.4ms per activation

Checklist

  • Tests added and passing (skill-search 52/52; full suite 2,622 pass with 3 unrelated failures)
  • Lint clean for modified files (oxlint + sherif)
  • TypeScript clean (tsc --noEmit, 0 errors)
  • Changeset generated (minor bump)
  • No co-author attribution or agent identity
  • No internal identifiers in diff
  • Backward-compatible (existing skill callers unchanged)

Follow-up fix

  • 62d38ab8fix(skill-search): handle string tags and aliases in skill search index

    Real-world skill YAML frontmatter sometimes declares tags and aliases as comma or space-separated strings instead of arrays. Normalize them to string arrays before indexing so multi-keyword queries can match skills correctly.

When >80 skills are installed, the system prompt switches from a full
listing to a compact name-only format. The model discovers skills via
the Skill tool's new action:"search" endpoint backed by a BM25 index.

Three tiers by skill count:
  ≤ 80:   legacy full listing (prompt-cache optimal)
  81-300: compact name + description + search
  > 300:  names only + search required

Key changes:
- skill/search.ts: BM25 index with synonym expansion (zero deps)
- skill/registry.ts: auto-detect tier, lazy content loading
- skill/parser.ts: parseSkillMetaFromFile (frontmatter-only)
- skill-tool.ts: search action on existing Skill tool
- system.md: search-first workflow instructions

Performance (measured with 1,530 real skills):
- System prompt: 118K → 8.4K tokens (93% reduction)
- Startup memory: 88MB → 4MB (95% reduction)
- Search latency: 0.0-0.2ms per query
- Lazy content load: 0.4ms per skill activation

Closes MoonshotAI#725
@changeset-bot

changeset-bot Bot commented Jun 13, 2026

Copy link
Copy Markdown

🦋 Changeset detected

Latest commit: f357f22

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@moonshot-ai/kimi-code Minor

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4eca85320e

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines 57 to 58
export const SkillToolInputSchema: z.ZodType<SkillToolInput> = z.object({
skill: z.string(),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Allow search calls without a skill name

When the model follows the new instructions/example and calls Skill as {"action":"search","query":"e2e test"}, preflight validation rejects it before execution() runs because this schema still makes skill required; loop/tool-call.ts validates the advertised JSON schema with AJV. This breaks the large-catalog search-first flow unless the model invents a dummy skill name, so skill should be optional for search or the schema should be discriminated by action.

Useful? React with 👍 / 👎.

import { join } from 'node:path';
import { performance } from 'node:perf_hooks';

const SKILLS_DIR = join(homedir(), '.kimi', 'skills');

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid depending on a local home skill catalog

These tests depend on ~/.kimi/skills existing and containing the author's real 1,530-skill catalog. In a clean CI or developer checkout this path is usually absent or has different contents, so the registry loads no/different skills and the listing assertions or results[0]! dereference fail. Build a temporary fixture catalog inside the test instead of reading the user's home directory.

Useful? React with 👍 / 👎.

Comment thread packages/agent-core/src/skill/parser.ts Outdated
fencesFound++;
if (fencesFound === 2) break;
}
offset += line.length + 1;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve CRLF frontmatter fences when slicing

For SKILL.md files written with CRLF line endings, split(/\r?\n/) removes a two-character newline but this offset only adds + 1, so frontmatterOnly is sliced before the second ---; parseSkillText then reports a missing closing fence and discovery skips the skill. This regresses Windows-authored skills that the full parser already accepts, so compute the fence position on the original buffer or account for \r\n lengths.

Useful? React with 👍 / 👎.

P1: Make skill field optional in SkillToolInputSchema — search-only
    calls no longer need a dummy skill name. Validation added for
    load action requiring skill parameter.

P1: Replace home-dir-dependent integration tests with self-contained
    temp fixture (350 SKILL.md files in mkdtemp). Tests are now
    portable across CI and developer machines.

P2: Fix CRLF fence offset in parseSkillMetaFromFile — split on \n
    and strip trailing \r to correctly account for 2-byte newlines
    when computing the slice boundary.

Also: update skill-tool.test.ts contract to reflect optional skill.
Six smoke tests with real fixture files (350 SKILL.md):
1. Tier auto-detection (names-only for 350 skills)
2. Lazy content loading (sentinel → readFileSync)
3. BM25 search accuracy (8/8 domain queries correct)
4. Schema validation (search without skill name)
5. CRLF frontmatter parsing
6. Full model flow: search → pick → load → render
@claudianus

Copy link
Copy Markdown
Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a6c632b7a3

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread packages/agent-core/src/skill/parser.ts Outdated
Comment on lines +136 to +137
const result = parseSkillText({ ...options, text: frontmatterOnly });
return { ...result, content: LAZY_CONTENT_SENTINEL };

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve body-derived descriptions for flat skills

When a top-level flat .md skill has frontmatter but omits description, the existing parser supports it by deriving the description from the first body line. This lazy path now calls parseSkillText with only the frontmatter, so those valid skills get No description provided. and their body keywords are absent from the BM25 index/listing when loaded through SessionSkillRegistry.loadRoots, making them hard or impossible to discover by search.

Useful? React with 👍 / 👎.


// Build the BM25 search index so the model can discover skills
// via the `skill_search` tool instead of scanning a full listing.
this.searchIndex.build(this.listInvocableSkills());

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Filter sub-skills out of the search index

The search index is built from listInvocableSkills(), but the model-facing listing below explicitly filters metadata.isSubSkill !== true. For user sub-skills that do not set disableModelInvocation, Skill search can now return hidden child skill names and prompt the model to load them directly, bypassing the parent skill that previously controlled how sub-skills were exposed; the lazy rebuild path uses the same unfiltered source as well.

Useful? React with 👍 / 👎.

@claudianus

claudianus commented Jun 16, 2026

Copy link
Copy Markdown
Author

Addressed the latest Codex review comments:

  1. search without skill name — Confirmed the JSON schema already makes skill optional; added a regression test that validates {"action":"search","query":"e2e test"} through the production AJV validator.
  2. local home skill catalog dependency — The integration test already uses a temporary fixture catalog; no change needed.
  3. CRLF frontmatter slicing — Fixed parseSkillMetaFromFile to slice to offset + fenceLine.length instead of offset + 3, so CRLF frontmatter fences are preserved.
  4. body-derived descriptions for flat skills — Added a fallback that re-parses the full file for flat .md skills when no explicit description is found, keeping the lazy sentinel for content.
  5. sub-skills in search index — Added listSearchableSkills() and used it for both initial index build and lazy rebuild, matching the filter already applied in getModelSkillListing().

All affected tests pass and the package typechecks/builds successfully.

@claudianus

Copy link
Copy Markdown
Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a6c632b7a3

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".


// Build the BM25 search index so the model can discover skills
// via the `skill_search` tool instead of scanning a full listing.
this.searchIndex.build(this.listInvocableSkills());

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Hide sub-skills from search results

When a skill root declares sub-skills, getModelSkillListing() intentionally filters metadata.isSubSkill so the model only sees the parent skill, but this new index is built from the unfiltered listInvocableSkills(). In large catalogs the model is now told to use search, so queries can surface and then load parent.child directly even though that sub-skill was deliberately absent from the model listing; build and rebuild the index with the same non-sub-skill filter used for the prompt listing.

Useful? React with 👍 / 👎.

Comment thread packages/agent-core/src/skill/parser.ts Outdated
Comment on lines +135 to +136
const frontmatterOnly = buffer.slice(0, offset + 3);
const result = parseSkillText({ ...options, text: frontmatterOnly });

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve body-derived flat skill descriptions

For top-level flat .md skills that have frontmatter for fields like name/type but omit description (which is supported because only directory SKILL.md files require a description), this truncated parse leaves parseSkillText() with an empty body and records No description provided. instead of the previous first-body-line fallback. In catalogs that switch to search, those body terms are no longer indexed or listed, so these skills become hard to discover unless the exact name is known; read enough body to compute the fallback or fall back to the full parse when the frontmatter has no description.

Useful? React with 👍 / 👎.

@claudianus

Copy link
Copy Markdown
Author

Both points are addressed in commit 7da68027 (now pushed to this PR branch):

  • Sub-skills are excluded from the BM25 search index via listSearchableSkills().
  • Flat .md skills without a frontmatter description now fall back to the first body line.

Thanks for the review!

- Fix CRLF frontmatter fence slicing in parseSkillMetaFromFile
- Derive flat-skill descriptions from body when frontmatter omits it
- Exclude sub-skills from BM25 search index
- Add regression tests for schema validation, CRLF, and sub-skill filtering

Refs MoonshotAI#726
@claudianus

Copy link
Copy Markdown
Author

@codex please review this PR. The branch has been squashed into a single commit (93d57da).

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f357f22cb4

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +177 to +180
const whenToUseTokens = tokenize(skill.metadata.whenToUse ?? '', { removeStopwords: true });
const bodySnippetTokens = tokenize(skill.bodySnippet ?? '', { removeStopwords: true });
const aliasesTokens = tokenize((skill.metadata.aliases ?? []).join(' '), { removeStopwords: true });
const tagsTokens = tokenize((skill.metadata.tags ?? []).join(' '), { removeStopwords: true });

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Guard raw skill metadata before tokenizing

When a user skill contains raw YAML metadata such as tags: backend, aliases: docker, or a non-string whenToUse, these values are not normalized by the parser, but the new indexer calls .join()/string tokenization on them. A single such skill will throw while loadRoots() builds the BM25 index, aborting session skill loading and preventing built-in skills from being registered. Please coerce or ignore these fields with Array.isArray/typeof === 'string' before tokenizing.

Useful? React with 👍 / 👎.


function tokenize(text: string, options: { removeStopwords?: boolean } = {}): string[] {
const raw = text
.toLowerCase()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P3 Badge Split camelCase before lowercasing tokens

Because tokenize() lowercases the text before calling splitCompoundIdentifier(), the camelCase/PascalCase regex in that helper can never match. In catalogs with skill names like codeReview or ReactHooks, searching for review or hooks will miss the name token unless the same word appears elsewhere, defeating the intended compound-name matching.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(skill): skill search via BM25 index for large skill catalogues

1 participant