feat(skill): BM25-based smart skill retrieval for large catalogues#726
feat(skill): BM25-based smart skill retrieval for large catalogues#726claudianus wants to merge 16 commits into
Conversation
When >80 skills are installed, the system prompt switches from a full listing to a compact name-only format. The model discovers skills via the Skill tool's new action:"search" endpoint backed by a BM25 index. Three tiers by skill count: ≤ 80: legacy full listing (prompt-cache optimal) 81-300: compact name + description + search > 300: names only + search required Key changes: - skill/search.ts: BM25 index with synonym expansion (zero deps) - skill/registry.ts: auto-detect tier, lazy content loading - skill/parser.ts: parseSkillMetaFromFile (frontmatter-only) - skill-tool.ts: search action on existing Skill tool - system.md: search-first workflow instructions Performance (measured with 1,530 real skills): - System prompt: 118K → 8.4K tokens (93% reduction) - Startup memory: 88MB → 4MB (95% reduction) - Search latency: 0.0-0.2ms per query - Lazy content load: 0.4ms per skill activation Closes MoonshotAI#725
🦋 Changeset detectedLatest commit: f357f22 The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4eca85320e
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| export const SkillToolInputSchema: z.ZodType<SkillToolInput> = z.object({ | ||
| skill: z.string(), |
There was a problem hiding this comment.
Allow search calls without a skill name
When the model follows the new instructions/example and calls Skill as {"action":"search","query":"e2e test"}, preflight validation rejects it before execution() runs because this schema still makes skill required; loop/tool-call.ts validates the advertised JSON schema with AJV. This breaks the large-catalog search-first flow unless the model invents a dummy skill name, so skill should be optional for search or the schema should be discriminated by action.
Useful? React with 👍 / 👎.
| import { join } from 'node:path'; | ||
| import { performance } from 'node:perf_hooks'; | ||
|
|
||
| const SKILLS_DIR = join(homedir(), '.kimi', 'skills'); |
There was a problem hiding this comment.
Avoid depending on a local home skill catalog
These tests depend on ~/.kimi/skills existing and containing the author's real 1,530-skill catalog. In a clean CI or developer checkout this path is usually absent or has different contents, so the registry loads no/different skills and the listing assertions or results[0]! dereference fail. Build a temporary fixture catalog inside the test instead of reading the user's home directory.
Useful? React with 👍 / 👎.
| fencesFound++; | ||
| if (fencesFound === 2) break; | ||
| } | ||
| offset += line.length + 1; |
There was a problem hiding this comment.
Preserve CRLF frontmatter fences when slicing
For SKILL.md files written with CRLF line endings, split(/\r?\n/) removes a two-character newline but this offset only adds + 1, so frontmatterOnly is sliced before the second ---; parseSkillText then reports a missing closing fence and discovery skips the skill. This regresses Windows-authored skills that the full parser already accepts, so compute the fence position on the original buffer or account for \r\n lengths.
Useful? React with 👍 / 👎.
P1: Make skill field optional in SkillToolInputSchema — search-only
calls no longer need a dummy skill name. Validation added for
load action requiring skill parameter.
P1: Replace home-dir-dependent integration tests with self-contained
temp fixture (350 SKILL.md files in mkdtemp). Tests are now
portable across CI and developer machines.
P2: Fix CRLF fence offset in parseSkillMetaFromFile — split on \n
and strip trailing \r to correctly account for 2-byte newlines
when computing the slice boundary.
Also: update skill-tool.test.ts contract to reflect optional skill.
Six smoke tests with real fixture files (350 SKILL.md): 1. Tier auto-detection (names-only for 350 skills) 2. Lazy content loading (sentinel → readFileSync) 3. BM25 search accuracy (8/8 domain queries correct) 4. Schema validation (search without skill name) 5. CRLF frontmatter parsing 6. Full model flow: search → pick → load → render
… stopwords and listing cache
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a6c632b7a3
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const result = parseSkillText({ ...options, text: frontmatterOnly }); | ||
| return { ...result, content: LAZY_CONTENT_SENTINEL }; |
There was a problem hiding this comment.
Preserve body-derived descriptions for flat skills
When a top-level flat .md skill has frontmatter but omits description, the existing parser supports it by deriving the description from the first body line. This lazy path now calls parseSkillText with only the frontmatter, so those valid skills get No description provided. and their body keywords are absent from the BM25 index/listing when loaded through SessionSkillRegistry.loadRoots, making them hard or impossible to discover by search.
Useful? React with 👍 / 👎.
|
|
||
| // Build the BM25 search index so the model can discover skills | ||
| // via the `skill_search` tool instead of scanning a full listing. | ||
| this.searchIndex.build(this.listInvocableSkills()); |
There was a problem hiding this comment.
Filter sub-skills out of the search index
The search index is built from listInvocableSkills(), but the model-facing listing below explicitly filters metadata.isSubSkill !== true. For user sub-skills that do not set disableModelInvocation, Skill search can now return hidden child skill names and prompt the model to load them directly, bypassing the parent skill that previously controlled how sub-skills were exposed; the lazy rebuild path uses the same unfiltered source as well.
Useful? React with 👍 / 👎.
|
Addressed the latest Codex review comments:
All affected tests pass and the package typechecks/builds successfully. |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a6c632b7a3
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
|
||
| // Build the BM25 search index so the model can discover skills | ||
| // via the `skill_search` tool instead of scanning a full listing. | ||
| this.searchIndex.build(this.listInvocableSkills()); |
There was a problem hiding this comment.
Hide sub-skills from search results
When a skill root declares sub-skills, getModelSkillListing() intentionally filters metadata.isSubSkill so the model only sees the parent skill, but this new index is built from the unfiltered listInvocableSkills(). In large catalogs the model is now told to use search, so queries can surface and then load parent.child directly even though that sub-skill was deliberately absent from the model listing; build and rebuild the index with the same non-sub-skill filter used for the prompt listing.
Useful? React with 👍 / 👎.
| const frontmatterOnly = buffer.slice(0, offset + 3); | ||
| const result = parseSkillText({ ...options, text: frontmatterOnly }); |
There was a problem hiding this comment.
Preserve body-derived flat skill descriptions
For top-level flat .md skills that have frontmatter for fields like name/type but omit description (which is supported because only directory SKILL.md files require a description), this truncated parse leaves parseSkillText() with an empty body and records No description provided. instead of the previous first-body-line fallback. In catalogs that switch to search, those body terms are no longer indexed or listed, so these skills become hard to discover unless the exact name is known; read enough body to compute the fallback or fall back to the full parse when the frontmatter has no description.
Useful? React with 👍 / 👎.
|
Both points are addressed in commit
Thanks for the review! |
- Fix CRLF frontmatter fence slicing in parseSkillMetaFromFile - Derive flat-skill descriptions from body when frontmatter omits it - Exclude sub-skills from BM25 search index - Add regression tests for schema validation, CRLF, and sub-skill filtering Refs MoonshotAI#726
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f357f22cb4
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const whenToUseTokens = tokenize(skill.metadata.whenToUse ?? '', { removeStopwords: true }); | ||
| const bodySnippetTokens = tokenize(skill.bodySnippet ?? '', { removeStopwords: true }); | ||
| const aliasesTokens = tokenize((skill.metadata.aliases ?? []).join(' '), { removeStopwords: true }); | ||
| const tagsTokens = tokenize((skill.metadata.tags ?? []).join(' '), { removeStopwords: true }); |
There was a problem hiding this comment.
Guard raw skill metadata before tokenizing
When a user skill contains raw YAML metadata such as tags: backend, aliases: docker, or a non-string whenToUse, these values are not normalized by the parser, but the new indexer calls .join()/string tokenization on them. A single such skill will throw while loadRoots() builds the BM25 index, aborting session skill loading and preventing built-in skills from being registered. Please coerce or ignore these fields with Array.isArray/typeof === 'string' before tokenizing.
Useful? React with 👍 / 👎.
|
|
||
| function tokenize(text: string, options: { removeStopwords?: boolean } = {}): string[] { | ||
| const raw = text | ||
| .toLowerCase() |
There was a problem hiding this comment.
Split camelCase before lowercasing tokens
Because tokenize() lowercases the text before calling splitCompoundIdentifier(), the camelCase/PascalCase regex in that helper can never match. In catalogs with skill names like codeReview or ReactHooks, searching for review or hooks will miss the name token unless the same word appears elsewhere, defeating the intended compound-name matching.
Useful? React with 👍 / 👎.
Related Issue
Resolve #725
Problem
When a user has a large number of skills installed (e.g. 1,000+), the current implementation injects the entire skill listing into the system prompt — including name, description (250 chars),
whenToUse, and file path for every skill.With 1,530 real skills this amounts to ~118,000 tokens in the system prompt, which:
What Changed
A 3-tier auto-detection system based on invocable skill count. No feature flag needed — the tier is selected automatically at session start.
New files
skill/search.ts— BM25 search index with synonym expansion. Zero external dependencies. ~24ms to build for 1,500 skills, <1ms per query.Modified files
skill/registry.ts— Auto-detect tier based on skill count.searchSkills()method with lazy index rebuild.renderSkillPrompt()lazy-loads SKILL.md content from disk only when a skill is activated. Listing thresholds are now configurable viaSkillRegistryOptions.skill/parser.ts—parseSkillMetaFromFile()reads only YAML frontmatter (~20 lines) instead of the full SKILL.md body. UsesLAZY_CONTENT_SENTINELto distinguish "not loaded" from "genuinely empty body". Also captures a small body snippet for search and derives better flat-skill descriptions from the first sentence.skill-tool.ts/.md—Skilltool extended withaction: "search"alongside existingaction: "load". Backward-compatible —skillfield remains required.profile/default/system.md— Two-step skill usage: search first, then load.session/index.ts— Minor cleanup (removed flag wiring).flags/registry.ts— No new flags needed.Tests
registry.test.ts(search, tiers, lazy rebuild, configurable thresholds)search.test.tswith dedicatedSkillSearchIndexunit tests (Unicode, synonyms, threshold, body snippet, aliases/tags)integration-proof.test.tsvalidates what the LLM actually sees with real 1,530 skillsscanner.test.tsupdated for lazy loading viarenderSkillPromptparser.test.tsfor body-snippet and first-sentence description fallback coverage@moonshot-ai/agent-coresuite: 2,622 pass / 3 unrelated failures / 1 todo (the 3 failures are flakyskill-tool-managertimeout under full-suite load and 2 unrelated MCP startup tests)Follow-up improvements added after review
\p{L}-based tokenization so non-English skills (Korean, Portuguese, etc.) are searchable.minScorethreshold —SkillSearchIndex.search()accepts an optional minimum score to filter out low-relevance matches.aliasesandtagsare now indexed with high/medium weight.compactListingThresholdandnamesOnlyListingThresholdcan be passed toSessionSkillRegistry.Performance (measured with 1,530 real skills from antigravity-awesome-skills)
Checklist
Follow-up fix
62d38ab8—fix(skill-search): handle string tags and aliases in skill search indexReal-world skill YAML frontmatter sometimes declares tags and aliases as comma or space-separated strings instead of arrays. Normalize them to string arrays before indexing so multi-keyword queries can match skills correctly.