The knowledge server for AI agents — index your docs, code, Notion pages, Slack threads, and Discord forums into searchable, agent-accessible knowledge via MCP. One config file, one command, works with any AI coding agent.

    npx @copilotkit/pathfinder init
    npx @copilotkit/pathfinder serve

Or with Docker:

    docker pull ghcr.io/copilotkit/pathfinder
    docker run -v ./pathfinder.yaml:/app/pathfinder.yaml \
      -v ./docs:/app/docs -p 3001:3001 \
      ghcr.io/copilotkit/pathfinder

Then connect your AI agent:

    {
      "mcpServers": {
        "my-docs": { "url": "http://localhost:3001/mcp" }
      }
    }

This documentation is indexed by a live Pathfinder instance. Connect your agent to try it:

    # Claude Code
    claude mcp add pathfinder-docs --transport http https://mcp.pathfinder.copilotkit.dev/mcp

Claude Desktop, Cursor, or any MCP client:

    {
      "mcpServers": {
        "pathfinder-docs": {
          "url": "https://mcp.pathfinder.copilotkit.dev/mcp"
        }
      }
    }

Pathfinder indexes your GitHub repos, both docs (Markdown, MDX, HTML) and source code, into a PostgreSQL vector database. It supports OpenAI, Ollama, and local transformers.js embeddings; the local providers need no API key. It serves configurable search and filesystem exploration tools via MCP, so AI agents can search your docs semantically and browse files with bash commands.

| Tool Type | What It Does | Example |
|---|---|---|
| Search | Semantic search over indexed content | search-docs("how to authenticate") |
| Bash | Virtual filesystem with find, grep, cat, ls | explore-docs("cat /docs/quickstart.mdx") |
| Collect | Structured data collection from agents | submit-feedback(rating: "helpful") |
| Knowledge | Browse/search FAQ pairs from conversational sources | knowledge-base("how to deploy") |
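
As a rough illustration of what an agent sends when it invokes one of these tools, the sketch below builds an MCP `tools/call` request in JSON-RPC 2.0 form. The tool name `search-docs` comes from the table above; the argument key `query` is an assumption made for illustration, since the real keys depend on the input schema the server advertises for each tool.

```typescript
// Shape of a JSON-RPC 2.0 request for the MCP `tools/call` method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Build a tools/call request for a named tool with the given arguments.
function buildToolCall(
  id: number,
  toolName: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// ASSUMPTION: `query` is a hypothetical argument name, not confirmed by the docs.
const toolCall = buildToolCall(1, "search-docs", { query: "how to authenticate" });
console.log(JSON.stringify(toolCall, null, 2));
```

In practice an MCP client library builds this envelope for you; the sketch only shows what travels over the wire.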
- Semantic Search: pgvector RAG with configurable chunk sizes, overlap, and score thresholds
- Filesystem Exploration: QuickJS WASM sandbox with session state, `qmd` semantic grep, `related` files
- 8 Source Types: Markdown, code, raw-text, HTML, document (PDF/DOCX), Slack, Discord, Notion, with a pluggable chunker registry
- Hybrid Search: Combine vector and keyword search with RRF scoring for better recall on technical terms
- Multiple Embedding Providers: OpenAI, Ollama (local HTTP), or transformers.js (zero external deps, CPU-only)
- Config-Driven: Everything in one `pathfinder.yaml`: sources, tools, embedding, indexing, webhooks
- Client Setup: Claude Desktop, Claude Code, Cursor, Codex, VS Code, any Streamable HTTP client
- Docker + Railway: Container image, docker-compose, Railway one-click
- Conversational Sources: Slack threads and Discord forums distilled into searchable Q&A pairs
- Auto-Generated Endpoints: `/llms.txt`, `/llms-full.txt`, `/faq.txt`, `/.well-known/skills/default/skill.md`
- Webhook Reindexing: GitHub push triggers incremental reindex
- Session Management: Global session cap, per-IP rate limiting, two-tier TTL (active vs unused sessions)
- Analytics: Query logging, top queries, empty results, latency metrics at `/analytics`
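
To make the RRF scoring mentioned under Hybrid Search concrete, here is a minimal sketch of Reciprocal Rank Fusion in its textbook form. It is not taken from Pathfinder's source: the function name, the chunk IDs, and the constant `k = 60` (a commonly used default) are all assumptions for illustration.

```typescript
// Reciprocal Rank Fusion: every result list contributes 1 / (k + rank) for
// each document it contains, where rank is 1-based. A document returned by
// both retrievers accumulates both contributions.
function reciprocalRankFusion(rankings: string[][], k = 60): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, index) => {
      const contribution = 1 / (k + index + 1);
      scores.set(docId, (scores.get(docId) ?? 0) + contribution);
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

// Hypothetical chunk IDs, best match first in each list.
const vectorHits = ["auth-overview", "oauth-setup", "quickstart"];
const keywordHits = ["oauth-setup", "env-vars", "auth-overview"];

const fused = reciprocalRankFusion([vectorHits, keywordHits]);
console.log(fused.map(([id]) => id));
```

Because RRF uses only ranks, it can merge vector distances and keyword scores without normalizing them to a common scale, and an exact keyword hit on a technical term can lift a chunk that the embedding ranked lower.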

Common CLI commands:

    # Scaffold config
    npx @copilotkit/pathfinder init
    # Auto-generate config from an existing docs site
    npx @copilotkit/pathfinder init --from <url>
    # Start server (uses PGlite if no DATABASE_URL)
    npx @copilotkit/pathfinder serve
    # Validate config, env vars, and source connectivity
    npx @copilotkit/pathfinder validate
    # Docker with Postgres
    docker compose up

Step-by-step migration guide: Migrate from Mintlify

Self-hosted Pathfinder sends nothing externally: no phone-home and no outbound analytics. The telemetry code path is gated on a CopilotKit-internal env var that isn't set in any image you pull or any package you install, so telemetry is off by default when you run your own copy and there is no flag you need to flip.
The hosted instance at mcp.pathfinder.copilotkit.dev records one event per MCP client connection — fired when a fresh client (claude.ai, Cursor, a custom MCP client) opens a session against the hosted server. The event contains:
- the client's IP address
- the User-Agent string
- the MCP transport in use (`sse` or `streamable_http`)
- the first 8 characters of the session ID (for log correlation)
- whether the client presented an OAuth bearer token
What it does not contain: search queries, knowledge tool inputs, response content, full session IDs, or JWT subjects — anything inside an MCP request stays on the hosted server. There is no per-tool-call event of any kind.
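
The payload described above can be pictured with a short sketch. Every name here (`ConnectionEvent`, `buildConnectionEvent`, and the field names) is hypothetical and chosen for illustration; only the set of fields and the 8-character truncation come from the description.

```typescript
type Transport = "sse" | "streamable_http";

// Hypothetical shape for the per-connection event; field names are illustrative.
interface ConnectionEvent {
  ip: string;
  userAgent: string;
  transport: Transport;
  sessionIdPrefix: string; // first 8 characters only, for log correlation
  hasBearerToken: boolean;
}

function buildConnectionEvent(
  ip: string,
  userAgent: string,
  transport: Transport,
  sessionId: string,
  authorizationHeader?: string,
): ConnectionEvent {
  return {
    ip,
    userAgent,
    transport,
    // Truncate so the full session ID never leaves the server.
    sessionIdPrefix: sessionId.slice(0, 8),
    // Record only whether a bearer token was presented, never the token itself.
    hasBearerToken: /^Bearer\s+\S/i.test(authorizationHeader ?? ""),
  };
}

// Example values are made up; 203.0.113.7 is a documentation-only IP address.
const connectionEvent = buildConnectionEvent(
  "203.0.113.7",
  "example-client/1.0",
  "streamable_http",
  "3f2a9c1e-7b44-4d0a-9e61-52c8d1f0a6b3",
  "Bearer abc.def.ghi",
);
console.log(connectionEvent);
```

The function takes no tool name, query, or response as input, which mirrors the statement that nothing inside an MCP request is part of the event.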
The event is sent to a CopilotKit-controlled endpoint and forwarded to a small set of third-party analytics providers, which use the IP for company-level attribution — i.e., figuring out which organizations are evaluating the hosted instance. There's no individual-user identification; the IP is the only personal data point in the payload, and it's processed by those providers per their published terms.
If you'd rather not have any of this happen, run your own copy — Quick Start above gets you there in two commands. Same code, same features, no telemetry env vars set.
https://pathfinder.copilotkit.dev
Pathfinder is source-available under the Elastic License 2.0 (ELv2) with an Additional Use Grant.
You can: use it, modify it, self-host it, host it for your project's docs, run it for your company, contribute to it — all free.
One restriction: you can't sell Pathfinder as a standalone product or service. That's it.
See LICENSING.md for plain-English details.