Structural RAG for Complex Documents — A high-fidelity retrieval pipeline that uses document hierarchy as the primary retrieval anchor, eliminating "hallucination by chunking." Proxy-Pointer indexes structural pointers (breadcrumbs like Paper > Section > Sub-section) rather than raw text fragments, ensuring the LLM always understands exactly where it is in a document.
Retrieve precise text, get grounded visual citations, or perform agentic section-by-section document comparisons.
| Feature | Text-Only | MultiModal | DocComparator |
|---|---|---|---|
| Core Goal | Maximum precision for text-based RAG | Unified reasoning across text & visuals | Agentic Cross-Document Comparison |
| Input | Structured Markdown (LlamaParse) | Markdown + Figures/Tables (Adobe Extract) | PDF or MD (Mixed format supported) |
| Output | Text-based answers | Text + AI-Verified Visual Evidence 🖼️ | Side-by-side analytical reports |
| LLM | Gemini 3.1 Flash-Lite | Gemini 3.1 Flash-Lite | Gemini 3 Flash |
| Embeddings | gemini-embedding-001 (1536d) | gemini-embedding-001 (1536d) | gemini-embedding-001 (1536d) |
| Vision | — | ✅ Gemini 3.1 Flash-Lite | — |
| Retrieval | Structural re-ranking (k=5) | Anchor-aware re-ranking + image selection | Multi-Stage Proxy-Pointer retrieval |
| Benchmark | 100% on FinanceBench | 96% across 20-query, 5-paper suite | N/A (Dynamic Agentic Evaluation) |
| Use Case | 10-K Financials, Legal, Documentation | Anything with Images, Diagrams, Charts | Credit Agreements, Contracts, Research Papers |
| Interface | CLI / Python API | Streamlit UI with visual citations | Streamlit UI with markdown export |
graph TD
A[Documents] -->|PDF Extraction| B[Markdown]
B -->|Tree Builder| C[Structure Trees]
C -->|Noise Filter| D[Clean Nodes]
D -->|Embed + Index| E[FAISS]
E -->|"Query, Dedup, Re-Rank"| F[Top Sections]
F -->|Synthesize + Cite| G[Grounded Answer]
- Structure trees map every section, sub-section, figure, and table in a document
- Noise filtering removes TOC, glossaries, and boilerplate using an LLM
- Broad vector recall (k=200) retrieves candidates, then LLM re-ranking selects the best structural matches
- Full section loading gives the synthesizer complete context — not truncated chunks
- (MultiModal only) Anchor-aware retrieval surfaces figures/tables physically linked to retrieved sections
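
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's implementation: `build_sections`, `embed`, `cosine`, and `retrieve` are hypothetical names, a toy bag-of-words vector stands in for gemini-embedding-001 and FAISS, and the LLM re-ranking step is replaced by a plain similarity sort. Noise filtering is omitted.

```python
import math
import re
from collections import Counter


def build_sections(markdown: str, doc_title: str) -> dict[str, str]:
    """Map each breadcrumb (e.g. 'Paper > Methods > Training') to its full section text."""
    sections: dict[str, list[str]] = {}
    stack: list[tuple[int, str]] = []  # (heading level, heading title)
    crumb = doc_title
    for line in markdown.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*\S)\s*$", line)
        if match:
            level, title = len(match.group(1)), match.group(2)
            # Leave any heading at the same or a deeper level before descending.
            while stack and stack[-1][0] >= level:
                stack.pop()
            stack.append((level, title))
            crumb = " > ".join([doc_title] + [t for _, t in stack])
            sections.setdefault(crumb, [])
        elif line.strip():
            sections.setdefault(crumb, []).append(line.strip())
    return {c: "\n".join(lines) for c, lines in sections.items()}


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: stands in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, sections, recall_k=200, final_k=5, chunk_size=40):
    """Recall small chunks, dedup by breadcrumb, rank sections, return full section text."""
    q = embed(query)
    chunks = []  # (score, breadcrumb): each chunk keeps only a pointer to its section
    for crumb, text in sections.items():
        words = text.split()
        for i in range(0, max(len(words), 1), chunk_size):
            chunk_text = crumb + " " + " ".join(words[i:i + chunk_size])
            chunks.append((cosine(q, embed(chunk_text)), crumb))
    # Broad recall: keep the top recall_k chunks.
    candidates = sorted(chunks, key=lambda c: c[0], reverse=True)[:recall_k]
    # Dedup: many chunks may point at the same section; keep its best score.
    best: dict[str, float] = {}
    for score, crumb in candidates:
        best[crumb] = max(score, best.get(crumb, 0.0))
    # Placeholder re-rank (assumption): an LLM would choose here; this sorts by similarity.
    ranked = sorted(best, key=best.get, reverse=True)[:final_k]
    # Full section loading: follow each pointer back to the complete section.
    return [(crumb, sections[crumb]) for crumb in ranked]


doc = """# Intro
This paper studies retrieval.
## Background
Prior work on chunking.
# Methods
## Training
We train with the Adam optimizer and a learning rate warmup.
"""
sections = build_sections(doc, "Paper")
for crumb, _text in retrieve("Which optimizer was used for training?", sections, final_k=1):
    print(crumb)  # prints: Paper > Methods > Training
```

The design point this illustrates is that the index stores pointers (breadcrumbs) rather than the text handed to the synthesizer, so a hit on any chunk resolves to the whole section it belongs to.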
Text-Only — Best when your documents are purely text-based and the hierarchy (e.g., Signatory > Item 1A > Risk Factors) is the only context needed. Proven at 100% accuracy on financial 10-K filings.
MultiModal — Best when your documents contain diagrams, charts, and tables that are essential to the answer. Uses anchor-aware retrieval to surface the exact images tied to a technical discussion, tested across 5 research papers (CLIP, GaLore, NemoBot, VectorFusion, VectorPainter).
DocComparator — Best when you need to perform deep, section-by-section comparisons between two complex documents. Uses Agentic RAG and targeted personas (like Senior Legal Counsel) to untangle legal trade-offs and methodological differences beyond surface-level keyword matching.
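
To make the section-by-section idea concrete, here is a hedged sketch of one possible first step: pairing sections from two documents before comparing each pair. It is an assumption for illustration only. The source does not describe how DocComparator aligns sections, and `leaf` and `pair_sections` are hypothetical helpers that use simple title similarity from the standard library's `difflib` rather than agentic retrieval.

```python
from difflib import SequenceMatcher


def leaf(crumb: str) -> str:
    """Last element of a breadcrumb such as 'Agreement A > Governing Law', lowercased."""
    return crumb.split(" > ")[-1].lower()


def pair_sections(doc_a: list[str], doc_b: list[str], threshold: float = 0.6):
    """Greedily pair breadcrumbs whose leaf titles are most similar.

    Returns (pairs, only_in_a, only_in_b). The threshold is an arbitrary
    illustrative value, not one taken from the project.
    """
    scored = sorted(
        ((SequenceMatcher(None, leaf(a), leaf(b)).ratio(), a, b)
         for a in doc_a for b in doc_b),
        reverse=True,
    )
    pairs, used_a, used_b = [], set(), set()
    for score, a, b in scored:
        if score < threshold:
            break  # remaining candidates are even less similar
        if a not in used_a and b not in used_b:
            pairs.append((a, b))
            used_a.add(a)
            used_b.add(b)
    pairs.sort(key=lambda p: doc_a.index(p[0]))  # report in document A's order
    only_a = [a for a in doc_a if a not in used_a]
    only_b = [b for b in doc_b if b not in used_b]
    return pairs, only_a, only_b


doc_a = [
    "Agreement A > Definitions",
    "Agreement A > Events of Default",
    "Agreement A > Governing Law",
]
doc_b = [
    "Agreement B > Governing Law",
    "Agreement B > Definitions",
    "Agreement B > Assignment",
]
pairs, only_a, only_b = pair_sections(doc_a, doc_b)
for a, b in pairs:
    print(f"compare: {a}  <->  {b}")
print("only in A:", only_a)
print("only in B:", only_b)
```

Each matched pair could then be retrieved and compared on its own, while unmatched sections are reported as present in only one document.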
For the full technical story behind the architecture:
- Proxy-Pointer Framework for Structure-Aware Enterprise Document Intelligence — Hierarchical understanding and comparison of contracts, research papers, and more
- Proxy-Pointer RAG: Multimodal Answers Without Multimodal Embeddings — Structure is all you need
- Proxy-Pointer RAG: Structure Meets Scale — 100% Accuracy with Smarter Retrieval — Scaling to multi-document, LLM re-ranking, and benchmark results
- Proxy-Pointer RAG: Achieving Vectorless Accuracy at Vector RAG Scale and Cost — Core architecture & the pointer-based retrieval idea
Important Note for PyPI Users: While installing via PyPI (`pip install pprag`) gives you the CLI and code, the application relies on specific local folder structures (like `data/`) and environment variable templates. We strongly recommend cloning the repository first to get the necessary `.env.example` templates and sample data folders for each workflow.
git clone https://github.com/Proxy-Pointer/Proxy-Pointer-RAG.git
cd Proxy-Pointer-RAG

We strongly recommend creating a virtual environment first:
python -m venv venv
# Windows: venv\Scripts\activate | macOS/Linux: source venv/bin/activate

You can then install dependencies using standard pip or using uv (recommended for developers).
Option A (pip): Install the package and your desired modality:
pip install pprag # minimal CLI shell
pip install "pprag[text]" # text-only structural RAG
pip install "pprag[multimodal]" # multimodal RAG with visual citations
pip install "pprag[compare]" # cross-document comparison
pip install "pprag[full]" # all modalities

Option B (uv): If you want to tinker with the code, this project uses uv for lightning-fast dependency management.
pip install uv
uv sync --all-extras
# Remember to prefix commands with `uv run` if you use this method!

Navigate into the folder for the modality you want to run (e.g., Text-Only, MultiModal, or DocComparator), copy the template, and add your API keys:
cd Text-Only
cp .env.example .env
# Edit .env → add your GOOGLE_API_KEY
# Note: Also review other commented variables, especially the FAISS trust settings required for local index loading!

Build the FAISS index from scratch for your chosen modality:
# Prefix with `uv run` if you installed via Option B
pprag text index --fresh
# or `pprag multimodal index --fresh`

Launch the CLI or Web UI:
# Prefix with `uv run` if you installed via Option B
pprag text ask
# or `pprag multimodal serve`
# or `pprag compare serve`

Each implementation also has its own self-contained README with a detailed quickstart. All include sample data so you can clone, build the index, and start exploring immediately.
Partha Sarkar
© 2026 Partha Sarkar (Proxy-Pointer). Licensed under MIT.
