
PromptExecution/Proxy-Pointer-RAG

 
 


Proxy-Pointer Suite -- Text, Multimodal RAG, and Cross-Document Comparison 🔍

Structural RAG for Complex Documents: a high-fidelity retrieval pipeline that uses document hierarchy as the primary retrieval anchor, designed to avoid "hallucination by chunking." Proxy-Pointer indexes structural pointers (breadcrumbs like Paper > Section > Sub-section) rather than raw text fragments, so the LLM is told where each retrieved passage sits in the document.

Retrieve precise text, get grounded visual citations, or run agentic section-by-section document comparisons.
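To make the breadcrumb idea concrete, here is a minimal sketch of turning Markdown headings into structural pointers. It is not the project's tree builder: the names `Section` and `build_sections` are hypothetical, and it assumes ATX-style `#` headings with no headings hidden inside fenced code.

```python
import re
from dataclasses import dataclass, field

# ATX heading: 1-6 '#' characters, whitespace, then the title text.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Section:
    breadcrumb: str                       # e.g. "Paper > Section > Sub-section"
    lines: list = field(default_factory=list)


def build_sections(markdown: str, doc_title: str) -> list:
    """Split Markdown into sections keyed by their heading breadcrumb (hypothetical helper)."""
    stack = []                            # (heading level, heading text) from root to current
    sections = [Section(breadcrumb=doc_title)]   # text before the first heading
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            level, title = len(match.group(1)), match.group(2)
            # Leave any heading at the same or a deeper level before descending.
            while stack and stack[-1][0] >= level:
                stack.pop()
            stack.append((level, title))
            path = " > ".join([doc_title] + [text for _, text in stack])
            sections.append(Section(breadcrumb=path))
        else:
            sections[-1].lines.append(line)
    return sections


sample = """# Methods
Intro text.
## Training
We train for 10 epochs.
# Results
Accuracy improves.
"""

for section in build_sections(sample, "Paper"):
    print(section.breadcrumb)
```

Each section keeps its full text alongside the pointer, so an index can embed the breadcrumb (plus a snippet) and still hand the whole section to the synthesizer later.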


Three Implementations, One Architecture

| Feature | Text-Only | MultiModal | DocComparator |
|---|---|---|---|
| Core Goal | Maximum precision for text-based RAG | Unified reasoning across text & visuals | Agentic Cross-Document Comparison |
| Input | Structured Markdown (LlamaParse) | Markdown + Figures/Tables (Adobe Extract) | PDF or MD (Mixed format supported) |
| Output | Text-based answers | Text + **AI-Verified Visual Evidence** 🖼️ | Side-by-side analytical reports |
| LLM | Gemini 3.1 Flash-Lite | Gemini 3.1 Flash-Lite | Gemini 3 Flash |
| Embeddings | gemini-embedding-001 (1536d) | gemini-embedding-001 (1536d) | gemini-embedding-001 (1536d) |
| Vision | | ✅ Gemini 3.1 Flash-Lite | |
| Retrieval | Structural re-ranking (k=5) | Anchor-aware re-ranking + image selection | Multi-Stage Proxy-Pointer retrieval |
| Benchmark | 100% on FinanceBench | 96% across 20-query, 5-paper suite | N/A (Dynamic Agentic Evaluation) |
| Use Case | 10-K Financials, Legal, Documentation | Anything with Images, Diagrams, Charts | Credit Agreements, Contracts, Research Papers |
| Interface | CLI / Python API | Streamlit UI with visual citations | Streamlit UI with markdown export |

How It Works

graph TD
    A[Documents] -->|PDF Extraction| B[Markdown]
    B -->|Tree Builder| C[Structure Trees]
    C -->|Noise Filter| D[Clean Nodes]
    D -->|Embed + Index| E[FAISS]
    E -->|"Query, Dedup, Re-Rank"| F[Top Sections]
    F -->|Synthesize + Cite| G[Grounded Answer]
  1. Structure trees map every section, sub-section, figure, and table in a document
  2. Noise filtering removes TOC, glossaries, and boilerplate using an LLM
  3. Broad vector recall (k=200) retrieves candidates, then LLM re-ranking selects the best structural matches
  4. Full section loading gives the synthesizer complete context — not truncated chunks
  5. (MultiModal only) Anchor-aware retrieval surfaces figures/tables physically linked to retrieved sections
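Steps 3 and 4 above (broad recall, de-duplication by section, re-ranking, full section loading) can be sketched as follows. This is an illustration under stated assumptions, not the project's code: brute-force cosine similarity stands in for FAISS, a keyword-overlap score stands in for the LLM re-ranker, and `retrieve`, `keyword_overlap`, and the sample data are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_overlap(query_text, pointer):
    """Toy stand-in for the LLM re-ranker: shared words between query and breadcrumb."""
    query_words = set(query_text.lower().split())
    pointer_words = set(pointer.lower().replace(">", " ").split())
    return len(query_words & pointer_words)


def retrieve(query_vec, query_text, chunks, sections,
             recall_k=200, final_k=5, rerank=keyword_overlap):
    """chunks: list of (pointer, vector); sections: {pointer: full section text}."""
    # 1. Broad recall: take the recall_k chunks most similar to the query.
    recalled = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)[:recall_k]
    # 2. Dedup: many chunks can point at the same section; keep each pointer once, in rank order.
    pointers = list(dict.fromkeys(pointer for pointer, _ in recalled))
    # 3. Re-rank the surviving pointers (stable sort keeps recall order on ties).
    best = sorted(pointers, key=lambda p: rerank(query_text, p), reverse=True)[:final_k]
    # 4. Load the full section behind each pointer instead of a truncated chunk.
    return [(pointer, sections[pointer]) for pointer in best]


chunks = [
    ("10-K > Item 1A > Risk Factors", [0.9, 0.1]),
    ("10-K > Item 1A > Risk Factors", [0.8, 0.2]),
    ("10-K > Item 7 > Liquidity", [0.2, 0.9]),
    ("10-K > Item 8 > Financial Statements", [0.5, 0.5]),
]
sections = {
    "10-K > Item 1A > Risk Factors": "Full text of the risk factors section.",
    "10-K > Item 7 > Liquidity": "Full text of the liquidity section.",
    "10-K > Item 8 > Financial Statements": "Full text of the financial statements.",
}

for pointer, text in retrieve([1.0, 0.0], "what risk factors are disclosed",
                              chunks, sections, final_k=2):
    print(pointer, "->", text)
```

The defaults `recall_k=200` and `final_k=5` mirror the k values quoted in this README. The point of the de-duplication step is that re-ranking operates on sections, not on individual chunks.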

Which One Should I Use?

Text-Only: Best when your documents are purely text-based and the hierarchy (e.g., Signatory > Item 1A > Risk Factors) is the only context needed. Scored 100% on FinanceBench, which is built on financial 10-K filings.

MultiModal: Best when your documents contain diagrams, charts, and tables that are essential to the answer. Uses anchor-aware retrieval to surface the exact images tied to a technical discussion, tested across 5 research papers (CLIP, GaLore, NemoBot, VectorFusion, VectorPainter).

DocComparator: Best when you need deep, section-by-section comparisons between two complex documents. Uses Agentic RAG and targeted personas (like Senior Legal Counsel) to untangle legal trade-offs and methodological differences beyond surface-level keyword matching.


Architecture Deep Dive

For the full technical story behind the architecture:

  1. Proxy-Pointer Framework for Structure-Aware Enterprise Document Intelligence — Hierarchical understanding and comparison of contracts, research papers, and more
  2. Proxy-Pointer RAG: Multimodal Answers Without Multimodal Embeddings — Structure is all you need
  3. Proxy-Pointer RAG: Structure Meets Scale — 100% Accuracy with Smarter Retrieval — Scaling to multi-document, LLM re-ranking, and benchmark results
  4. Proxy-Pointer RAG: Achieving Vectorless Accuracy at Vector RAG Scale and Cost — Core architecture & the pointer-based retrieval idea

5-Minute Quickstart

Important Note for PyPI Users: While installing via PyPI (pip install pprag) gives you the CLI and code, the application relies on specific local folder structures (like data/) and environment variable templates. We strongly recommend cloning the repository first to get the necessary .env.example templates and sample data folders for each workflow.

1. Clone

git clone https://github.com/Proxy-Pointer/Proxy-Pointer-RAG.git
cd Proxy-Pointer-RAG

2. Create Virtual Environment & Install Dependencies

We strongly recommend creating a virtual environment first:

python -m venv venv
# Windows: venv\Scripts\activate | macOS/Linux: source venv/bin/activate

You can then install dependencies using standard pip or using uv (recommended for developers).

Option A: Standard pip

Install the package and your desired modality:

pip install pprag                 # minimal CLI shell
pip install "pprag[text]"         # text-only structural RAG
pip install "pprag[multimodal]"   # multimodal RAG with visual citations
pip install "pprag[compare]"      # cross-document comparison
pip install "pprag[full]"         # all modalities

Option B: For Developers (using uv)

If you want to tinker with the code, this project uses uv for lightning-fast dependency management.

pip install uv
uv sync --all-extras
# Remember to prefix commands with `uv run` if you use this method!

3. Configure API keys

Navigate into the folder for the modality you want to run (e.g., Text-Only, MultiModal, or DocComparator), copy the template, and add your API keys:

cd Text-Only
cp .env.example .env
# Edit .env → add your GOOGLE_API_KEY
# Note: Also review other commented variables, especially the FAISS trust settings required for local index loading!
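Before building the index, it can help to confirm that the key is actually visible. The sketch below is a standalone check, not part of the `pprag` CLI: `parse_env` and `has_api_key` are hypothetical helpers, and the only variable name taken from this README is `GOOGLE_API_KEY`.

```python
import os
from pathlib import Path


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments (hypothetical helper)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_api_key(path: str = ".env") -> bool:
    """True if GOOGLE_API_KEY is set in the environment or in the given .env file."""
    if os.environ.get("GOOGLE_API_KEY"):
        return True
    env_file = Path(path)
    if not env_file.exists():
        return False
    return bool(parse_env(env_file.read_text(encoding="utf-8")).get("GOOGLE_API_KEY"))


if __name__ == "__main__":
    print("GOOGLE_API_KEY configured:", has_api_key())
```

Run it from inside the modality folder so that the relative `.env` path resolves to the file you just edited.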

4. Build the index

Build the FAISS index from scratch for your chosen modality:

# Prefix with `uv run` if you installed via Option B
pprag text index --fresh
# or `pprag multimodal index --fresh`

5. Start querying / Serve

Launch the CLI or Web UI:

# Prefix with `uv run` if you installed via Option B
pprag text ask
# or `pprag multimodal serve`
# or `pprag compare serve`

Each implementation (Text-Only, MultiModal, DocComparator) also has its own self-contained README with a detailed quickstart. All include sample data so you can clone, build the index, and start exploring immediately.


Author

Partha Sarkar

Contact

  • GitHub Issues: For bug reports
  • General Questions: Reach out on LinkedIn or Email

License

© 2026 Partha Sarkar (Proxy-Pointer). Licensed under MIT.
