Proxy-Pointer Suite -- Text, Multimodal RAG, and Cross-Document Comparison 🔍

Structural RAG for Complex Documents — A high-fidelity retrieval pipeline that uses document hierarchy as the primary retrieval anchor, eliminating "hallucination by chunking." Proxy-Pointer indexes structural pointers (breadcrumbs like Paper > Section > Sub-section) rather than raw text fragments, so the LLM always knows where each retrieved passage sits in the document.
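To make the breadcrumb idea concrete, here is a minimal sketch of how structural pointers could be derived from Markdown headings. It is an illustration only, not the project's tree builder: the function name `breadcrumbs`, the `root` parameter, and the sample document are all hypothetical, and the sketch does not skip `#` lines that appear inside fenced code.

```python
import re

# Matches ATX-style Markdown headings such as "## Training".
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")

def breadcrumbs(markdown: str, root: str) -> list[str]:
    """Return one 'A > B > C' pointer per heading, in document order."""
    stack: list[tuple[int, str]] = []  # (heading level, title) of open ancestors
    pointers = []
    for line in markdown.splitlines():
        m = HEADING.match(line)
        if not m:
            continue
        level, title = len(m.group(1)), m.group(2)
        # Close every open heading at the same or a deeper level.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, title))
        pointers.append(" > ".join([root] + [t for _, t in stack]))
    return pointers

# Hypothetical sample document.
sample = """# Methods
## Training
### Optimizer
## Evaluation
# Results
"""

for pointer in breadcrumbs(sample, root="Paper"):
    print(pointer)
```

Each printed line is a pointer such as `Paper > Methods > Training > Optimizer`, which is the kind of string that can be embedded and indexed in place of a raw text fragment.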

Retrieve precise text, get grounded visual citations, or run agentic, section-by-section document comparisons.

Three Implementations, One Architecture

Feature	Text-Only	MultiModal	DocComparator
Core Goal	Maximum precision for text-based RAG	Unified reasoning across text & visuals	Agentic Cross-Document Comparison
Input	Structured Markdown (LlamaParse)	Markdown + Figures/Tables (Adobe Extract)	PDF or MD (Mixed format supported)
Output	Text-based answers	Text + AI-Verified Visual Evidence 🖼️	Side-by-side analytical reports
LLM	Gemini 3.1 Flash-Lite	Gemini 3.1 Flash-Lite	Gemini 3 Flash
Embeddings	gemini-embedding-001 (1536d)	gemini-embedding-001 (1536d)	gemini-embedding-001 (1536d)
Vision	—	✅ Gemini 3.1 Flash-Lite	—
Retrieval	Structural re-ranking (k=5)	Anchor-aware re-ranking + image selection	Multi-Stage Proxy-Pointer retrieval
Benchmark	100% on FinanceBench	96% across 20-query, 5-paper suite	N/A (Dynamic Agentic Evaluation)
Use Case	10-K Financials, Legal, Documentation	Anything with Images, Diagrams, Charts	Credit Agreements, Contracts, Research Papers
Interface	CLI / Python API	Streamlit UI with visual citations	Streamlit UI with markdown export

How It Works

graph TD
    A[Documents] -->|PDF Extraction| B[Markdown]
    B -->|Tree Builder| C[Structure Trees]
    C -->|Noise Filter| D[Clean Nodes]
    D -->|Embed + Index| E[FAISS]
    E -->|"Query, Dedup, Re-Rank"| F[Top Sections]
    F -->|Synthesize + Cite| G[Grounded Answer]

Structure trees map every section, sub-section, figure, and table in a document
Noise filtering uses an LLM to remove tables of contents, glossaries, and boilerplate
Broad vector recall (k=200) retrieves candidates, then LLM re-ranking selects the best structural matches
Full section loading gives the synthesizer complete context rather than truncated chunks
(MultiModal only) Anchor-aware retrieval surfaces figures/tables physically linked to retrieved sections
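The query-time steps above (broad recall, dedup, re-rank, full section loading) can be sketched in plain Python. This is a toy under stated assumptions, not the project's code: a brute-force cosine search stands in for FAISS, the `rerank` callable stands in for the LLM re-ranker, and the function names, data layout, and sample pointers are hypothetical. Only the values k=200 and k=5 come from the text above.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, chunks, sections, rerank, recall_k=200, final_k=5):
    # 1. Broad recall: take the recall_k most similar indexed entries.
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    ranked = ranked[:recall_k]
    # 2. Dedup: keep each section pointer once, preserving recall order.
    seen, pointers = set(), []
    for chunk in ranked:
        if chunk["pointer"] not in seen:
            seen.add(chunk["pointer"])
            pointers.append(chunk["pointer"])
    # 3. Re-rank the candidate pointers (an LLM call in the real pipeline).
    best = rerank(pointers)[:final_k]
    # 4. Load the full text of each selected section.
    return [(p, sections[p]) for p in best]

# Hypothetical data: three sections, four indexed entries.
sections = {
    "10-K > Item 1A > Risk Factors": "Full text of the risk factors section.",
    "10-K > Item 7 > MD&A": "Full text of the MD&A section.",
    "10-K > Item 8 > Financial Statements": "Full text of the statements.",
}
chunks = [
    {"pointer": "10-K > Item 1A > Risk Factors", "vec": [1.0, 0.0]},
    {"pointer": "10-K > Item 1A > Risk Factors", "vec": [0.9, 0.1]},
    {"pointer": "10-K > Item 7 > MD&A", "vec": [0.5, 0.5]},
    {"pointer": "10-K > Item 8 > Financial Statements", "vec": [0.0, 1.0]},
]

def keyword_rerank(pointers):
    """Stand-in re-ranker: pointers mentioning 'Risk' first, order otherwise kept."""
    return sorted(pointers, key=lambda p: "Risk" not in p)

results = retrieve([1.0, 0.0], chunks, sections, keyword_rerank, final_k=2)
for pointer, text in results:
    print(pointer, "->", text)
```

The point of the design is that the index only has to find the right section pointer; the synthesizer then receives the whole section, so answer quality does not depend on where a chunk boundary happened to fall.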

Which One Should I Use?

Text-Only — Best when your documents are purely text-based and the hierarchy (e.g., Signatory > Item 1A > Risk Factors) is the only context needed. Reported at 100% accuracy on financial 10-K filings (FinanceBench).

MultiModal — Best when your documents contain diagrams, charts, and tables that are essential to the answer. Uses anchor-aware retrieval to surface the exact images tied to a technical discussion, tested across 5 research papers (CLIP, GaLore, NemoBot, VectorFusion, VectorPainter).
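One way to picture anchor-aware retrieval: each figure or table records the breadcrumb of the section it appears in, and a figure is surfaced only when that section (or a parent of it) was retrieved. The sketch below assumes this data model; the `Figure` class, its fields, and the sample paths are hypothetical and not taken from the project.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Figure:
    path: str      # where the extracted image lives (hypothetical layout)
    caption: str
    anchor: str    # breadcrumb of the section the figure appears in

def figures_for(retrieved_sections, figures):
    """Return figures anchored in a retrieved section or nested beneath one."""
    hits = []
    for fig in figures:
        if any(fig.anchor == s or fig.anchor.startswith(s + " > ")
               for s in retrieved_sections):
            hits.append(fig)
    return hits

# Hypothetical figures extracted from one paper.
figures = [
    Figure("figs/fig1.png", "Model overview", "Paper > Method"),
    Figure("figs/fig2.png", "Loss curves", "Paper > Method > Training"),
    Figure("figs/fig3.png", "Benchmark table", "Paper > Results"),
]

selected = figures_for(["Paper > Method"], figures)
for fig in selected:
    print(fig.path, "-", fig.caption)
```

Because the link is structural, no image embeddings are needed to decide which visuals are candidates; a vision model can then check the shortlisted images against the answer.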

DocComparator — Best when you need deep, section-by-section comparisons between two complex documents. Uses agentic RAG and targeted personas (such as Senior Legal Counsel) to untangle legal trade-offs and methodological differences that surface-level keyword matching misses.
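A section-by-section comparison first needs to decide which sections of the two documents correspond. The sketch below shows one simple way to do that, pairing headings by string similarity with the standard library's `difflib`. It is a stand-in for illustration only: the real tool uses agentic retrieval, and the function name, threshold, and sample headings here are hypothetical.

```python
from difflib import SequenceMatcher

def align_sections(titles_a, titles_b, threshold=0.5):
    """Pair each heading in document A with its most similar heading in document B.

    Headings with no match at or above the threshold are paired with None.
    """
    pairs = []
    for title_a in titles_a:
        best, best_score = None, 0.0
        for title_b in titles_b:
            score = SequenceMatcher(None, title_a.lower(), title_b.lower()).ratio()
            if score > best_score:
                best, best_score = title_b, score
        pairs.append((title_a, best if best_score >= threshold else None))
    return pairs

# Hypothetical headings from two credit agreements.
agreement_a = ["Governing Law", "Events of Default", "Financial Covenants"]
agreement_b = ["Events of Default", "Governing Law and Jurisdiction", "Definitions"]

pairs = align_sections(agreement_a, agreement_b)
for left, right in pairs:
    print(f"{left!r} <-> {right!r}")
```

Unmatched headings are kept with `None` instead of being dropped, since a clause that exists in only one document is often the most important finding in a comparison.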

Architecture Deep Dive

For the full technical story behind the architecture:

Proxy-Pointer Framework for Structure-Aware Enterprise Document Intelligence — Hierarchical understanding and comparison of contracts, research papers, and more
Proxy-Pointer RAG: Multimodal Answers Without Multimodal Embeddings — Structure is all you need
Proxy-Pointer RAG: Structure Meets Scale — 100% Accuracy with Smarter Retrieval — Scaling to multi-document, LLM re-ranking, and benchmark results
Proxy-Pointer RAG: Achieving Vectorless Accuracy at Vector RAG Scale and Cost — Core architecture & the pointer-based retrieval idea

5-Minute Quickstart

Important Note for PyPI Users: While installing via PyPI (pip install pprag) gives you the CLI and code, the application relies on specific local folder structures (like data/) and environment variable templates. We strongly recommend cloning the repository first to get the necessary .env.example templates and sample data folders for each workflow.

1. Clone

git clone https://github.com/Proxy-Pointer/Proxy-Pointer-RAG.git
cd Proxy-Pointer-RAG

2. Create Virtual Environment & Install Dependencies

We strongly recommend creating a virtual environment first:

python -m venv venv
# Windows: venv\Scripts\activate | macOS/Linux: source venv/bin/activate

You can then install dependencies using standard pip or using uv (recommended for developers).

Option A: Standard pip

Install the package and your desired modality:

pip install pprag                 # minimal CLI shell
pip install "pprag[text]"         # text-only structural RAG
pip install "pprag[multimodal]"   # multimodal RAG with visual citations
pip install "pprag[compare]"      # cross-document comparison
pip install "pprag[full]"         # all modalities

Option B: For Developers (using uv)

If you want to tinker with the code, this project uses uv for lightning-fast dependency management.

pip install uv
uv sync --all-extras
# Remember to prefix commands with `uv run` if you use this method!

3. Configure API keys

Navigate into the folder for the modality you want to run (e.g., Text-Only, MultiModal, or DocComparator), copy the template, and add your API keys:

cd Text-Only
cp .env.example .env
# Edit .env → add your GOOGLE_API_KEY
# Note: Also review other commented variables, especially the FAISS trust settings required for local index loading!
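Before building the index, it can help to confirm that the key actually made it into `.env`. The following is a small standalone check, not part of the `pprag` CLI: the helper names are hypothetical, and the parsing is a simplification of the dotenv format (no multi-line values, no `export` prefix). `GOOGLE_API_KEY` is the only variable name taken from this README.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values

def missing_keys(text: str, required=("GOOGLE_API_KEY",)) -> list[str]:
    """Return the required keys that are absent or empty."""
    env = parse_env(text)
    return [key for key in required if not env.get(key)]

# Hypothetical file contents; in practice, read them with open(".env").read().
template = "# Copy to .env and fill in\nGOOGLE_API_KEY=\n"
filled = 'GOOGLE_API_KEY="your-key-here"\n'

print(missing_keys(template))
print(missing_keys(filled))
```

An empty list means every required key has a non-empty value; any other variables in the template, including the FAISS trust settings, still need to be reviewed by hand.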

4. Build the index

Build the FAISS index from scratch for your chosen modality:

# Prefix with `uv run` if you installed via Option B
pprag text index --fresh
# or `pprag multimodal index --fresh`

5. Start querying / Serve

Launch the CLI or Web UI:

# Prefix with `uv run` if you installed via Option B
pprag text ask
# or `pprag multimodal serve`
# or `pprag compare serve`

Each implementation also has its own self-contained README with a detailed quickstart. All include sample data so you can clone, build the index, and start exploring immediately.
