BMDSoftware/Biomni-AD

 
 


Specialized fork of Biomni · Stanford SNAP Lab


Biomni-AD: AI Co-Scientist for Biomedical Research, integrated with Alzheimer's Disease Datalake

Overview

Biomni-AD is an Alzheimer's disease-specialized extension of Biomni (Stanford SNAP Lab), developed and maintained by Kuan-lin Huang, PhD at Kaimen Inc. It adds the AD1 agent — a domain-expert variant of the general A1 agent — along with an AD-focused data lake, curated dataset catalogs (NIAGADS, SinaiADRD, CRISPRbrain), and a plan-then-approve Chainlit UI optimized for neurodegeneration research workflows.

The underlying Biomni platform is a general-purpose biomedical AI agent that integrates LLM reasoning with retrieval-augmented planning and code-based execution to help scientists enhance research productivity and generate testable hypotheses.

Our commitment. Biomni-AD will remain fully open source, and we are working to deploy it on the Alzheimer's Disease Data Initiative (ADDI) platform so it can serve as many AD researchers as possible and accelerate progress against Alzheimer's disease and related dementias.

Branch Guide

| Branch | Purpose |
| --- | --- |
| feat/adworkbench | Recommended install branch. Extends biomni-ad with AD Workbench dataset integration and containerization. Install with pip install git+https://github.com/Kaimen-Inc/Biomni-AD.git@feat/adworkbench. |
| biomni-ad | Primary stable branch. Biomni-AD specialization without AD Workbench-specific deployment features. |
| main | Upstream Stanford SNAP Biomni. Periodically merged into biomni-ad to track upstream. Read-only from this fork. |

Documentation Index

| Document | Description |
| --- | --- |
| README.md | This file: quick start, usage, and feature overview |
| ARCHITECTURE.md | System design, agent framework, tool ecosystem, and data lake |
| CONTRIBUTION.md | How to contribute tools, data, software, benchmarks, and know-how |
| DETAILS.md | Technical reference: module roles, code organization, and entry points |
| chainlit.md.template | Chainlit welcome-page template (rendered to chainlit.md on launch with the local data inventory; chainlit.md itself is gitignored) |
| biomni_env/README.md | Environment installation instructions |
| docs/configuration.md | Configuration management guide |
| docs/known_conflicts.md | Known package conflicts and workarounds |
| docs/docker_vm_deployment.md | Docker and VM deployment guide |
| docs/mcp_integration.md | Model Context Protocol (MCP) server integration |
| docs/building_documentation.md | Building Sphinx API documentation |

Quick Start

Installation

Step 1 — Set up the environment

The Biomni environment includes 200+ scientific Python packages, R packages, and CLI bioinformatics tools. Follow biomni_env/README.md to run the setup script (choose the option that fits your needs).

Step 2 — Activate the environment

conda activate biomni_e1

Step 3 — Install the Biomni-AD package

Two install paths — pick the one that fits your needs.

Full conda env (recommended if you want all 22 bioinformatics tool modules and R support; this is the environment set up in Step 1 and activated in Step 2):

pip install git+https://github.com/Kaimen-Inc/Biomni-AD.git@feat/adworkbench

Lightweight pip-only (agent core + LangChain stack, no conda required — good for notebooks, CI, or container images):

git clone https://github.com/Kaimen-Inc/Biomni-AD.git
cd Biomni-AD
pip install -e .                  # core: LangChain + OpenAI provider
pip install -e ".[anthropic]"     # add Claude (Anthropic) provider
pip install -e ".[all]"           # all provider + UI extras

Available extras: anthropic, bedrock, ollama, gradio, chainlit, all.

Or install the latest stable upstream release from PyPI (Biomni without the AD specialization):

pip install biomni --upgrade

Step 4 — Configure your API keys

Choose one of the two methods below:


Option 1: .env file (Recommended)

cp .env.example .env
# Then open .env and fill in your API keys

Your .env file should look like:

# Set at least ONE provider profile below (leave unused keys empty)

# Profile A: Anthropic direct
ANTHROPIC_API_KEY=your_anthropic_api_key_here
# Optional custom Anthropic endpoint
# ANTHROPIC_BASE_URL=https://api.anthropic.com

# Profile B: OpenAI direct
OPENAI_API_KEY=your_openai_api_key_here
# Optional custom OpenAI-compatible endpoint
# OPENAI_BASE_URL=https://api.openai.com/v1

# Optional: Azure Anthropic (if using Claude via Azure AI Foundry)
ENDPOINT_URL=https://your-resource.services.ai.azure.com/anthropic/
DEPLOYMENT_NAME=your_claude_deployment_name
AZURE_ANTHROPIC_API_KEY=your_azure_anthropic_api_key

# Optional: Azure OpenAI (if using GPT via Azure OpenAI)
AZURE_OPENAI_API_KEY=your_azure_openai_api_key
OPENAI_ENDPOINT=https://your-resource-name.openai.azure.com/

# Optional: AI Studio Gemini API Key (if using Gemini models)
GEMINI_API_KEY=your_gemini_api_key_here

# Optional: groq API Key (if using groq as model provider)
GROQ_API_KEY=your_groq_api_key_here

# Optional: set the source of your LLM. Accepted values:
# "OpenAI", "AzureOpenAI", "Anthropic", "Ollama", "Gemini", "Bedrock", "Groq", "Custom"
LLM_SOURCE=your_LLM_source_here
# BIOMNI_SOURCE is also accepted for backward compatibility
# BIOMNI_SOURCE=your_LLM_source_here

# Optional: AWS Bedrock Configuration (if using AWS Bedrock models)
AWS_BEARER_TOKEN_BEDROCK=your_bedrock_api_key_here
AWS_REGION=us-east-1

# Optional: Custom model serving configuration
# CUSTOM_MODEL_BASE_URL=http://localhost:8000/v1
# CUSTOM_MODEL_API_KEY=your_custom_api_key_here

# Optional: Biomni data path (defaults to ./data)
# BIOMNI_DATA_PATH=/path/to/your/data

# Optional: Timeout settings (defaults to 600 seconds)
# BIOMNI_TIMEOUT_SECONDS=600

# Optional: Auto-switch to local-first mode when network/API calls fail (default: true)
# BIOMNI_AUTO_NETWORK_LIMITED_MODE=true

Option 2: Shell environment variables

Add to your ~/.bashrc (or ~/.zshrc):

# Required — at least one LLM provider key:
export ANTHROPIC_API_KEY="your_key"   # Claude models
export OPENAI_API_KEY="your_key"      # GPT models (optional)
export GEMINI_API_KEY="your_key"      # Gemini models (optional)
export GROQ_API_KEY="your_key"        # Groq models (optional)

# Optional custom endpoints for direct providers:
export ANTHROPIC_BASE_URL="https://api.anthropic.com"
export OPENAI_BASE_URL="https://api.openai.com/v1"

# Azure OpenAI (optional):
export OPENAI_ENDPOINT="https://your-resource.openai.azure.com/"
export AZURE_OPENAI_API_KEY="your_key"

# Azure Anthropic (optional):
export ENDPOINT_URL="https://your-resource.services.ai.azure.com/anthropic/"
export DEPLOYMENT_NAME="your_claude_deployment_name"
export AZURE_ANTHROPIC_API_KEY="your_key"

# AWS Bedrock (optional):
export AWS_BEARER_TOKEN_BEDROCK="your_key"
export AWS_REGION="us-east-1"
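
As a quick sanity check before starting an agent, the sketch below reports which provider keys are visible to Python. The variable names are the ones listed above; the helper `configured_providers` and the provider labels are illustrative and are not part of Biomni-AD.

```python
import os

# Environment variable -> provider label. The variable names come from the
# examples above; the labels and this helper are illustrative only.
PROVIDER_KEYS = {
    "ANTHROPIC_API_KEY": "Anthropic",
    "OPENAI_API_KEY": "OpenAI",
    "GEMINI_API_KEY": "Gemini",
    "GROQ_API_KEY": "Groq",
    "AZURE_OPENAI_API_KEY": "Azure OpenAI",
    "AZURE_ANTHROPIC_API_KEY": "Azure Anthropic",
    "AWS_BEARER_TOKEN_BEDROCK": "AWS Bedrock",
}


def configured_providers(env=None):
    """Return labels of providers whose key variable is set and non-empty."""
    env = os.environ if env is None else env
    return [
        label
        for var, label in PROVIDER_KEYS.items()
        if env.get(var, "").strip()
    ]


if __name__ == "__main__":
    found = configured_providers()
    if found:
        print("Provider keys found for:", ", ".join(found))
    else:
        print("No provider key found; set at least one before starting an agent.")
```

Keys that live only in a .env file show up here only after something has loaded that file into the process environment.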

⚠️ Known Package Conflicts

Some Python packages are not installed by default in the Biomni environment due to dependency conflicts. If you need these features, you must install the packages manually and may need to uncomment relevant code in the codebase. See the up-to-date list and details in docs/known_conflicts.md.

AD Data Lake

Biomni-AD ships three JSON catalogs — NIAGADS, SinaiADRD, and BiomniAD Discovery — that describe hundreds of AD/ADRD datasets. Files ≤ 100 MB are downloaded automatically to disk; larger or controlled-access files are referenced by catalog URI for on-demand access.

| Catalog | Contents | Access |
| --- | --- | --- |
| NIAGADS (NG*) | Genetics, omics, biomarkers: ADSP WGS/WES, pQTL, eQTL, CSF sumstats | Controlled + open |
| SinaiADRD | Rare variants (RADR), single-nucleus eQTL (SingleBrain), microglia expression (isoMiGA) | Open |
| BiomniAD Discovery | SEA-AD, ssREAD, OASIS-4, HCP, ABC Atlas | Open |
| CRISPRbrain | CRISPR screens in neurons and microglia | Open API |

Downloaded files are cached in <data_lake>/biomniAD/<dataset_id>/ and skipped on re-runs. Set BIOMNI_DATA_PATH to control the storage root (defaults to ~/.biomni/data).
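
The download rules described above can be sketched as a small decision function. This is an illustration only, not the downloader's actual code: the entry fields (`dataset_id`, `filename`, `size_bytes`, `controlled_access`), the function name `plan_entry`, and the reading of "100 MB" as 100 × 1024² bytes are all assumptions.

```python
from pathlib import Path

# Assumed interpretation of the "100 MB" threshold.
SIZE_LIMIT_BYTES = 100 * 1024 * 1024


def plan_entry(entry, data_lake):
    """Decide what to do with one catalog entry.

    Returns "cached", "download", or "reference". `entry` is a hypothetical
    dict with keys dataset_id, filename, size_bytes, controlled_access.
    """
    target = Path(data_lake) / "biomniAD" / entry["dataset_id"] / entry["filename"]
    if target.exists():
        return "cached"      # already on disk, so skipped on re-runs
    if entry["controlled_access"] or entry["size_bytes"] > SIZE_LIMIT_BYTES:
        return "reference"   # keep the catalog URI and fetch on demand
    return "download"


# Hypothetical entry, for illustration only.
example = {
    "dataset_id": "EXAMPLE_DATASET",
    "filename": "summary.tsv",
    "size_bytes": 5_000_000,
    "controlled_access": False,
}
print(plan_entry(example, "/path/to/your/data_lake"))
```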

Option A — Automatic on AD1 init (default)

from biomni import AD1   # short top-level import; equivalent to `from biomni.agent.ad1 import AD1`

agent = AD1(download_ad_data=True)   # downloads files ≤ 100 MB on first run

Option B — Bulk download without starting an agent

from biomni.agent.ad_data_downloader import download_ad_catalog_data

download_ad_catalog_data("/path/to/your/data_lake")

Option C — Skip local download, use catalog URIs and internet

from biomni import AD1
agent = AD1(download_ad_data=False)   # no local download; catalog URIs only

The agent still references catalog URIs in its system prompt and can fetch data on demand or direct you to the relevant portal (e.g., NIAGADS DAC for controlled-access datasets).

Basic Usage & Agent Selection

Biomni-AD provides two agents:

  • AD1 (Alzheimer's Disease Agent): The primary agent for this fork — specialized for Alzheimer's and dementia research with AD-focused data sourcing, context-aware neurodegeneration instructions, and optimized tool selection.
  • A1 (General Agent): The upstream general-purpose biomedical agent. For general biomedical use without AD specialization, see the upstream Biomni project.

1. Chainlit Interactive UI — Default for Biomni-AD

The recommended way to run Biomni-AD is the Chainlit UI with its plan-then-approve workflow:

  1. The AD1 agent generates a numbered research plan before executing.
  2. You choose Approve & Execute, Revise Plan, or Cancel.
  3. The full ReAct loop runs with each step shown as a collapsible trace (Thinking → Code → Observation → Answer).
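
The three steps above amount to a small approval loop. The sketch below shows that control flow with stand-in callables; `plan_then_approve`, `make_plan`, `ask_user`, and `execute` are hypothetical names, and the actual Chainlit app may be structured differently.

```python
def plan_then_approve(task, make_plan, ask_user, execute, max_revisions=5):
    """Run a task only after the user approves a plan.

    make_plan(task, feedback) -> list of step strings
    ask_user(plan) -> ("approve", None) | ("revise", feedback) | ("cancel", None)
    execute(plan) -> result
    All three callables are hypothetical stand-ins.
    """
    feedback = None
    for _ in range(max_revisions + 1):
        plan = make_plan(task, feedback)
        decision, feedback = ask_user(plan)
        if decision == "approve":
            return execute(plan)
        if decision == "cancel":
            return None
        if decision != "revise":
            raise ValueError(f"unknown decision: {decision!r}")
    return None  # revision limit reached without approval


def make_plan(task, feedback):
    steps = [f"1. Survey data for: {task}", "2. Run analysis"]
    if feedback:
        steps.append(f"3. Address feedback: {feedback}")
    return steps


# Simulate a user who asks for one revision and then approves.
answers = iter([("revise", "add a QC step"), ("approve", None)])
result = plan_then_approve(
    "tau aggregation",
    make_plan,
    ask_user=lambda plan: next(answers),
    execute=lambda plan: len(plan),  # stand-in for the ReAct execution loop
)
print(result)  # 3: the approved plan has three steps
```

Keeping execution behind an explicit approval step means nothing runs until the user has seen, and possibly revised, the plan.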

Setup (one-time):

conda activate biomni_e1
pip install "chainlit>=1.0"

Single instance:

bash run_chainlit.sh                   # opens http://localhost:8000
bash run_chainlit.sh --port 8080       # custom port
bash run_chainlit.sh --headless        # no browser auto-open (servers/CI)

Note: Always use bash run_chainlit.sh — not chainlit run chainlit_app.py directly. The script ensures the correct biomni_e1 Python is used even when another virtual environment (.venv) is active in the same shell.

Fleet deployment (multiple instances):

bash scripts/launch_biomniAD_fleet.sh  # launches and manages a fleet of Biomni-AD instances

Environment variables (optional):

| Variable | Default | Description |
| --- | --- | --- |
| BIOMNI_LLM | claude-sonnet-4-5 | LLM used by both agents |
| BIOMNI_PATH | ./data | Data directory for the agent |
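
For scripts that should follow the same settings, the variables can be read with the documented defaults as fallbacks. This is a minimal sketch; the helper name `chainlit_settings` is hypothetical, and how the Chainlit app itself reads these variables is not shown here.

```python
import os


def chainlit_settings(env=None):
    """Resolve the two optional variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "llm": env.get("BIOMNI_LLM", "claude-sonnet-4-5"),
        "path": env.get("BIOMNI_PATH", "./data"),
    }


print(chainlit_settings())
```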

2. Running in Notebooks or CLI

AD1 (Alzheimer's Specialized):

from biomni.agent.ad1 import AD1

agent = AD1(llm='claude-sonnet-4-5')
agent.go("Analyze Tau aggregation pathways and suggest potential inhibitors.")

A1 (General — upstream Biomni):

from biomni.agent import A1

agent = A1(llm='claude-sonnet-4-5')
agent.go("Plan a CRISPR screen to identify genes that regulate T cell exhaustion.")

3. Gradio UI

# AD1
from biomni.agent.ad1 import AD1
AD1().launch_ui()

# A1 (general)
from biomni.agent import A1
A1().launch_gradio_demo()

Install Gradio 5.x first: pip install "gradio>=5.0,<6.0"

UI options: share=True (public link) · server_name="127.0.0.1" (localhost only) · require_verification=True (access code, default "Biomni2025")

4. Docker Deployment (Local or VM)

Biomni can be run as a containerized service with external access:

cp .env.example .env
docker compose build
docker compose up -d

Then open http://localhost:8000 (or your VM public IP).

For full VM deployment instructions (firewall/security group, operations, and hardening), see docs/docker_vm_deployment.md.

Controlling Datalake Loading

By default, Biomni automatically downloads the datalake files (~11GB) when you create an agent. You can control this behavior:

from biomni.agent import A1

# Skip automatic datalake download (faster initialization)
agent = A1(path='./data', llm='claude-sonnet-4-20250514', expected_data_lake_files=[])

This is useful for:

  • Faster testing and development
  • Environments with limited storage or bandwidth
  • Cases where you only need specific tools that don't require datalake files

If you plan on using Azure for your model, always prefix the model name with azure- (e.g. llm='azure-gpt-4o').

Configuration Management

Biomni includes a centralized configuration system that provides flexible ways to manage settings. You can configure Biomni through environment variables, runtime modifications, or direct parameters.

from biomni.config import default_config
from biomni.agent import A1

# RECOMMENDED: Modify global defaults for consistency
default_config.llm = "gpt-4"
default_config.timeout_seconds = 1200

# All agents AND database queries use these defaults
agent = A1()  # Everything uses gpt-4, 1200s timeout

Note: Direct parameters to A1() only affect that agent's reasoning, not database queries. For consistent configuration across all operations, use default_config or environment variables.
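
The difference between a direct parameter and a global default can be pictured with a toy model. Everything below (`ToyConfig`, `ToyAgent`, and the method names) is hypothetical and only mimics the behavior described in the note; it is not Biomni's implementation.

```python
from dataclasses import dataclass


@dataclass
class ToyConfig:
    llm: str = "claude-sonnet-4-5"   # arbitrary default for the illustration
    timeout_seconds: int = 600


# Stands in for a shared, module-level configuration object.
toy_default_config = ToyConfig()


class ToyAgent:
    def __init__(self, llm=None):
        # A direct parameter overrides the shared default for this agent only.
        self.llm = llm if llm is not None else toy_default_config.llm

    def reasoning_llm(self):
        return self.llm

    def database_query_llm(self):
        # Code outside the agent reads the shared config, not the agent's override.
        return toy_default_config.llm


agent = ToyAgent(llm="gpt-4")
print(agent.reasoning_llm())        # gpt-4
print(agent.database_query_llm())   # claude-sonnet-4-5

# Changing the shared default affects every consumer.
toy_default_config.llm = "gpt-4"
print(ToyAgent().database_query_llm())  # gpt-4
```

In this model, passing `llm=` to the constructor leaves the shared object untouched, which is why the note recommends `default_config` or environment variables when every operation should use the same model.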

For detailed configuration options, see the Configuration Guide (docs/configuration.md).

PDF Generation

Generate PDF reports of execution traces:

from biomni.agent import A1

# Initialize agent
agent = A1(path='./data', llm='claude-sonnet-4-20250514')

# Run your task
agent.go("Your biomedical task here")

# Save conversation history as PDF
agent.save_conversation_history("my_analysis_results.pdf")

PDF Generation Dependencies:

For optimal PDF generation, install one of these packages:

# Option 1: WeasyPrint (recommended for best layout control)
# Conda environment (recommended)
conda install weasyprint

# System installation
brew install weasyprint  # macOS
apt install weasyprint   # Linux

# See the WeasyPrint Installation Guide for detailed instructions:
# https://doc.courtbouillon.org/weasyprint/stable/first_steps.html

# Option 2: markdown2pdf (Rust-based, fast and reliable)
# macOS:
brew install theiskaa/tap/markdown2pdf

# Windows/Linux (using Cargo):
cargo install markdown2pdf

# Or download prebuilt binaries from:
# https://github.com/theiskaa/markdown2pdf/releases/latest

# Option 3: Pandoc (pip installation)
pip install pandoc

MCP (Model Context Protocol) Support

Biomni-AD supports MCP servers for external tool integration:

from biomni.agent.ad1 import AD1

agent = AD1()
agent.add_mcp(config_path="./mcp_config.yaml")
agent.go("Find FDA active ingredient information for donepezil")

For usage and implementation details, see the MCP Integration Documentation (docs/mcp_integration.md) and the examples in tutorials/examples/add_mcp_server/ and tutorials/examples/expose_biomni_server/.

Upstream Biomni Capabilities

Biomni-AD is a specialized fork of Biomni by Stanford's SNAP Lab. All upstream Biomni capabilities remain available — including 30+ biomedical tool domains, the Biomni-R0 reasoning model, the Biomni-Eval1 benchmark, the Know-How Library, and MCP integration. For features, models, and benchmarks not specific to Alzheimer's disease, refer to the upstream project directly:

  • Biomni-R0 reasoning model: biomni/Biomni-R0-32B-Preview
  • Biomni-Eval1 benchmark: biomni/Eval1
  • Know-How Library: curated lab protocols and best practices auto-retrieved by the agent (see biomni/know_how/)

For general-purpose biomedical AI agent use not focused on Alzheimer's disease, use the upstream Biomni project directly.

Tutorials

Biomni 101 — basic concepts and first steps (upstream Biomni).

AD-specific tutorials and example notebooks live alongside the AD1 agent in this repository.

Maintainership, Scope, and Contributions

Biomni-AD is maintained by Kuan-lin Huang, PhD at Kaimen Inc. (https://github.com/Kaimen-Inc/Biomni-AD.git) as a focused, AD-specific extension of upstream Biomni.

Open source and access commitments:

  • The Biomni-AD codebase, AD1 agent, data lake catalogs, and Chainlit UI will remain fully open source.
  • We are working to deploy Biomni-AD on the Alzheimer's Disease Data Initiative (ADDI) platform so AD researchers worldwide can use it to advance research without needing to self-host.

This repository is not a general open-science platform and is not soliciting community contributions, co-author tool submissions, or paper-credit programs. For those, please engage with the upstream Biomni project.

Bug reports (via GitHub issues) and targeted pull requests against the AD-specific code paths (AD1 agent, AD data lake catalogs, Chainlit UI) are welcome.

Important Notes

  • Security warning: Biomni-AD executes LLM-generated code with full system privileges. For production or shared use, run inside an isolated/sandboxed environment. The agent can access files, the network, and system commands — be careful with sensitive data or credentials.
  • Controlled-access data: NIAGADS and other catalog entries marked as controlled-access require independent authorization (e.g., NIAGADS DAC). Biomni-AD does not bypass access controls; the agent will reference catalog URIs and direct you to the appropriate portal.
  • Licensing: Biomni-AD inherits upstream Biomni's Apache 2.0 license, but certain integrated tools, databases, or software may carry more restrictive licenses. Review each component before any commercial use.
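
To make the credentials point concrete, the sketch below runs a Python snippet in a child process that does not inherit the parent's environment, so API keys exported in the shell are not visible to it. This is a partial measure under stated assumptions and not a sandbox: the child can still read files and reach the network. The helper `run_with_clean_env` is hypothetical and is not how Biomni-AD executes code.

```python
import os
import subprocess
import sys


def run_with_clean_env(code, timeout_seconds=30):
    """Run a Python snippet in a child process with a near-empty environment.

    Only PATH and SYSTEMROOT (needed on Windows) are passed through, so
    provider keys in the parent's environment do not reach the child.
    This is NOT a sandbox: file and network access are unrestricted.
    """
    passthrough = {k: os.environ[k] for k in ("PATH", "SYSTEMROOT") if k in os.environ}
    return subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores PYTHON* variables
        env=passthrough,
        capture_output=True,
        text=True,
        timeout=timeout_seconds,
    )


completed = run_with_clean_env(
    "import os; print(os.environ.get('ANTHROPIC_API_KEY'))"
)
print(completed.stdout.strip())
```

For production or shared use, container- or VM-level isolation as described in the security warning remains the appropriate control.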

Citation

Biomni-AD builds on upstream Biomni. Please cite the original Biomni paper:

@article{huang2025biomni,
  title={Biomni: A General-Purpose Biomedical AI Agent},
  author={Huang, Kexin and Zhang, Serena and Wang, Hanchen and Qu, Yuanhao and Lu, Yingzhou and Roohani, Yusuf and Li, Ryan and Qiu, Lin and Zhang, Junze and Di, Yin and others},
  journal={bioRxiv},
  pages={2025--05},
  year={2025},
  publisher={Cold Spring Harbor Laboratory}
}

If you use Biomni-AD specifically (AD1 agent, AD data lake, or Chainlit workflow), please also credit this repository: Biomni-AD, Kuan-lin Huang, Kaimen Inc. — https://github.com/Kaimen-Inc/Biomni-AD
