Unstructured docs to answers in one pipeline
Parse, extract, ingest, and query with six RAG strategies. Run docpipe serve for /health, /metrics, and optional OTEL — composable pipelines, no lock-in.
$ pip install "docpipe-sdk[all]"
Quickstart
Parse, ingest, and query from the CLI — or run the API server with Docker.
# Parse a document
$ docpipe parse invoice.pdf --format markdown
# Ingest into your vector DB
$ docpipe ingest report.pdf \
    --db "postgresql://..." \
    --table docs \
    --embedding-provider openai \
    --embedding-model text-embedding-3-small \
    --incremental
# Start API server (install [server] or [server,observability] for OTEL)
$ docpipe serve --port 8000
# Health & metrics (no auth on /metrics)
# curl http://localhost:8000/health
# curl http://localhost:8000/metrics
Capabilities at a glance
Parse through evaluate: composable pipelines built around a single SDK.
Composable Pipelines
Seven workflows — four core stages plus extract-only, full chain, and observability. Use each independently or chain them together. Your data, your DB, your LLM.
Documents: PDF, DOCX, images...
Parse: Docling · GLM-OCR
Extract: LangExtract · LangChain
Ingest: pgvector · turbovec (optional)
RAG Query: 6 strategies · stream
Observe: OTEL · /health · metrics
1. Parse Only
Convert any document to clean text or markdown. Choose Docling or GLM-OCR.
import docpipe
# Default: Docling
doc = docpipe.parse("report.pdf")
# GLM-OCR: state-of-the-art OCR
doc = docpipe.parse("scan.pdf", parser="glm-ocr")
print(doc.markdown)
2. Extract Only (LangExtract)
Extract structured entities from any text with LLMs.
schema = docpipe.ExtractionSchema(
    description="Extract people and ages",
    model_id="gemini-2.5-flash",
)
results = docpipe.extract(text, schema)
3. Parse + Extract
Full pipeline: document to structured data in one call.
result = docpipe.run("invoice.pdf", schema)
print(result.extractions)
4. Parse + Ingest
Parse a document and ingest vectors into pgvector (default) or local turbovec file indices.
config = docpipe.IngestionConfig(
    connection_string="postgresql://...",
    table_name="docs",
    embedding_provider="openai",
    embedding_model="text-embedding-3-small",
)
docpipe.ingest("report.pdf", config=config)
# Optional: local turbovec index (pip install "docpipe-sdk[turbovec]")
# config.vector_backend = "turbovec"  # → .docpipe/indices/docs/
5. Full Pipeline
Parse, extract, and ingest, all in one call.
result = docpipe.run(
    "contract.pdf", schema,
    ingestion_config=config,
)
6. RAG Query
Ask questions against your ingested documents with grounded answers and source citations.
rag_cfg = docpipe.RAGConfig(
    connection_string="postgresql://...",
    table_name="docs",
    embedding_provider="openai",
    embedding_model="text-embedding-3-small",
    llm_provider="openai",
    llm_model="gpt-4o",
    strategy="hyde",
)
result = docpipe.query(
    "What is the invoice total?",
    config=rag_cfg,
)
print(result.answer)   # grounded answer with citations
print(result.sources)  # ["invoice.pdf"]
print(result.usage)    # TokenUsage when provider reports counts
7. Observability
OTLP traces, JSON logs, /health dependency checks, Prometheus /metrics, and token usage on RAG responses.
# pip install "docpipe-sdk[server,observability]"
export DOCPIPE_OTEL_ENABLED=true
export DOCPIPE_OTEL_SERVICE_NAME=docpipe
export DOCPIPE_OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4318/v1/traces
export DOCPIPE_LOG_FORMAT=json
export DOCPIPE_HEALTH_CHECK_DB=true
docpipe serve
curl http://localhost:8000/health # plugins + DB status
curl http://localhost:8000/metrics # Prometheus (no auth on /metrics)
6 Retrieval Strategies: Pick What Fits
Switch strategy with one config field. Stream answers, capture token usage, and monitor the API server with OTEL, /health, and /metrics.
Pick your retrieval strategy
Example request: { "query": "What is the invoice total?", "strategy": "hyde" }
naive
Standard cosine similarity search. Fast, reliable baseline for well-formed queries.
Best for: well-formed queries, fast responses
HyDE
LLM generates a hypothetical answer first, embeds it, then retrieves real matching docs. Highest accuracy in benchmarks.
Best for: complex / technical queries
multi-query
Expands your query into N variants via LLM, retrieves for each, then deduplicates and ranks results.
Best for: vague or short queries
parent-document
Retrieves seed chunks, then expands context by fetching additional chunks from the same source documents.
Best for: long documents, context coherence
hybrid
Combines dense vector search with sparse BM25 keyword retrieval via EnsembleRetriever. Best of both worlds.
Best for: exact terms, proper nouns, IDs
auto
LLM classifies your question and dispatches to the optimal strategy automatically. Best accuracy with zero tuning.
Best for: mixed workloads, unknown query types
rag_cfg = docpipe.RAGConfig(
    ...,
    strategy="naive",
    reranker="flashrank",  # local, no API key
    rerank_top_n=5,
)
# Retrieve top-50, rerank, keep top-5

from pydantic import BaseModel  # assumed: Invoice is a Pydantic model

class Invoice(BaseModel):
    total: float
    currency: str

result = docpipe.query(
    "What is the total?",
    config=docpipe.RAGConfig(
        ..., output_model=Invoice
    ),
)
invoice = result.structured
# Invoice(total=4250.0, currency='USD')

# Stream tokens via SDK or POST /rag/stream (SSE)
for token in docpipe.stream_query(
    "What is the total?",
    config=rag_cfg,  # stream=True
):
    print(token, end="", flush=True)
# Before data: [DONE], optional metadata event:
# event: metadata
# data: {"type":"usage","usage":{"input_tokens":123,...}}

result = docpipe.query("Summarize the invoice", config=rag_cfg)
print(result.answer)
if result.usage:
    print(result.usage)  # input/output/total when provider reports counts
# Same usage object on POST /rag/query JSON responses
Built for Production
Everything you need to go from raw documents to grounded answers at scale.
Plugin Architecture
Add custom parsers and extractors via Python entry points. Third-party packages auto-discovered on install.
CLI + API Server
Full CLI for scripting, FastAPI with /health and /metrics, Docker image for deployment. OTEL via [server,observability].
Observability
OTLP traces, JSON logs, /health dependency checks, Prometheus /metrics, and token usage on RAG responses. Install with [server,observability].
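On the streaming endpoint, token usage arrives as an optional `metadata` event before `data: [DONE]`. The sketch below parses such a Server-Sent Events stream from plain text lines. It assumes that token events carry plain-text `data:` payloads, which the page does not confirm; `parse_sse` and `collect_stream` are illustrative helpers, not part of the SDK.

```python
import json


def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:  # a blank line terminates one event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips a single leading space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)
    if data:  # stream ended without a trailing blank line
        yield event, "\n".join(data)


def collect_stream(lines):
    """Return (answer_text, usage) where usage is a dict or None."""
    tokens, usage = [], None
    for event, data in parse_sse(lines):
        if data == "[DONE]":
            break
        if event == "metadata":
            usage = json.loads(data).get("usage")
        else:
            tokens.append(data)  # assumption: token payloads are plain text
    return "".join(tokens), usage
```

Feeding it the decoded lines of a `POST /rag/stream` response body would return the concatenated answer together with the usage dictionary, or `None` when the provider reports no counts.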
Fully Configurable
No magic defaults. Explicit LLM provider, embedding model, and DB connection. YAML + env vars.
LangChain Backbone
Built on LangChain for embeddings, text splitting, and vector stores. Supports OpenAI, Gemini, Ollama, HuggingFace.
Optional Turbovec Backend
Default pgvector in PostgreSQL, or install [turbovec] for compressed on-disk indices — local prototypes, air-gapped RAG, no pgvector required.
20+ Document Formats
PDF, DOCX, XLSX, PPTX, HTML, images — choose between IBM Docling or GLM-OCR (state-of-the-art multimodal OCR).
6 RAG Strategies
naive, HyDE, multi-query, parent-document, hybrid, auto — swap with one config field. Reranking and token usage when providers support it.
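To give a feel for how the hybrid strategy can merge a dense ranking with a BM25 ranking, here is a plain-Python sketch of weighted Reciprocal Rank Fusion, the scheme LangChain's EnsembleRetriever is generally understood to use. It is an independent illustration, not docpipe's code; the function name, the weights, and the constant `c=60` are assumptions chosen for the example.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, c=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores weight / (rank + c) per list it appears in;
    scores are summed and documents are sorted by descending total.
    """
    scores = defaultdict(float)
    for ranked_ids, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += weight / (rank + c)
    # Ties are broken by document ID to keep the output deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    dense = ["a", "b", "c"]   # hypothetical vector-search ranking
    sparse = ["c", "a", "d"]  # hypothetical BM25 ranking
    print(weighted_rrf([dense, sparse], weights=[0.5, 0.5]))
```

Documents that rank well in both lists rise to the top, which is why this kind of fusion helps with exact terms and IDs that dense embeddings alone can miss.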
Built-in Evaluation
Measure hit rate, MRR, faithfulness, and answer similarity. Know if your RAG is actually working.
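For reference, hit rate and MRR can be computed from ranked retrieval results as in the minimal sketch below. It assumes one relevant document per question and uses hypothetical function names; it does not show docpipe's evaluation API, and faithfulness and answer similarity are left out because they need an LLM or an embedding model.

```python
def hit_rate(results, relevant, k=5):
    """Fraction of questions whose relevant doc appears in the top-k results."""
    if not results:
        return 0.0
    hits = sum(1 for q, ranked in results.items() if relevant[q] in ranked[:k])
    return hits / len(results)


def mean_reciprocal_rank(results, relevant):
    """Average of 1/rank of the relevant doc (0 when it is not retrieved)."""
    if not results:
        return 0.0
    total = 0.0
    for q, ranked in results.items():
        if relevant[q] in ranked:
            total += 1.0 / (ranked.index(relevant[q]) + 1)
    return total / len(results)


if __name__ == "__main__":
    # Hypothetical retrieval output: question -> ranked document IDs.
    results = {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d5", "d6"], "q3": ["d7", "d8", "d9"]}
    relevant = {"q1": "d1", "q2": "d5", "q3": "dX"}
    print(hit_rate(results, relevant, k=3), mean_reciprocal_rank(results, relevant))
```

Hit rate only asks whether the right document was retrieved at all, while MRR also rewards placing it near the top, so the two are worth reading together.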
Zero Vendor Lock-in
docpipe never stores your data. It connects to your DB, calls your LLM API, then gets out of the way.
Pipeline modes — one card, four shapes
Parse pipeline
Docling or GLM-OCR → markdown
PDF → answer
One document’s journey through the pipeline.
invoice.pdf Line items, tables, headers preserved by Docling.