Module Packages

The retrieval stack is split across optional extras so you don't pay the dependency cost of connectors you don't use. Pick the smallest set that covers what you're building.

Pick by use case

You want to…	Install
Try retrieval end-to-end with everything available	`pip install "railtracks[retrieval]"`
Build a retrieval pipeline against your own loader/store	`pip install "railtracks[retrieval-core]"`
Add a specific source (S3, GCS, Azure Blob, SQL, HuggingFace)	`[retrieval-core]` + the connector extra
OCR scanned PDFs	`[retrieval-core]` + `[ocr]` (plus the Tesseract system binary)
Use a real vector store backend	`[retrieval-core]` + `[stores-vector]` or `[stores-chroma]`

Extras

`[retrieval-core]`

The minimum to run the pipeline. Includes chunking (tiktoken), the basic PDF loader (pypdf), and the InMemoryBackend (numpy). Use this when you're wiring your own loader or store and don't want the full integration set.

pip install "railtracks[retrieval-core]"

`[retrieval]`:

[retrieval-core] plus every built-in connector and backend (OCR, HuggingFace, ChromaDB, AWS, Azure Blob, GCP, SQL, pgvector). Largest install; use when you want everything available.

pip install "railtracks[retrieval]"

Granular extras

Combine these with [retrieval-core] to add specific capabilities.

Extra	Adds	For
`[ocr]`	`pypdfium2`, `pytesseract`, `pillow`	OCR fallback in `PyPDFOCRLoader`. Also needs the Tesseract binary on `PATH`.
`[huggingface]`	`datasets`	`HuggingFaceDatasetLoader` for streaming HF datasets.
`[chroma]`	`chromadb`, `pdfplumber`	ChromaDB-backed ingestion utilities.
`[aws]`	`boto3`	`S3Loader`.
`[azure-blob]`	`azure-core`, `azure-identity`, `azure-storage-blob`	`AzureBlobLoader`.
`[gcp]`	`google-cloud-storage`	`GCSLoader`.
`[sql]`	`sqlalchemy`	`SQLLoader`.

Store backends

The retrieval pipeline ships three vector store backends. The in-memory backend is included in [retrieval-core]; the others are opt-in:

Extra	Backend	Notes
(none)	`InMemoryBackend`	Already in `[retrieval-core]`. Useful for tests and small corpora.
`[stores-vector]`	`PgvectorBackend`	Adds `asyncpg`, `pgvector`. Requires a Postgres instance with the `pgvector` extension.
`[stores-chroma]`	`ChromaBackend`	Adds `chromadb`.
`[stores-all]`	both	Convenience umbrella for the two persistent backends.

System-level dependencies

A few extras need things that aren't on PyPI:

[ocr]: The pytesseract Python wrapper needs the Tesseract binary installed and on PATH. See the Tesseract install guide.
[stores-vector]: asyncpg/pgvector need a running Postgres with the pgvector extension enabled.

What about `[all]`?

railtracks[all] does not include the retrieval stack. It covers the rest of railtracks (visualizer, Portkey integration). Install [retrieval] (or [retrieval-core] + targeted extras) alongside if you need both.