Store backends
VectorStore delegates all index I/O to a backend implementing the
VectorBackend protocol. Three backends ship; pick by where you are on
the dev → production curve.
from typing import Protocol

class VectorBackend(Protocol):
    async def upsert(self, id: str, vector: list[float], payload: dict) -> None: ...
    async def search(
        self, vector: list[float], top_k: int, filters: dict
    ) -> list[tuple[str, float, dict]]: ...
    async def delete(self, id: str) -> None: ...
    async def delete_where(self, filters: dict) -> None: ...
Pick by infra, not features. All three backends speak the same
VectorBackend protocol; runtime.retrieve() behaves the same against
all of them. The choice is about where you want the index to live.
Initialization
ChromaBackend and PgvectorBackend must be initialised before use.
Both expose an async create(...) factory that constructs and initialises
in one call.
Prefer create() over manual __init__ + initialize():
from railtracks.retrieval.stores import PgvectorBackend
async def pgvector():
    backend = await PgvectorBackend.create(
        dsn="postgresql://...", table="my_index", dim=1536
    )
initialize() is still available for cases where construction must stay
synchronous (dependency injection containers, etc.):
async def pgvector_init():
    backend = PgvectorBackend(dsn="postgresql://...")
    await backend.initialize()
InMemoryVectorBackend requires neither step; it is ready immediately
after construction.
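The relationship between create() and initialize() described above can be sketched with a stand-in class. DemoBackend and its ready flag are hypothetical names used only for illustration; this is not the railtracks implementation, only the general async-factory pattern under the assumption that create() simply constructs and then awaits initialize().

```python
import asyncio


class DemoBackend:
    """Hypothetical stand-in for a backend that needs async setup."""

    def __init__(self, dsn: str) -> None:
        # Construction stays synchronous and performs no I/O.
        self.dsn = dsn
        self.ready = False

    async def initialize(self) -> None:
        # A real backend would open connections or create tables here.
        await asyncio.sleep(0)
        self.ready = True

    @classmethod
    async def create(cls, dsn: str) -> "DemoBackend":
        # Construct, then initialise, so callers never hold a half-built object.
        backend = cls(dsn)
        await backend.initialize()
        return backend


async def main() -> tuple[bool, bool]:
    one_step = await DemoBackend.create("demo://")

    two_step = DemoBackend("demo://")
    ready_before = two_step.ready  # still False: initialize() not awaited yet
    await two_step.initialize()

    return one_step.ready and two_step.ready, ready_before


both_ready, ready_before_init = asyncio.run(main())
```

The two-step path leaves a window in which the object exists but is not usable, which is the main reason to prefer the factory when construction does not have to be synchronous.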
InMemoryVectorBackend
Fully in-process backend backed by a Python dict. No external dependencies. Cosine similarity is computed in pure Python.
from railtracks.retrieval.stores import InMemoryVectorBackend, VectorStore
store = VectorStore(InMemoryVectorBackend())
Distance metric
Same DistanceMetric enum as the other backends:
from railtracks.retrieval.stores import DistanceMetric, InMemoryVectorBackend
backend = InMemoryVectorBackend(metric=DistanceMetric.L2)
| DistanceMetric | Score formula |
|---|---|
| COSINE (default) | cosine_similarity(q, v) |
| L2 | 1 / (1 + ‖q - v‖) |
| IP | q · v (raw dot product) |
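As a rough illustration of the three score formulas in the table, the following pure-Python sketch computes each one for a query q and a stored vector v. The function names are hypothetical, and returning 0.0 for a zero-length vector in the cosine case is an assumption, not documented behaviour.

```python
import math


def dot(q: list[float], v: list[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(q, v))


def cosine_score(q: list[float], v: list[float]) -> float:
    """COSINE: cosine similarity, in [-1, 1]."""
    norm = math.sqrt(dot(q, q)) * math.sqrt(dot(v, v))
    # Assumption: treat a zero vector as having no similarity to anything.
    return dot(q, v) / norm if norm else 0.0


def l2_score(q: list[float], v: list[float]) -> float:
    """L2: 1 / (1 + Euclidean distance), in (0, 1]."""
    return 1.0 / (1.0 + math.dist(q, v))


def ip_score(q: list[float], v: list[float]) -> float:
    """IP: raw dot product, unbounded."""
    return dot(q, v)
```

In all three cases a higher score means a closer match, but only the L2 score is bounded to (0, 1], so scores are not comparable across metrics.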
Snapshots
Snapshots persist the index between process restarts without external
infrastructure. Pass snapshot_path and the store is saved to disk as
JSON after every write or delete:
from pathlib import Path
from railtracks.retrieval.stores import InMemoryVectorBackend, VectorStore
store = VectorStore(InMemoryVectorBackend(snapshot_path=Path("index.json")))
# The file is loaded automatically on next construction
store2 = VectorStore(InMemoryVectorBackend(snapshot_path=Path("index.json")))
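The snapshot's on-disk layout is not specified here, so the following is only a sketch of the general "rewrite the whole file on every change" pattern, with hypothetical save_snapshot and load_snapshot helpers. It assumes the index can be represented as a JSON-serialisable dict and writes through a temporary file so that a crash mid-write does not leave a truncated snapshot.

```python
import json
import os
import tempfile
from pathlib import Path


def save_snapshot(entries: dict[str, dict], path: Path) -> None:
    """Serialise the whole index to `path`, replacing the old file atomically."""
    path = Path(path)
    # Write to a temp file in the same directory, then swap it into place.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(entries, handle)
    os.replace(tmp_name, path)


def load_snapshot(path: Path) -> dict[str, dict]:
    """Return the saved index, or an empty one if no snapshot exists yet."""
    path = Path(path)
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))
```

Because every save serialises the full index, the cost of each write grows with the number of entries, which is why frequent snapshotting is a sign the workload has outgrown this approach.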
| Property | Value |
|---|---|
| Install | No extra dependencies |
| Persistence | Optional JSON snapshot |
| Distance metrics | COSINE, L2, IP |
| Suitable for | Development, tests, small corpora |
When to use: unit tests, demos, small (<100k chunks) single-process workloads. If you're snapshotting more than once a second, you've outgrown this backend.
ChromaBackend
Backend powered by Chroma. Supports ephemeral (in-process), persistent (on-disk), and HTTP (remote server) client modes.
from railtracks.retrieval.stores import ChromaBackend, VectorStore
async def chroma():
    # Prefer create(): constructs and initialises in one step
    backend = await ChromaBackend.create("my-collection")
    store = VectorStore(backend)
Client modes
| Mode | When | Behavior |
|---|---|---|
| Ephemeral | No path or host given | In-process, data lost on exit |
| Persistent | path="/path/to/dir" | Data persisted to disk |
| HTTP | host="localhost", port=8000 | Remote Chroma server |
# Persistent on-disk
backend = ChromaBackend("my-collection", path="/data/chroma")
# Remote server
backend = ChromaBackend("my-collection", host="chroma.internal", port=8000)
Distance metric
Chroma's hnsw:space is set at collection creation and cannot be
changed later. Pick at create time:
from railtracks.retrieval.stores import ChromaBackend, DistanceMetric
backend = ChromaBackend("my-collection", metric=DistanceMetric.L2)
| DistanceMetric | Chroma space | Score formula |
|---|---|---|
| COSINE (default) | cosine | 1 - distance |
| L2 | l2 | 1 / (1 + sqrt(distance)) |
| IP | ip | 1 - distance |
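The conversions in the table can be written as one small function. chroma_score is a hypothetical helper, not part of railtracks or Chroma. The square root in the L2 row is consistent with Chroma's l2 space reporting squared Euclidean distance, so taking the root first should put the score on the same scale as the in-memory backend's L2 formula.

```python
import math


def chroma_score(space: str, distance: float) -> float:
    """Convert a Chroma distance into a higher-is-better score."""
    if space in ("cosine", "ip"):
        # cosine: 1 - (1 - similarity); ip: 1 - (1 - dot product)
        return 1.0 - distance
    if space == "l2":
        # Assumes `distance` is the squared L2 distance.
        return 1.0 / (1.0 + math.sqrt(distance))
    raise ValueError(f"unknown Chroma space: {space!r}")
```

For example, a squared L2 distance of 9.0 corresponds to a Euclidean distance of 3.0 and a score of 0.25.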
| Property | Value |
|---|---|
| Install | pip install "railtracks[stores-chroma]" |
| Persistence | Via client mode |
| Distance metrics | COSINE, L2, IP |
| Suitable for | Prototyping, moderate corpora, managed Chroma Cloud |
When to use: standalone apps where you want a real vector index without standing up Postgres, or when you already use Chroma Cloud.
PgvectorBackend
Stores entries in a Postgres table with a pgvector column. Requires
asyncpg and the pgvector Postgres extension.
from railtracks.retrieval.stores import PgvectorBackend
async def pgvector():
    backend = await PgvectorBackend.create(
        dsn="postgresql://...", table="my_index", dim=1536
    )
initialize() (called by create()) runs CREATE EXTENSION IF NOT EXISTS
vector and CREATE TABLE IF NOT EXISTS. Both statements are idempotent, so
it is safe to call on every startup.
Dimensionality
If you know the embedding dimension upfront, pass dim: it enables
Postgres's typed vector(N) column and lets pgvector use ivfflat or
hnsw indexes:
backend = PgvectorBackend(
    dsn="postgresql://user:pass@localhost/mydb",
    dim=1536,  # e.g. text-embedding-3-small
)
Without dim, the column is an untyped vector. Queries still work but
can't use the fast index types, so every search scans the whole table.
That is fine for development, but search latency grows with the table in
production.
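To make the typed versus untyped distinction concrete, the sketch below generates the kind of DDL involved. The column names (id, embedding, payload), the index name, and both helper functions are hypothetical; the schema that PgvectorBackend actually creates, and whether it builds an index for you, are not stated here. The vector(N) type, the hnsw access method, and the vector_cosine_ops operator class are standard pgvector.

```python
from typing import Optional


def create_table_sql(table: str, dim: Optional[int]) -> str:
    """Build CREATE TABLE with a typed vector(N) column when dim is known."""
    if not table.isidentifier():
        # Table names are interpolated, so only accept plain identifiers.
        raise ValueError(f"unsafe table name: {table!r}")
    column_type = f"vector({dim})" if dim is not None else "vector"
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ("
        f"id text PRIMARY KEY, embedding {column_type}, payload jsonb)"
    )


def create_index_sql(table: str, dim: Optional[int]) -> str:
    """Build an hnsw index statement; only possible on a typed column."""
    if dim is None:
        raise ValueError("ivfflat and hnsw indexes need a typed vector(N) column")
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    return (
        f"CREATE INDEX IF NOT EXISTS {table}_embedding_idx "
        f"ON {table} USING hnsw (embedding vector_cosine_ops)"
    )
```

The operator class should match the distance metric in use: vector_l2_ops and vector_ip_ops are the pgvector counterparts for L2 and inner product.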
Distance metric
from railtracks.retrieval.stores import DistanceMetric, PgvectorBackend
backend = PgvectorBackend(dsn="...", metric=DistanceMetric.IP)
| DistanceMetric | SQL operator | Score formula |
|---|---|---|
| COSINE (default) | <=> | 1 - distance |
| L2 | <-> | 1 / (1 + distance) |
| IP | <#> | -distance (dot product) |
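The operator and score columns combine into a single query shape. The sketch below builds that query as a string. The search_sql helper, the embedding and payload column names, and the lowercase metric keys are hypothetical, and binding the query vector to $1 (which needs a pgvector codec or cast) is left out. The three operators are pgvector's own; <#> returns the negative inner product, which is why the score negates it.

```python
OPERATORS = {"cosine": "<=>", "l2": "<->", "ip": "<#>"}

# Score expressions per metric; {dist} is replaced by the distance expression.
SCORE_TEMPLATES = {
    "cosine": "1 - ({dist})",
    "l2": "1 / (1 + ({dist}))",
    "ip": "-({dist})",
}


def search_sql(table: str, metric: str) -> str:
    """Build a top-k query: $1 is the query vector, $2 is the limit."""
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    dist = f"embedding {OPERATORS[metric]} $1"
    score = SCORE_TEMPLATES[metric].format(dist=dist)
    # Ascending distance puts the best match first for all three operators.
    return (
        f"SELECT id, {score} AS score, payload "
        f"FROM {table} ORDER BY {dist} LIMIT $2"
    )
```

Ordering by the raw operator expression rather than by the computed score keeps the ORDER BY in the form pgvector's indexes can serve.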
| Property | Value |
|---|---|
| Install | pip install "railtracks[stores-vector]" |
| Persistence | Full Postgres durability |
| Distance metrics | COSINE, L2, IP |
| Suitable for | Production, existing Postgres stacks |
When to use: anywhere you already run Postgres. One backup story, one permissions story, one connection pool. The right default for production unless you have a strong reason for a managed vector DB.
Choosing a backend
| | InMemory | Chroma | Pgvector |
|---|---|---|---|
| Extra install | None | stores-chroma | stores-vector |
| Persistence | Optional snapshot | Client-dependent | Postgres |
| Scale | Small (in-process) | Medium–Large | Large |
| Infrastructure | None | Chroma server (optional) | Postgres + pgvector |
| Best for | Tests & dev | Standalone apps | Postgres-native stacks |
Custom backends
Any class satisfying the VectorBackend protocol works with VectorStore.
Four methods (upsert, search, delete, delete_where) are the
entire contract.
from railtracks.retrieval.stores import VectorStore
class MyBackend:
    async def upsert(self, id: str, vector: list[float], payload: dict) -> None: ...
    async def search(
        self, vector: list[float], top_k: int, filters: dict
    ) -> list[tuple[str, float, dict]]: ...
    async def delete(self, id: str) -> None: ...
    async def delete_where(self, filters: dict) -> None: ...
    async def list_where(self, filters: dict, limit: int) -> list[tuple[str, dict]]: ...
    async def count(self, filters: dict) -> int: ...

store = VectorStore(MyBackend())
filters is a flat dict[str, str] built from
StoreScope.to_payload_filters() plus any metadata_filters from the
query. All keys must match for an entry to be returned; there's no
boolean logic at the backend layer.
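The all-keys-must-match rule is easy to get subtly wrong, so here is a minimal sketch of a toy backend that applies it. DictBackend and matches are hypothetical names, and scoring by raw dot product is an arbitrary choice for brevity. The sketch covers only upsert, search, delete, and delete_where, and it runs without railtracks installed.

```python
import asyncio


def matches(payload: dict, filters: dict) -> bool:
    """True when every filter key is in the payload with an equal value."""
    return all(payload.get(key) == value for key, value in filters.items())


class DictBackend:
    """Toy in-process backend; scores by raw dot product."""

    def __init__(self) -> None:
        self._entries: dict[str, tuple[list[float], dict]] = {}

    async def upsert(self, id: str, vector: list[float], payload: dict) -> None:
        self._entries[id] = (vector, payload)

    async def search(
        self, vector: list[float], top_k: int, filters: dict
    ) -> list[tuple[str, float, dict]]:
        scored = [
            (entry_id, sum(a * b for a, b in zip(vector, stored)), payload)
            for entry_id, (stored, payload) in self._entries.items()
            if matches(payload, filters)
        ]
        scored.sort(key=lambda item: item[1], reverse=True)  # best first
        return scored[:top_k]

    async def delete(self, id: str) -> None:
        self._entries.pop(id, None)

    async def delete_where(self, filters: dict) -> None:
        # Note: an empty filter dict matches every entry.
        doomed = [k for k, (_, p) in self._entries.items() if matches(p, filters)]
        for key in doomed:
            del self._entries[key]


async def demo() -> tuple[list[str], list[str]]:
    backend = DictBackend()
    await backend.upsert("a", [1.0, 0.0], {"tenant": "x", "lang": "en"})
    await backend.upsert("b", [0.0, 1.0], {"tenant": "x", "lang": "fr"})
    await backend.upsert("c", [1.0, 1.0], {"tenant": "y", "lang": "en"})
    hits = await backend.search(
        [1.0, 0.0], top_k=5, filters={"tenant": "x", "lang": "en"}
    )
    await backend.delete_where({"tenant": "x"})
    remaining = await backend.search([1.0, 0.0], top_k=5, filters={})
    return [h[0] for h in hits], [h[0] for h in remaining]


hit_ids, remaining_ids = asyncio.run(demo())
```

Only entry "a" satisfies both filter keys, and after deleting everything for tenant "x" only "c" remains. Any OR or NOT logic would have to be composed above the backend, for instance by issuing several searches and merging the results.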