Store backends

VectorStore delegates all index I/O to a backend implementing the VectorBackend protocol. Three backends ship; pick by where you are on the dev → production curve.

class VectorBackend(Protocol):
    async def upsert(self, id: str, vector: list[float], payload: dict) -> None: ...
    async def search(
        self, vector: list[float], top_k: int, filters: dict
    ) -> list[tuple[str, float, dict]]: ...
    async def delete(self, id: str) -> None: ...
    async def delete_where(self, filters: dict) -> None: ...

Pick by infra, not features. All three backends speak the same VectorBackend protocol; runtime.retrieve() behaves the same against all of them. The choice is about where you want the index to live.

Initialization

ChromaBackend and PgvectorBackend must be initialised before use. Both expose an async create(...) factory that constructs and initialises in one call.

Prefer create() over manual __init__ + initialize():

from railtracks.retrieval.stores import PgvectorBackend


async def pgvector():
    backend = await PgvectorBackend.create(
        dsn="postgresql://...", table="my_index", dim=1536
    )

initialize() is still available for cases where construction must stay synchronous (dependency injection containers, etc.):

async def pgvector_init():
    backend = PgvectorBackend(dsn="postgresql://...")
    await backend.initialize()

InMemoryVectorBackend needs neither create() nor initialize(); it is ready immediately after construction.

`InMemoryVectorBackend`

Fully in-process backend backed by a Python dict. No external dependencies. Similarity scores (cosine by default) are computed in pure Python.

from railtracks.retrieval.stores import InMemoryVectorBackend, VectorStore

store = VectorStore(InMemoryVectorBackend())

Distance metric

Pass metric to choose how scores are computed. It takes the same DistanceMetric enum as the other backends:

from railtracks.retrieval.stores import DistanceMetric, InMemoryVectorBackend

backend = InMemoryVectorBackend(metric=DistanceMetric.L2)

`DistanceMetric`	Score formula
`COSINE` (default)	`cosine_similarity(q, v)`
`L2`	`1 / (1 + ‖q - v‖)`
`IP`	`q · v` (raw dot product)
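The three score formulas in the table can be sketched in pure Python as follows. This is an illustration of the formulas only, not the library's implementation: the function name `score` and the string metric names are stand-ins for the DistanceMetric enum, and the zero-vector handling is an assumption.

```python
import math


def score(metric: str, q: list[float], v: list[float]) -> float:
    """Return a similarity score for query q against stored vector v."""
    dot = sum(a * b for a, b in zip(q, v))
    if metric == "IP":
        return dot  # raw dot product
    if metric == "L2":
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(q, v)))
        return 1.0 / (1.0 + dist)  # maps distance 0 to score 1
    if metric == "COSINE":
        norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in v))
        # Assumption: a zero-length vector scores 0 rather than raising.
        return dot / norm if norm else 0.0
    raise ValueError(f"unknown metric: {metric}")
```

In all three cases a higher score means a closer match, which is why L2 is inverted rather than returned as a raw distance.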

Snapshots

Snapshots persist the index between process restarts without external infrastructure. Pass snapshot_path and the store is saved to disk as JSON after every write or delete:

    from pathlib import Path

    from railtracks.retrieval.stores import InMemoryVectorBackend, VectorStore

    store = VectorStore(InMemoryVectorBackend(snapshot_path=Path("index.json")))

    # The file is loaded automatically on next construction
    store2 = VectorStore(InMemoryVectorBackend(snapshot_path=Path("index.json")))
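To make the save-after-every-write behaviour concrete, here is a minimal sketch of a JSON snapshot round trip. The helper names `save_snapshot` and `load_snapshot`, the entry layout, and the write-to-temp-then-rename step are all assumptions for illustration; the source does not describe the on-disk format or how the file is written.

```python
import json
import os
import tempfile
from pathlib import Path


def save_snapshot(path: Path, entries: dict[str, dict]) -> None:
    """Write the whole index as JSON: temp file first, then rename into place."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(entries, fh)
    os.replace(tmp, path)  # atomic when both paths are on the same filesystem


def load_snapshot(path: Path) -> dict[str, dict]:
    """Return the saved entries, or an empty index if no snapshot exists yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))
```

A scheme like this rewrites the full file on every change, so its cost grows with the size of the index rather than the size of the write.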

Property	Value
Install	No extra dependencies
Persistence	Optional JSON snapshot
Distance metrics	COSINE, L2, IP
Suitable for	Development, tests, small corpora

When to use: unit tests, demos, small (<100k chunks) single-process workloads. If you're snapshotting more than once a second, you've outgrown this backend.

`ChromaBackend`

Backend powered by Chroma. Supports ephemeral (in-process), persistent (on-disk), and HTTP (remote server) client modes.

pip install "railtracks[stores-chroma]"

from railtracks.retrieval.stores import ChromaBackend, VectorStore


async def chroma():
    # Prefer create(): constructs and initialises in one step
    backend = await ChromaBackend.create("my-collection")
    store = VectorStore(backend)

Client modes

Mode	When	Configuration
Ephemeral	No `path` or `host` given	In-process, data lost on exit
Persistent	`path="/path/to/dir"`	Data persisted to disk
HTTP	`host="localhost", port=8000`	Remote Chroma server

    # Both backends below still need `await backend.initialize()` before use.

    # Persistent on-disk
    backend = ChromaBackend("my-collection", path="/data/chroma")

    # Remote server
    backend = ChromaBackend("my-collection", host="chroma.internal", port=8000)
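The mode table above amounts to a small decision rule. The sketch below spells it out; `resolve_client_mode` is a hypothetical helper, and rejecting `path` and `host` together is an assumption, since the source does not say which wins. The comments name the chromadb client class each mode would typically correspond to.

```python
from typing import Optional


def resolve_client_mode(path: Optional[str] = None, host: Optional[str] = None) -> str:
    """Map constructor arguments to one of the three client modes."""
    if path is not None and host is not None:
        # Assumption: mixing both is ambiguous, so this sketch rejects it.
        raise ValueError("pass either path or host, not both")
    if host is not None:
        return "http"  # e.g. chromadb.HttpClient(host=..., port=...)
    if path is not None:
        return "persistent"  # e.g. chromadb.PersistentClient(path=...)
    return "ephemeral"  # e.g. chromadb.EphemeralClient()
```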

Distance metric

Chroma's hnsw:space setting is fixed when the collection is created and cannot be changed afterwards, so choose the metric when you first create the collection:

    from railtracks.retrieval.stores import ChromaBackend, DistanceMetric

    backend = ChromaBackend("my-collection", metric=DistanceMetric.L2)

`DistanceMetric`	Chroma space	Score formula
`COSINE` (default)	`cosine`	`1 - distance`
`L2`	`l2`	`1 / (1 + sqrt(distance))`
`IP`	`ip`	`1 - distance`
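The score formulas differ from the in-memory ones because Chroma returns distances, not similarities: `cosine` is one minus cosine similarity, `ip` is one minus the dot product, and `l2` is the squared Euclidean distance (hence the square root). A minimal sketch of the conversion, with `chroma_score` as a hypothetical helper name:

```python
import math


def chroma_score(space: str, distance: float) -> float:
    """Convert a Chroma distance into a higher-is-better score."""
    if space in ("cosine", "ip"):
        # Both spaces report 1 - similarity, so subtracting from 1 undoes it.
        return 1.0 - distance
    if space == "l2":
        # Chroma's l2 space is squared L2; take the root before inverting.
        return 1.0 / (1.0 + math.sqrt(distance))
    raise ValueError(f"unknown space: {space}")
```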

Property	Value
Install	`pip install "railtracks[stores-chroma]"`
Persistence	Via client mode
Distance metrics	COSINE, L2, IP
Suitable for	Prototyping, moderate corpora, managed Chroma Cloud

When to use: standalone apps where you want a real vector index without standing up Postgres, or when you already use Chroma Cloud.

`PgvectorBackend`

Stores entries in a Postgres table with a pgvector column. Requires asyncpg and the pgvector Postgres extension.

pip install "railtracks[stores-vector]"

from railtracks.retrieval.stores import PgvectorBackend


async def pgvector():
    backend = await PgvectorBackend.create(
        dsn="postgresql://...", table="my_index", dim=1536
    )

initialize() (called by create()) runs CREATE EXTENSION IF NOT EXISTS vector and CREATE TABLE IF NOT EXISTS. Both statements are idempotent, so it is safe to call on every startup.
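As a rough picture of what that setup might issue, the sketch below builds the two statements as strings. The column names (`id`, `embedding`, `payload`) and types are assumptions chosen to mirror the upsert signature; the actual schema is not given in the source.

```python
from typing import Optional


def initialize_statements(table: str, dim: Optional[int] = None) -> list[str]:
    """Build idempotent setup SQL for a pgvector-backed table (hypothetical schema)."""
    if not table.isidentifier():
        # The table name is interpolated into SQL, so only accept plain identifiers.
        raise ValueError(f"unsafe table name: {table!r}")
    column_type = f"vector({dim})" if dim is not None else "vector"
    return [
        "CREATE EXTENSION IF NOT EXISTS vector",
        f"CREATE TABLE IF NOT EXISTS {table} ("
        f"id TEXT PRIMARY KEY, embedding {column_type}, payload JSONB)",
    ]
```

Because both statements use IF NOT EXISTS, running them against a database that is already set up is a no-op.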

Dimensionality

If you know the embedding dimension upfront, pass dim: it enables Postgres's typed vector(N) column and lets pgvector use ivfflat or hnsw indexes:

    backend = PgvectorBackend(
        dsn="postgresql://user:pass@localhost/mydb",
        dim=1536,   # e.g. text-embedding-3-small
    )

Without dim, the column is an untyped vector. Queries still work but cannot use ivfflat or hnsw indexes, so searches fall back to an exact scan of the table. That is fine for development, but it gets slower as the table grows, so set dim for production.
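Once the column is typed, an approximate index can be added with pgvector's own DDL. The source does not say whether the backend creates one for you, so treat this as a sketch of the statement you might run yourself; the `embedding` column name and the index naming scheme are assumptions, while the operator class names are pgvector's.

```python
# pgvector operator classes, keyed by metric name.
_OPCLASS = {
    "COSINE": "vector_cosine_ops",
    "L2": "vector_l2_ops",
    "IP": "vector_ip_ops",
}


def index_statement(table: str, metric: str = "COSINE", method: str = "hnsw") -> str:
    """Build a CREATE INDEX statement matching the metric used for search."""
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    if method not in ("hnsw", "ivfflat"):
        raise ValueError(f"unsupported index method: {method}")
    opclass = _OPCLASS[metric]
    return (
        f"CREATE INDEX IF NOT EXISTS {table}_embedding_{method}_idx "
        f"ON {table} USING {method} (embedding {opclass})"
    )
```

The operator class has to match the distance operator used in queries, otherwise Postgres will not use the index for them.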

Distance metric

    from railtracks.retrieval.stores import DistanceMetric, PgvectorBackend

    backend = PgvectorBackend(dsn="...", metric=DistanceMetric.IP)

`DistanceMetric`	SQL operator	Score formula
`COSINE` (default)	`<=>`	`1 - distance`
`L2`	`<->`	`1 / (1 + distance)`
`IP`	`<#>`	`-distance` (dot product)
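The operator and score columns combine into a single query shape: order by the distance operator, and derive the score from the same expression. The sketch below builds such a query as a string; the column names and the `$1`/`$2` placeholders (query vector and top_k) are assumptions, and filter handling is left out.

```python
_OPERATOR = {"COSINE": "<=>", "L2": "<->", "IP": "<#>"}
_SCORE = {
    "COSINE": "1 - ({d})",
    "L2": "1 / (1 + ({d}))",
    "IP": "-({d})",  # <#> returns the negative inner product
}


def search_sql(table: str, metric: str = "COSINE") -> str:
    """Build a nearest-neighbour query for the given metric (hypothetical schema)."""
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    distance = f"embedding {_OPERATOR[metric]} $1"
    score = _SCORE[metric].format(d=distance)
    return (
        f"SELECT id, {score} AS score, payload FROM {table} "
        f"ORDER BY {distance} LIMIT $2"
    )
```

Ordering by the raw operator ascending puts the best match first for all three metrics, including IP, because `<#>` is already negated.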

Property	Value
Install	`pip install "railtracks[stores-vector]"`
Persistence	Full Postgres durability
Distance metrics	COSINE, L2, IP
Suitable for	Production, existing Postgres stacks

When to use: anywhere you already run Postgres. One backup story, one permissions story, one connection pool. The right default for production unless you have a strong reason for a managed vector DB.

Choosing a backend

	InMemory	Chroma	Pgvector
Extra install	None	`stores-chroma`	`stores-vector`
Persistence	Optional snapshot	Client-dependent	Postgres
Scale	Small (in-process)	Medium–Large	Large
Infrastructure	None	Chroma server (optional)	Postgres + pgvector
Best for	Tests & dev	Standalone apps	Postgres-native stacks

Custom backends

Any class satisfying the VectorBackend protocol works with VectorStore. Four methods (upsert, search, delete, delete_where) are the entire contract.

from railtracks.retrieval.stores import VectorStore


class MyBackend:
    async def upsert(self, id: str, vector: list[float], payload: dict) -> None: ...

    async def search(
        self, vector: list[float], top_k: int, filters: dict
    ) -> list[tuple[str, float, dict]]: ...

    async def delete(self, id: str) -> None: ...

    async def delete_where(self, filters: dict) -> None: ...

    # The two methods below are not among the four protocol methods listed above.
    async def list_where(self, filters: dict, limit: int) -> list[tuple[str, dict]]: ...

    async def count(self, filters: dict) -> int: ...


store = VectorStore(MyBackend())

filters is a flat dict[str, str] built from StoreScope.to_payload_filters() plus any metadata_filters from the query. All keys must match for an entry to be returned; there's no boolean logic at the backend layer.
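Putting the contract and the filter rule together, here is a minimal sketch of a working custom backend. `ListBackend` and `matches` are hypothetical names, scoring is a raw dot product for brevity, and treating an empty filter dict as "match everything" is an assumption.

```python
import asyncio


def matches(payload: dict, filters: dict) -> bool:
    """True when every filter key is present in payload with an equal value."""
    return all(k in payload and payload[k] == v for k, v in filters.items())


class ListBackend:
    """Toy brute-force backend implementing the four protocol methods."""

    def __init__(self) -> None:
        self._entries: dict[str, tuple[list[float], dict]] = {}

    async def upsert(self, id: str, vector: list[float], payload: dict) -> None:
        self._entries[id] = (vector, payload)  # insert or overwrite

    async def search(
        self, vector: list[float], top_k: int, filters: dict
    ) -> list[tuple[str, float, dict]]:
        hits = [
            (id_, sum(a * b for a, b in zip(vector, v)), p)
            for id_, (v, p) in self._entries.items()
            if matches(p, filters)
        ]
        hits.sort(key=lambda h: h[1], reverse=True)  # best score first
        return hits[:top_k]

    async def delete(self, id: str) -> None:
        self._entries.pop(id, None)  # deleting a missing id is a no-op

    async def delete_where(self, filters: dict) -> None:
        # Note: with empty filters every entry matches and is removed.
        self._entries = {
            k: e for k, e in self._entries.items() if not matches(e[1], filters)
        }


async def demo() -> list[str]:
    backend = ListBackend()
    await backend.upsert("a", [1.0, 0.0], {"tenant": "x"})
    await backend.upsert("b", [1.0, 0.0], {"tenant": "y"})
    hits = await backend.search([1.0, 0.0], top_k=5, filters={"tenant": "x"})
    return [id_ for id_, _score, _payload in hits]
```

Running `asyncio.run(demo())` returns only the ids whose payload carries `tenant == "x"`, since every filter key has to match.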