#1 on LoCoMo benchmark — zero LLM required

Conversational memory
that actually remembers.

State-of-the-art retrieval over past conversations — 93.9% R@5 on LoCoMo, 98.4% on LongMemEval. No LLM calls. $0 per query. Your words, stored exactly as you said them.

$ pip install engram-search
View on GitHub
MIT licensed
Local-first, cloud-ready
Python 3.9+
Benchmarks

Independently verified on two benchmarks.

Evaluated on two widely used conversational-memory benchmarks. No LLM in the loop: just embeddings, sparse retrieval, and a free cross-encoder reranker.

LoCoMo

1,982 questions · 10 conversations
93.9%
R@5 — top result on the benchmark
R@10: 95.0%
NDCG@5: 0.894
Single-hop: 90.4%
Temporal: 93.1%
Contextual: 97.1%
Adversarial: 94.6%

LongMemEval

500 questions
98.4%
R@5 — 492 of 500 questions retrieved
R@10: 99.4%
NDCG@5: 0.934
Multi-session: 99.2%
Single-session-user: 100.0%
Knowledge-update: 98.7%
Temporal-reasoning: 97.0%
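R@5 in both tables is standard recall-at-k: the fraction of questions whose gold evidence lands in the top five retrieved memories. A minimal sketch of the computation, simplified to one gold memory per question (names are illustrative, not Engram's API):

```python
def recall_at_k(retrieved, gold, k=5):
    """Fraction of questions whose gold evidence appears in the top-k results."""
    hits = sum(1 for res, g in zip(retrieved, gold) if g in res[:k])
    return hits / len(gold)

retrieved = [["m7", "m2", "m9"], ["m1", "m4", "m8"]]  # per-question result lists
gold = ["m2", "m5"]                                   # gold memory per question
print(recall_at_k(retrieved, gold))  # → 0.5
```

On LongMemEval, 492 hits out of 500 questions gives the 98.4% reported above.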

How Engram compares on LoCoMo

Higher is better · zero-LLM systems marked "No" under LLM required

System                 LoCoMo R@5   LLM required   Cost / query
Engram                 93.9%        No             $0
EverMemOS              92.3%        Yes (cloud)    $$
Hindsight              89.6%        Yes (cloud)    $$
Zep                    ~85%         Yes (cloud)    $$
Letta / MemGPT         ~83.2%       Yes (cloud)    $$
SLM V3 (zero-cloud)    74.8%        No             $0
Supermemory            ~70%         Yes            $$
Mem0 (independent)     ~58%         Yes            $$
Architecture

Three-stage hybrid retrieval.

Dense semantic search catches meaning. Sparse BM25 catches exact words. A cross-encoder reranker scores the finalists. Nothing is summarized.

1

Dense

bge-large bi-encoder (1024d) finds semantically similar past turns.

2

Sparse

BM25 catches exact names, dates, and rare terms embeddings miss.

3

RRF fusion

Reciprocal Rank Fusion combines both signals without per-query tuning.

4

Rerank

Cross-encoder scores top candidates jointly for the final ranking.
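Step 3 uses the standard RRF formula, score(d) = Σ 1/(k + rank_r(d)) across the two rankings. A minimal sketch (the function name and document IDs are illustrative, not Engram's API):

```python
def rrf_fuse(dense_ranking, sparse_ranking, k=60):
    """Reciprocal Rank Fusion: each ranking contributes 1 / (k + rank) per doc."""
    scores = {}
    for ranking in (dense_ranking, sparse_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d2"]    # ranking from the bi-encoder
sparse = ["d1", "d4", "d3"]   # ranking from BM25
print(rrf_fuse(dense, sparse))  # → ['d1', 'd3', 'd4', 'd2']
```

With k = 60 (the constant from the original RRF paper), a document ranked near the top of both lists outscores one ranked first in only one, which is why d1 wins here.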

Session chunking

Long sessions dilute embeddings. Chunking at ~6 turns with 1-turn overlap keeps individual facts retrievable.
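The windowing itself is simple; a sketch of chunking with a 6-turn window and 1-turn overlap (the helper name is illustrative, not Engram's API):

```python
def chunk_session(turns, size=6, overlap=1):
    """Split a session into overlapping windows so each chunk stays embeddable."""
    step = size - overlap
    return [turns[i:i + size] for i in range(0, max(len(turns) - overlap, 1), step)]

turns = [f"turn-{i}" for i in range(14)]
chunks = chunk_session(turns)
print([len(c) for c in chunks])  # → [6, 6, 4]
```

The 1-turn overlap means a fact stated at a chunk boundary still appears in full in at least one window.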

Timestamp prefix

Prepending [2024-01-15] to each document lets both dense and BM25 match temporal queries.

Speaker-name injection

First-person turns don't contain the speaker's name, so entity-attribute queries fail. Prepending the speaker's name bridges the gap and lifts LoCoMo R@5 by roughly 3 points.
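Both the timestamp prefix and speaker injection are plain string preprocessing at ingest time. A minimal sketch (the to_document helper is illustrative, not Engram's API):

```python
def to_document(text, speaker, timestamp):
    """Prepend timestamp and speaker so dense and BM25 retrieval can match both."""
    return f"[{timestamp}] {speaker}: {text}"

doc = to_document("I started pottery classes last week.", "Caroline", "2024-01-15")
print(doc)  # → [2024-01-15] Caroline: I started pottery classes last week.
```

A query like "when did Caroline start pottery?" now has lexical overlap with both the name and the date, and the dense encoder sees them in context too.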

Quickstart

Running in two minutes.

One pip install. Works locally with FAISS + SQLite, or plugs into Qdrant for cloud deployment.

# Install
$ pip install engram-search

# Initialize a memory store
$ engram init ./my_memories

# Ingest past conversations
$ engram ingest conversations.json --store ./my_memories

# Search
$ engram search "why did we switch to GraphQL" --store ./my_memories

# Python API
from engram.backends.faiss_backend import FaissBackend
from engram.ingestion.parser import session_to_documents
from engram.retrieval.embedder import Embedder
from engram.retrieval.pipeline import RetrievalPipeline

embedder = Embedder("bge-large")  # 1024-d bi-encoder
backend = FaissBackend(path="./my_memories", dimension=1024)
pipeline = RetrievalPipeline(embedder=embedder, backend=backend)

turns = [
    {"role": "user", "content": "I'm switching our API from REST to GraphQL."},
    {"role": "assistant", "content": "What's driving the switch?"},
    {"role": "user", "content": "Too many round trips — 12 calls per screen."},
]
docs = session_to_documents(turns, session_id="s1", timestamp="2025-01-15")

results = pipeline.search("why did we switch to GraphQL", documents=docs, top_k=3)
for r in results:
    print(r.text)

# Point Engram at a managed Qdrant cluster
$ export ENGRAM_BACKEND=qdrant
$ export ENGRAM_QDRANT_URL=https://your-cluster.qdrant.io:6333
$ export ENGRAM_QDRANT_API_KEY=your-api-key

# Start the API server
$ pip install fastapi uvicorn
$ uvicorn engram.server:app --host 0.0.0.0 --port 8000

# Endpoints available
# POST /ingest   — add conversations
# POST /search   — retrieve memories
# GET  /health   — health check
# GET  /stats    — store statistics
Why Engram

Built for agents that need real memory.

Zero LLM calls

Retrieval only. Deterministic, reproducible, no per-query spend, no prompt drift, no rate limits.

Exact words preserved

Nothing is summarized or paraphrased on the way in. What you said is what gets returned.

Local-first

FAISS + SQLite out of the box. Runs entirely on your machine. No API keys needed to get started.

Cloud-ready

Plug into Qdrant for multi-tenant, horizontally scalable memory. Same API, same accuracy.

Ready to give your agent a memory?

MIT licensed. Reproducible benchmarks. Drop it into your RAG pipeline today.

$ pip install engram-search
Star on GitHub