LycheeMem gives AI agents the memory they need: persistent, structured, and time-aware. It distinguishes episodic experience from semantic knowledge, and activates both at inference time through a multi-stage reasoning pipeline.
Inspired by the human memory system, it separates what happened (episodic) from what is known (semantic) and how to do things (procedural).
Holds the active conversation window under a dual-threshold token budget: a warning at 70% triggers background pre-compression; a block at 90% pauses the session and flushes older turns to summary anchors.
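The dual-threshold budget check can be sketched in a few lines. This is a minimal illustration, not the actual implementation; the window size, function, and action names are assumptions.

```python
# Illustrative dual-threshold budget check (names and window size are assumptions).
MAX_TOKENS = 8192
WARN_RATIO, BLOCK_RATIO = 0.70, 0.90

def check_budget(used_tokens: int) -> str:
    """Return the action the session manager should take for the current usage."""
    ratio = used_tokens / MAX_TOKENS
    if ratio >= BLOCK_RATIO:
        return "block"  # pause and flush older turns to summary anchors
    if ratio >= WARN_RATIO:
        return "warn"   # kick off background pre-compression
    return "ok"
```

Keeping the warn threshold well below the block threshold gives the background compressor time to finish before the hard limit forces a pause.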
A Graphiti-style bi-temporal graph stored in Neo4j, modeling the world as Episodes, Entities, Facts, and Communities — each Fact carrying four timestamps to separate real-world validity from system transaction time.
Stores reusable skill entries with intent, full Markdown docs, and dense embeddings. Retrieval uses HyDE — the query is first expanded into a hypothetical ideal answer before embedding, producing high-quality matches against stored procedures.
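The HyDE step can be sketched as follows — embed a hypothetical ideal answer rather than the raw query. `generate`, `embed`, and `search` stand in for the LLM call, the embedding model, and the dense index; all three names are assumptions, not the real API.

```python
# HyDE sketch: embed a hypothetical ideal answer instead of the raw query.
# `generate`, `embed`, and `search` are stand-ins (assumptions, not the real API).
def hyde_retrieve(query, generate, embed, search, top_k=5):
    hypothetical = generate(f"Write a short ideal answer to: {query}")
    vector = embed(hypothetical)   # embed the hypothetical answer, not the query
    return search(vector, top_k)   # dense ANN lookup over stored skill embeddings
```

The intuition: a hypothetical answer is phrased like the stored procedure docs, so its embedding lands closer to the right skill entry than the terse query would.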
Every Fact node in the knowledge graph carries four timestamps, enabling the system to correctly answer "what did the user believe last month?" even after facts have changed.
Every request flows through a fixed LangGraph pipeline. The consolidation task runs asynchronously after the response is returned.
Token budget check. Appends the user turn, triggers compression if either threshold is crossed. Produces compressed history for downstream.
Decomposes the query into multi-source sub-queries. Dispatches parallel searches to the knowledge graph (BM25 + BFS + ANN) and the skill store (HyDE).
LLM-as-Judge: scores each retrieved fragment 0–1, discards below threshold (default 0.6), fuses survivors into a single dense background context.
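The judge-and-fuse stage reduces to a score, filter, and join. A minimal sketch — `judge` is a stub for the LLM scoring call, and the joining strategy is an assumption:

```python
# LLM-as-Judge filter sketch; `judge` scores one fragment in [0, 1] (stubbed here —
# the real system would route this through the LLM adapter).
def filter_fragments(fragments, judge, threshold=0.6):
    scored = [(frag, judge(frag)) for frag in fragments]
    survivors = [frag for frag, score in scored if score >= threshold]
    return "\n".join(survivors)  # fuse survivors into one dense background context
```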
Receives compressed history, background context, and skill reuse plan. Generates the final streaming response and appends it to session storage.
Runs in a thread pool after the response is returned. Performs novelty check, ingests entities & facts into Neo4j with bi-temporal timestamps, and extracts new skill entries from the conversation.
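The novelty check can be approximated with embedding similarity against facts already in the graph. A hedged sketch — the threshold and the use of plain cosine similarity are assumptions:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def is_novel(candidate_vec, existing_vecs, threshold=0.9):
    """Consolidate only if no stored fact embedding is near-identical (assumption)."""
    return all(cosine(candidate_vec, v) < threshold for v in existing_vecs)
```

Skipping near-duplicate content keeps the graph from bloating with restatements of facts it already holds.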
Graph search combines three complementary signals, then fuses them via Reciprocal Rank Fusion for robust, diverse recall.
Keyword-level recall against Entity.name and Fact.fact_text via Neo4j full-text index. Fast, precise on exact terms.
Expands outward from the session's most recent episode nodes up to a configurable depth. Surfaces semantically linked facts even with no keyword match.
Approximate nearest-neighbour over Entity.embedding — configurable dimensionality and similarity function (cosine / dot product).
Merges rankings from all three signal lists into a single ordered candidate set without requiring score calibration.
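Reciprocal Rank Fusion itself is tiny: each candidate scores the sum of 1/(k + rank) across the lists that mention it, with k = 60 as the common default. A sketch:

```python
# Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)).
# Only ranks matter, so BM25, BFS, and ANN scores never need calibration.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A candidate surfaced by all three signals outranks one that tops a single list, which is what makes the fused recall both robust and diverse.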
Optional LLM-driven reranker refines the top-N candidates. Driven by the same LiteLLM adapter — no extra SDK.
Maximal Marginal Relevance filters near-duplicate context fragments to ensure variety in the final prompt.
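MMR greedily picks the candidate that balances relevance against similarity to what is already selected. A sketch, assuming a pairwise similarity function and a λ = 0.7 relevance weight (both assumptions):

```python
# MMR sketch: score = lam * relevance - (1 - lam) * max similarity to selected.
# `sim` is any pairwise similarity in [0, 1]; embedding cosine is the usual choice.
def mmr(candidates, relevance, sim, lam=0.7, n=3):
    selected, pool = [], list(candidates)
    while pool and len(selected) < n:
        best = max(
            pool,
            key=lambda c: lam * relevance[c]
            - (1 - lam) * max((sim(c, s) for s in selected), default=0.0),
        )
        selected.append(best)
        pool.remove(best)
    return selected
```

A fragment nearly identical to one already chosen is penalized out, so the final prompt covers distinct aspects of the retrieved context.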
A clean FastAPI server with interactive docs at /docs.
Supports both streaming (SSE) and non-streaming responses.
Search: query both the knowledge graph and the skill store in a single call. Parameters: top_k plus graph & skill toggles. Returns ranked fact results (fact_id, summary, relevance).
Synthesize: takes raw retrieval results and fuses them into a single dense context using LLM-as-Judge scoring. Returns background_context and skill_reuse_plan, with provenance carrying RRF + BFS/BM25 ranks.
Respond: runs the ReasoningAgent with a pre-synthesized context (background_context & skill_reuse_plan). Chain after /synthesize for full pipeline control.
Consolidate: manually trigger memory consolidation for any session. Normally runs automatically in the background after each chat turn.
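The chaining pattern can be illustrated with hypothetical request/response shapes. Everything here beyond the documented names (/synthesize, background_context, skill_reuse_plan, fact_id, summary, relevance) — the session_id field, the results key, and all values — is a guess for illustration only:

```python
# Hypothetical payload shapes for chaining synthesis into generation.
# Field names other than the documented ones are assumptions.
synthesize_request = {
    "session_id": "demo",  # hypothetical field
    "results": [{"fact_id": "f1", "summary": "User prefers dark mode", "relevance": 0.82}],
}
# Suppose POST /synthesize returned:
synthesize_response = {
    "background_context": "The user prefers dark mode.",
    "skill_reuse_plan": "No stored skill applies.",
}
# Feed both fields into the reasoning endpoint to finish the pipeline:
respond_request = {
    "session_id": "demo",
    "background_context": synthesize_response["background_context"],
    "skill_reuse_plan": synthesize_response["skill_reuse_plan"],
}
```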
Three steps to your first memory-augmented agent.
Python 3.11+, Neo4j 5.x with GDS plugin & vector index, and an LLM API key (OpenAI, Gemini, or any litellm-compatible provider).
Clone the repo, install dependencies, then copy .env.example to .env and fill in your LLM model, API key, and Neo4j connection details.
Run python main.py. The FastAPI server starts at http://localhost:8000. Interactive docs at /docs. The React web demo is available under web-demo/.
# Clone & install
git clone https://github.com/LycheeMem/LycheeMem
cd LycheeMem
pip install -e ".[dev]"

# Configure .env
LLM_MODEL=openai/gpt-4o-mini
LLM_API_KEY=sk-...
EMBEDDING_MODEL=openai/text-embedding-3-small
EMBEDDING_DIM=1536
NEO4J_URI=neo4j://127.0.0.1:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your_password

# Start server
python main.py

# Run the demo script
python examples/api_pipeline_demo.py
A full-featured React + Vite frontend with live inspection of every memory store and pipeline stage.
Full conversation view with per-step trace expansion. Click any pipeline stage to inspect retrieved & scored fragments.
Interactive Neo4j knowledge graph visualization. Search by entity name or relation type, and filter by valid-time range.
Browse and search the procedural skill library. See intent, Markdown docs, usage counts, and last-used timestamps.
Live token usage gauge, compressed summary anchors, and verbatim recent turns for the active session.
cd web-demo
npm install
npm run dev   # → http://localhost:5173
LycheeMem is open-source, training-free, and ready for production. Star the repo and start building.