LYCHEEMEM

LycheeMem gives AI agents the memory they need — persistent, structured, and time-aware. It distinguishes episodic experience from semantic knowledge, and activates both at inference time through a multi-stage reasoning pipeline.

3-Tier
Cognitive Memory Architecture
5-Stage
Reasoning Pipeline
Retrieval Signal Fusion

LycheeMem vs ChatGPT

Three Complementary Memory Stores

Inspired by the human memory system — distinguishing what happened from what is known and how to do things.

🧠
Working Memory · Episodic

Session Context

Holds the active conversation window under a dual-threshold token budget: a warn threshold at 70% triggers background pre-compression, and a block threshold at 90% pauses intake and flushes older turns to summary anchors.

Dual-threshold budget Summary anchors Verbatim recent turns
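The dual-threshold logic can be sketched in a few lines. This is an illustrative sketch, not LycheeMem's actual API; the function and parameter names (`budget_action`, `warn_ratio`, `block_ratio`) are hypothetical.

```python
# Illustrative dual-threshold token budget check (names are hypothetical).
def budget_action(used_tokens: int, budget: int,
                  warn_ratio: float = 0.70, block_ratio: float = 0.90) -> str:
    """Return which action the working-memory manager should take."""
    ratio = used_tokens / budget
    if ratio >= block_ratio:
        return "block"   # pause intake, flush older turns to summary anchors
    if ratio >= warn_ratio:
        return "warn"    # kick off background pre-compression
    return "ok"
```

The point of two thresholds is that compression starts early enough (at warn) that the hard stop (at block) is rarely hit mid-conversation.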
🕸️
Semantic Memory · Knowledge Graph

Bi-temporal Graph

A Graphiti-style bi-temporal graph stored in Neo4j, modeling the world as Episodes, Entities, Facts, and Communities — each Fact carrying four timestamps to separate real-world validity from system transaction time.

Neo4j backend Bi-temporal facts Community detection Episode anchors
⚙️
Procedural Memory · Skills

How-to Knowledge

Stores reusable skill entries with intent, full Markdown docs, and dense embeddings. Retrieval uses HyDE — the query is first expanded into a hypothetical ideal answer before embedding, producing high-quality matches against stored procedures.

HyDE retrieval File-based store Intent embeddings
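The HyDE step can be sketched as follows. `generate_answer` and `embed` stand in for the LLM and embedding calls (routed through LiteLLM in the real pipeline); the skill-entry shape here is an assumption for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def hyde_search(query, skills, generate_answer, embed, top_k=3):
    """HyDE sketch: embed a hypothetical ideal answer, not the raw query."""
    hypothetical = generate_answer(query)   # LLM expands the query
    q_vec = embed(hypothetical)             # embed the expansion instead
    scored = [(cosine(q_vec, s["embedding"]), s["intent"]) for s in skills]
    return [intent for _, intent in sorted(scored, reverse=True)[:top_k]]
```

The expansion matters because a terse query ("ship my app") embeds far from a full procedure doc; a hypothetical answer written in the same register as the stored docs lands much closer in embedding space.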

🕐 Bi-temporal Model — Time-Aware Fact Tracking

Every Fact node in the knowledge graph carries four timestamps, enabling the system to correctly answer "what did the user believe last month?" even after facts have changed.

Valid Time (real world)
t_valid_from / t_valid_to
Transaction Time (system)
t_tx_created / t_tx_expired

Four Synchronous Stages + One Background Task

Every request flows through a fixed LangGraph pipeline. The consolidation task runs asynchronously after the response is returned.

1

WMManager

Token budget check. Appends the user turn, triggers compression if either threshold is crossed. Produces compressed history for downstream.

2

SearchCoordinator

Decomposes the query into multi-source sub-queries. Dispatches parallel searches to the knowledge graph (BM25 + BFS + ANN) and the skill store (HyDE).

3

SynthesizerAgent

LLM-as-Judge: scores each retrieved fragment 0–1, discards below threshold (default 0.6), fuses survivors into a single dense background context.
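The judge-then-fuse step reduces to a threshold filter plus a join. A minimal sketch, with `judge` standing in for the LLM scoring call:

```python
def synthesize(fragments, judge, threshold=0.6):
    """Score each fragment 0-1, drop low scorers, fuse survivors into one context."""
    survivors = [f for f in fragments if judge(f) >= threshold]
    return "\n".join(survivors)
```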

4

ReasoningAgent

Receives compressed history, background context, and skill reuse plan. Generates the final streaming response and appends it to session storage.

BACKGROUND

ConsolidatorAgent — asyncio.create_task

Runs in a thread pool after the response is returned. Performs novelty check, ingests entities & facts into Neo4j with bi-temporal timestamps, and extracts new skill entries from the conversation.

Four-Signal Fusion with RRF Re-ranking

Graph search combines three complementary signals, then fuses them via Reciprocal Rank Fusion for robust, diverse recall.

SIGNAL 01

BM25 Full-text Search

Keyword-level recall against Entity.name and Fact.fact_text via Neo4j full-text index. Fast, precise on exact terms.

SIGNAL 02

BFS Graph Traversal

Expands outward from the session's most recent episode nodes up to a configurable depth. Surfaces semantically linked facts even with no keyword match.

SIGNAL 03

Vector ANN Search

Approximate nearest-neighbour over Entity.embedding — configurable dimensionality and similarity function (cosine / dot product).

⚡ Reciprocal Rank Fusion (RRF)

Merges rankings from all three signal lists into a single ordered candidate set without requiring score calibration.
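RRF itself is a small formula: each document scores the sum of 1 / (k + rank) across the lists it appears in, with k = 60 being the conventional constant (the value LycheeMem uses is not stated above). A sketch:

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank_d)."""
    scores = {}
    for ranking in rankings:                      # one ranked list per signal
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because only ranks enter the formula, BM25 scores, BFS hop counts, and ANN distances never need to be normalized against each other — which is exactly why RRF is the fusion step of choice here.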

🎯 Cross-encoder Reranker

Optional LLM-driven reranker refines the top-N candidates. Driven by the same LiteLLM adapter — no extra SDK.

🔀 MMR Diversification

Maximal Marginal Relevance filters near-duplicate context fragments to ensure variety in the final prompt.
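MMR greedily picks the candidate that balances relevance to the query against similarity to what has already been selected. A sketch, with `relevance` and `similarity` as injected scoring functions and `lam` as the trade-off weight (names are illustrative):

```python
def mmr(candidates, relevance, similarity, lam=0.5, top_k=3):
    """Greedy Maximal Marginal Relevance selection over context fragments."""
    selected, pool = [], list(candidates)
    while pool and len(selected) < top_k:
        best = max(
            pool,
            key=lambda c: lam * relevance(c)
            - (1 - lam) * max((similarity(c, s) for s in selected), default=0.0),
        )
        selected.append(best)
        pool.remove(best)
    return selected
```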

Five Core Endpoints

A clean FastAPI server with interactive docs at /docs. Supports both streaming (SSE) and non-streaming responses.

POST /memory/search

Unified Memory Retrieval

Query both the knowledge graph and the skill store in a single call. Returns ranked fact results with relevance scores.

  • BM25 + BFS + ANN fusion via RRF
  • Configurable top_k, graph & skill toggles
  • Returns fact_id, summary, relevance
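A minimal client call might look like the sketch below, using only the standard library. Of the payload keys, only `top_k` is documented above; `query`, `use_graph`, and `use_skills` are illustrative guesses, so check `/docs` for the actual schema.

```python
import json
import urllib.request

def build_search_payload(query, top_k=5, use_graph=True, use_skills=True):
    """Assemble a /memory/search request body (key names beyond top_k are assumed)."""
    return {"query": query, "top_k": top_k,
            "use_graph": use_graph, "use_skills": use_skills}

def memory_search(payload, base_url="http://localhost:8000"):
    """POST the payload and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{base_url}/memory/search",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```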
POST /memory/synthesize

Memory Fusion

Takes raw retrieval results and fuses them into a single dense context using LLM-as-Judge scoring.

  • 0–1 relevance scoring per fragment
  • Returns background_context, skill_reuse_plan
  • Full provenance with RRF + BFS/BM25 ranks
POST /memory/reason

Grounded Reasoning

Runs the ReasoningAgent with a pre-synthesized context. Chain after /synthesize for full pipeline control.

  • Accepts background_context & skill_reuse_plan
  • Optional session write-back
  • Returns token usage stats
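Chaining the three endpoints for full pipeline control can be sketched as below. `post` stands in for any HTTP client; apart from the documented `background_context` and `skill_reuse_plan` fields, the payload key names are assumptions.

```python
def search_then_reason(post, query):
    """Chain /memory/search -> /memory/synthesize -> /memory/reason by hand."""
    results = post("/memory/search", {"query": query, "top_k": 5})
    fused = post("/memory/synthesize", {"results": results})
    return post("/memory/reason", {
        "query": query,
        "background_context": fused["background_context"],
        "skill_reuse_plan": fused["skill_reuse_plan"],
    })
```

Driving the stages yourself like this is useful when you want to inspect or edit the fused context before reasoning, rather than letting /chat/complete run end to end.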
POST /memory/consolidate/{session_id}

Trigger Consolidation

Manually trigger memory consolidation for any session. Normally runs automatically in the background after each chat turn.

  • Novelty check before writing
  • Ingest entities & facts to Neo4j
  • Extract & store new skill entries
POST /chat/complete Full Pipeline · Streaming SSE

End-to-End Chat

Runs the full 4-stage pipeline (WMManager → SearchCoordinator → SynthesizerAgent → ReasoningAgent) with a single API call. Supports streaming via Server-Sent Events for real-time token output. ConsolidatorAgent fires automatically in the background after each response.
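On the client side, consuming the SSE stream mostly means filtering `data:` lines out of the response body. A minimal parser sketch (the `[DONE]` end-of-stream sentinel is a common convention, assumed rather than documented here):

```python
def parse_sse(lines):
    """Yield the data payload of each Server-Sent Event line."""
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload != "[DONE]":     # assumed end-of-stream sentinel
                yield payload
```

Feed it the decoded lines of a streaming HTTP response to print tokens as they arrive.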

Up and Running in Minutes

Three steps to your first memory-augmented agent.

1

Prerequisites

Python 3.11+, Neo4j 5.x with GDS plugin & vector index, and an LLM API key (OpenAI, Gemini, or any litellm-compatible provider).

2

Install & Configure

Clone the repo, install dependencies, then copy .env.example to .env and fill in your LLM model, API key, and Neo4j connection details.

3

Start the Server

Run python main.py. The FastAPI server starts at http://localhost:8000. Interactive docs at /docs. The React web demo is available under web-demo/.

Setup
# Clone & install
git clone https://github.com/LycheeMem/LycheeMem
cd LycheeMem
pip install -e ".[dev]"

# Configure .env
LLM_MODEL=openai/gpt-4o-mini
LLM_API_KEY=sk-...
EMBEDDING_MODEL=openai/text-embedding-3-small
EMBEDDING_DIM=1536
NEO4J_URI=neo4j://127.0.0.1:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your_password

# Start server
python main.py

# Run the demo script
python examples/api_pipeline_demo.py

Built-in React Dashboard

A full-featured React + Vite frontend with live inspection of every memory store and pipeline stage.

💬

Chat

Full conversation view with per-step trace expansion. Click any pipeline stage to inspect retrieved & scored fragments.

🕸️

Graph Memory

Interactive Neo4j knowledge graph visualization. Search by entity name or relation type, and filter by valid-time range.

⚙️

Skills

Browse and search the procedural skill library. See intent, Markdown docs, usage counts, and last-used timestamps.

📊

Working Memory

Live token usage gauge, compressed summary anchors, and verbatim recent turns for the active session.

Start the web demo
cd web-demo
npm install
npm run dev   # → http://localhost:5173

Give Your Agents Real Memory

LycheeMem is open-source, training-free, and ready for production. Star the repo and start building.