LYCHEEMEM

Compact, efficient, and extensible long-term memory for LLM agents. LycheeMem starts from structured conversational memory, adds lightweight consolidation and adaptive retrieval, and now ships with Compact Semantic Memory, OpenClaw integration, and an HTTP MCP endpoint.

7 Types · Semantic memory categories
5 Channels · Action-aware semantic retrieval
MCP + Plugin · OpenClaw and remote tool access

See Long-Term Memory In Action

The project ships with a working demo script and a React dashboard for inspecting memory, retrieval, and session state in real time.

Three Complementary Memory Stores

LycheeMem keeps the architecture simple: active session context, a compact semantic store for reusable long-term knowledge, and a procedural skill layer for how-to reuse.

🧠
Working Memory · Episodic

Session Context

Holds the active conversation under a dual-threshold token budget. At 70% it pre-compresses asynchronously; at 90% it blocks, summarizes older turns, and preserves recent turns verbatim.

Dual-threshold budget · Summary anchors · Verbatim recent turns
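The dual-threshold policy can be sketched in a few lines. This is an illustrative model, not LycheeMem's actual implementation: the class name, the crude 4-characters-per-token estimate, and the placeholder summarizer are all assumptions; only the 70%/90% thresholds and the "summarize older, keep recent verbatim" behavior come from the description above.

```python
# Sketch of the dual-threshold working-memory budget (illustrative names).
PRE_COMPRESS_AT = 0.70   # start async pre-compression
HARD_LIMIT_AT = 0.90     # block, summarize older turns

def approx_tokens(text: str) -> int:
    # Rough estimate; a real system would use the model's tokenizer.
    return max(1, len(text) // 4)

class WorkingMemory:
    def __init__(self, budget_tokens: int = 1000, keep_recent: int = 2):
        self.budget = budget_tokens
        self.keep_recent = keep_recent
        self.turns: list[str] = []      # recent raw turns
        self.anchors: list[str] = []    # summary anchors for compressed history

    def used(self) -> int:
        return sum(approx_tokens(t) for t in self.turns)

    def append(self, turn: str) -> str:
        self.turns.append(turn)
        ratio = self.used() / self.budget
        if ratio >= HARD_LIMIT_AT:
            self._summarize_older()
            return "summarized"
        if ratio >= PRE_COMPRESS_AT:
            return "pre-compress"       # would schedule async compression here
        return "ok"

    def _summarize_older(self) -> None:
        # Keep the most recent turns verbatim; collapse the rest into an anchor.
        older = self.turns[:-self.keep_recent]
        if older:
            self.anchors.append(f"[summary of {len(older)} turns]")
            self.turns = self.turns[-self.keep_recent:]
```

Compressed history is then the anchor list followed by the verbatim recent turns.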
🕸️
Semantic Memory · Compact Store

Compact Semantic Memory

Long-term knowledge is stored as typed, action-annotated MemoryRecords. The latest design uses SQLite FTS5 + LanceDB, so no Neo4j is required, while retrieval stays fast and deployment stays lightweight.

SQLite FTS5 · LanceDB vectors · Record Fusion · Usage statistics
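The lexical half of this store needs nothing beyond the standard library, since FTS5 ships with SQLite. A minimal sketch, with an illustrative table and schema (the real column layout is not documented here), and the LanceDB vector side omitted:

```python
# Lexical recall over compact memory entries via SQLite FTS5 (stdlib only).
# Table and column names are illustrative, not LycheeMem's actual schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE mem USING fts5(record_type, content)")
conn.executemany(
    "INSERT INTO mem VALUES (?, ?)",
    [
        ("preference", "user prefers dark mode in the dashboard"),
        ("constraint", "never call external APIs without confirmation"),
        ("fact", "the project backend is FastAPI on port 8000"),
    ],
)

# FTS5 MATCH gives exact lexical recall; ORDER BY rank uses bm25 scoring.
rows = conn.execute(
    "SELECT record_type, content FROM mem WHERE mem MATCH ? ORDER BY rank",
    ("dashboard",),
).fetchall()
```

In the full design these lexical hits are merged with LanceDB's vector hits before ranking.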
⚙️
Procedural Memory · Skills

How-to Knowledge

Skills are stored separately as reusable procedural entries with intent, Markdown docs, and dense embeddings. Retrieval uses HyDE so the system can match a task against an idealized answer before it searches the skill library.

HyDE retrieval · File-based store · Intent embeddings
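The HyDE trick is easiest to see in miniature: embed a hypothetical ideal answer rather than the raw query, then match it against the skill library. The bag-of-words "embedding" and stubbed LLM below are stand-ins so the sketch runs without a model; the skill names are invented.

```python
# Toy HyDE-style skill retrieval: match an idealized answer, not the query.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Bag-of-words stand-in for a dense embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def fake_llm(query: str) -> str:
    # Real HyDE asks the LLM to draft an idealized answer document.
    return f"to {query}, open a pull request, request review, then merge"

skills = {
    "git-merge": "merge a reviewed pull request into the main branch",
    "deploy": "build the docker image and push it to the registry",
}

query = "merge my teammate's change"
hypothetical = fake_llm(query)        # idealized answer, not the raw query
hyde_vec = embed(hypothetical)
best = max(skills, key=lambda s: cosine(hyde_vec, embed(skills[s])))
```

The hypothetical answer shares far more vocabulary with the right skill doc than the terse query does, which is exactly why HyDE helps.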

Compact Semantic Memory at a Glance

Semantic memory now revolves around MemoryRecord and CompositeRecord: compact typed entries first, denser fused entries later. That keeps ingestion cheap while still giving retrieval something more structured than raw conversation logs.

7 record types
fact / preference / event / constraint / procedure / failure_pattern / tool_affordance
Storage layer
SQLite FTS5 + LanceDB with retrieval_count and action_success_count stats
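A MemoryRecord might be modeled roughly as below. Only the seven record types and the retrieval_count / action_success_count stats come from the description above; every other field name is an assumption for illustration.

```python
# Illustrative shape of a typed, action-annotated memory record.
from dataclasses import dataclass, field

RECORD_TYPES = {
    "fact", "preference", "event", "constraint",
    "procedure", "failure_pattern", "tool_affordance",
}

@dataclass
class MemoryRecord:
    record_type: str
    content: str
    tags: list[str] = field(default_factory=list)
    retrieval_count: int = 0          # bumped on every recall
    action_success_count: int = 0     # bumped when an action using it succeeds

    def __post_init__(self):
        if self.record_type not in RECORD_TYPES:
            raise ValueError(f"unknown record type: {self.record_type}")

rec = MemoryRecord("tool_affordance", "rg supports --json output", tags=["cli"])
rec.retrieval_count += 1
```

The usage counters are what lets the scorer prefer records that have actually paid off in past actions.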

Four Synchronous Stages + One Background Task

The request path is still explicit and inspectable: manage working memory, plan retrieval, synthesize background context, answer, then consolidate in the background.

1

WMManager

Appends the user turn, enforces the token budget, and produces compressed history made of summary anchors plus recent raw turns.

2

SearchCoordinator

Builds an action-aware search plan, then queries semantic memory through FTS, vectors, tags, and temporal filters while also searching skills with HyDE.

3

SynthesizerAgent

Acts as an LLM-as-judge: scores every fragment on a 0–1 scale, drops low-signal results, and assembles a dense background_context.

4

ReasoningAgent

Uses compressed history, fused memory context, and the skill reuse plan to generate the final answer and write it back to the session.

BACKGROUND

ConsolidatorAgent — asyncio.create_task

Runs after the reply returns. It performs novelty checking, ingests typed semantic records into SQLite + LanceDB, triggers Record Fusion, and extracts new skill entries in parallel.
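The control flow of the four stages plus the background task can be sketched as a skeleton. Stage bodies here are stubs and the function names are invented; only the ordering and the asyncio.create_task hand-off mirror the pipeline described above.

```python
# Skeleton of the request path: four awaited stages, then background consolidation.
import asyncio

log: list[str] = []

async def wm_manage(turn: str) -> str:
    log.append("wm")                  # budget enforcement, compression
    return f"history:{turn}"

async def search(history: str) -> list[str]:
    log.append("search")              # action-aware plan, 5-channel recall
    return ["fragment"]

async def synthesize(fragments: list[str]) -> str:
    log.append("synthesize")          # LLM-as-judge scoring, fusion
    return "background_context"

async def reason(history: str, context: str) -> str:
    log.append("reason")              # final answer, session write-back
    return "answer"

async def consolidate(turn: str, answer: str) -> None:
    log.append("consolidate")         # novelty check, record ingest, skills

async def handle_turn(turn: str) -> str:
    history = await wm_manage(turn)
    fragments = await search(history)
    context = await synthesize(fragments)
    answer = await reason(history, context)
    # Fire-and-forget in production: the reply returns before this finishes.
    task = asyncio.create_task(consolidate(turn, answer))
    await task                        # awaited here only to keep the demo deterministic
    return answer

answer = asyncio.run(handle_turn("hello"))
```

The key property is that consolidation never sits on the user-facing latency path.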

Action-Aware Planning with Five Parallel Recall Channels

The latest semantic retrieval no longer depends on graph traversal. It starts from a SearchPlan with semantic queries, pragmatic queries, tool hints, constraints, and time filters.

SIGNAL 01

FTS + Vector Recall

SQLite FTS5 handles exact lexical recall, while LanceDB runs approximate nearest-neighbor (ANN) search over normalized embeddings of compact memory entries.

SIGNAL 02

Tag and Constraint Filters

Tool hints, constraint tags, failure tags, and affordance tags let the planner surface action-relevant memory rather than relying on generic semantic similarity alone.

SIGNAL 03

Temporal + Skill Recall

Temporal filters narrow semantic memory by time window, while HyDE retrieves complementary skill documents for executable how-to context.

SearchPlan

Planner output includes mode, semantic and pragmatic queries, likely tools, required constraints, and missing slots.

Six-dimensional scoring

Candidates are ranked with a richer scorer before synthesis, giving more weight to action fit and reusable composite records.

Composite records first

Related fragments can be fused online into denser records, and those fused entries are ranked above their raw source fragments.
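The ranking shape behind "six-dimensional scoring" and "composite records first" is a weighted sum with a dedicated composite dimension. The dimension names and weights below are assumptions (the section above does not enumerate them); only the overall shape and the composite boost reflect the design.

```python
# Illustrative six-dimensional scorer; names and weights are assumed.
WEIGHTS = {
    "lexical": 0.15, "semantic": 0.20, "action_fit": 0.25,
    "recency": 0.10, "usage": 0.10, "composite": 0.20,
}

def score(candidate: dict) -> float:
    # Each dimension is a 0-1 signal; missing signals count as zero.
    return sum(WEIGHTS[d] * candidate.get(d, 0.0) for d in WEIGHTS)

# A fused composite record can outrank a raw fragment that matches better lexically.
raw = {"lexical": 0.9, "semantic": 0.8, "action_fit": 0.5, "composite": 0.0}
fused = {"lexical": 0.6, "semantic": 0.7, "action_fit": 0.8, "composite": 1.0}
ranked = sorted([("raw", raw), ("fused", fused)],
                key=lambda kv: score(kv[1]), reverse=True)
```

With action fit and the composite flag weighted heavily, fused entries win even when raw fragments have better surface similarity.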

Core Memory APIs

The FastAPI backend exposes a small set of composable endpoints. Use them directly, or go through the MCP server and OpenClaw plugin layers.

POST /memory/search

Unified Memory Retrieval

Query semantic memory and the skill store together. Returns semantic memory results (exposed as graph_results), skill hits, and ranked provenance.

  • Supports top_k plus toggles for semantic memory and skill search
  • Returns graph_results and skill_results
  • Designed to feed directly into /memory/synthesize
POST /memory/synthesize

Memory Fusion

Takes raw retrieval output and produces fused memory context with provenance and an explicit skill reuse plan.

  • 0–1 relevance scoring per fragment
  • Returns background_context, skill_reuse_plan
  • Includes kept vs dropped fragment counts
POST /memory/reason

Grounded Reasoning

Runs the final reasoning stage with pre-synthesized context and can optionally append the reply into session history.

  • Accepts background_context and skill_reuse_plan
  • Optional session write-back
  • Returns token usage stats
POST /memory/consolidate/{session_id}

Trigger Consolidation

Manually persist long-term memory for a session. Normally this happens automatically after each chat turn or at session boundaries.

  • Novelty check before writing
  • Writes compact records to SQLite + LanceDB
  • Extracts and stores new skill entries
POST /chat Session write path

End-to-End Chat

Use /chat when you want the full pipeline with stable session_id handling. The README also recommends it as the write path before MCP-side retrieval and consolidation.
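The three retrieval endpoints chain into each other, which is easiest to see as request payloads. Field names not listed in the endpoint cards above (query, include_skills, session_id, write_back) are assumptions about the request schema, not confirmed API contracts.

```python
# Sketch of the search -> synthesize -> reason chain as request payloads.
# Unlisted field names are assumed; check /docs for the actual schema.
import json

session_id = "demo-session"

search_req = {
    "query": "what theme does the user prefer?",
    "top_k": 5,                 # documented toggle
    "include_skills": True,     # assumed name for the skill toggle
}

# /memory/search output is designed to feed /memory/synthesize directly:
synthesize_req = {
    "graph_results": [],        # as returned by /memory/search
    "skill_results": [],
}

# /memory/synthesize output then grounds /memory/reason:
reason_req = {
    "session_id": session_id,
    "background_context": "user prefers dark mode",
    "skill_reuse_plan": None,
    "write_back": True,         # assumed flag for optional session write-back
}

payloads = [json.dumps(p) for p in (search_req, synthesize_req, reason_req)]
```

The live request and response models are browsable at /docs once the server is running.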

Up and Running in Minutes

The latest setup is lighter than before: Python, an LLM key, and local semantic storage paths.

1

Prerequisites

Python 3.11+ and an LLM API key. Supported models use the litellm model-name format, including OpenAI, Gemini, Ollama chat, and OpenAI-compatible endpoints.

2

Install & Configure

Clone the repo, install dependencies, then copy .env.example to .env and fill in your LLM and embedding config plus the compact storage paths.

3

Start the Server

Run python main.py or python main.py --reload. The API is served at http://localhost:8000, with docs at /docs and MCP at /mcp.

Setup
# Clone & install
git clone https://github.com/LycheeMem/LycheeMem.git
cd LycheeMem
pip install -e ".[dev]"

# Configure .env
LLM_MODEL=openai/gpt-4o-mini
LLM_API_KEY=sk-...
LLM_API_BASE=
EMBEDDING_MODEL=openai/text-embedding-3-small
EMBEDDING_DIM=1536
COMPACT_MEMORY_DB_PATH=data/compact_memory.db
COMPACT_VECTOR_DB_PATH=data/compact_vector

# Start server
python main.py
python main.py --reload

# Run the demo script
python examples/api_pipeline_demo.py

Built-in React Dashboard

The bundled frontend shows chat, semantic memory, skills, and working memory side by side.

💬

Chat

Chat with the backend and inspect retrieval traces, background context, and the effect of memory on final answers.

🗂️

Semantic Memory

Browse compact memory entries instead of a graph database. This reflects the latest SQLite + LanceDB design in the README.

⚙️

Skills

Inspect the skill library, including reusable procedural docs that can be pulled into reasoning through HyDE retrieval.

📊

Working Memory

See token usage, recent raw turns, and summary anchors for the active session while the dual-threshold policy is running.

Start the web demo
cd web-demo
npm install
npm run dev   # → http://localhost:5173

Native OpenClaw Integration

The latest release adds an OpenClaw plugin so LycheeMem can serve as long-term memory for OpenClaw sessions with minimal manual wiring.

1

Smart search entry point

lychee_memory_smart_search is the default long-term recall tool for the model during normal operation.

2

Automatic turn mirroring

User and assistant turns are mirrored automatically via hooks, so the model does not need to call append-turn APIs manually.

3

Boundary-aware consolidation

/new, /reset, /stop, and session_end can trigger consolidation, with proactive persistence on strong long-term signals.

Quick Install
openclaw plugins install "/path/to/LycheeMem/openclaw-plugin"
openclaw gateway restart

Remote Memory Access over HTTP MCP

LycheeMem now exposes an MCP endpoint at http://localhost:8000/mcp for remote clients that support HTTP transport.

POST /mcp

JSON-RPC transport

Handles MCP JSON-RPC requests including initialize, tools/list, and tools/call.

  • Reuse Mcp-Session-Id after initialize
  • Works with remote HTTP MCP clients
  • Fits the recommended read-side integration flow
GET /mcp

SSE stream endpoint

Supports clients that expect the MCP event stream over server-sent events on the same route.

  • Same endpoint path as JSON-RPC
  • Used by some MCP client implementations
  • Pairs with session-id reuse
TOOLS lychee_memory_search · lychee_memory_synthesize · lychee_memory_consolidate

Recommended MCP pattern

Write turns through /chat or /memory/reason with a stable session_id, retrieve with lychee_memory_search, synthesize into background_context, then consolidate at conversation end.
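On the wire, that pattern is a short JSON-RPC exchange against POST /mcp. The envelope shape (jsonrpc / id / method / params) follows the MCP specification; the tool arguments below are assumptions based on the tool names listed above, not a confirmed schema.

```python
# Sketch of the MCP JSON-RPC messages sent to POST /mcp.
# Tool argument names are assumed; only the tool names come from the docs above.
import json

initialize = {
    "jsonrpc": "2.0", "id": 1, "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",
        "clientInfo": {"name": "demo-client", "version": "0.1"},
        "capabilities": {},
    },
}

# After initialize, reuse the Mcp-Session-Id response header on every call.
tool_call = {
    "jsonrpc": "2.0", "id": 2, "method": "tools/call",
    "params": {
        "name": "lychee_memory_search",
        "arguments": {"query": "user preferences", "top_k": 5},
    },
}

wire = [json.dumps(m) for m in (initialize, tool_call)]
```

A tools/list call between the two steps returns the three lychee_memory_* tools with their actual input schemas.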

Build Agents With Compact Long-Term Memory

LycheeMem is open source, lighter to deploy than the older graph-based design, and now ready for direct API, MCP, and OpenClaw workflows.