Compact, efficient, and extensible long-term memory for LLM agents. LycheeMem starts from structured conversational memory, adds lightweight consolidation and adaptive retrieval, and now ships with Compact Semantic Memory, OpenClaw integration, and an HTTP MCP endpoint.
The project ships with a working demo experience and a React dashboard for inspecting memory, retrieval, and session state in real time.
LycheeMem keeps the architecture simple: active session context, a compact semantic store for reusable long-term knowledge, and a procedural skill layer for how-to reuse.
Holds the active conversation under a dual-threshold token budget. At 70% it pre-compresses asynchronously; at 90% it blocks, summarizes older turns, and preserves recent turns verbatim.
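The dual-threshold policy above can be sketched in a few lines. This is an illustrative model only: the class name `WorkingMemory`, the 8000-token budget, and the policy labels are assumptions, not the project's actual API.

```python
# Hypothetical sketch of the dual-threshold token budget described above.
# Names and the 8000-token budget are illustrative, not LycheeMem's API.
from dataclasses import dataclass

@dataclass
class WorkingMemory:
    budget: int = 8000   # total token budget for the session
    soft: float = 0.70   # past this ratio: pre-compress asynchronously
    hard: float = 0.90   # past this ratio: block and summarize
    tokens: int = 0

    def policy(self) -> str:
        ratio = self.tokens / self.budget
        if ratio >= self.hard:
            # summarize older turns, keep recent turns verbatim
            return "block_and_summarize"
        if ratio >= self.soft:
            # compression runs off the request path
            return "async_precompress"
        return "append"

wm = WorkingMemory(tokens=6000)
print(wm.policy())  # 6000/8000 = 0.75 → "async_precompress"
```

The soft threshold keeps latency low in the common case; only when the hard threshold is crossed does compression block the request.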
Long-term knowledge is stored as typed, action-annotated MemoryRecords. The latest design uses SQLite FTS5 + LanceDB, so no Neo4j is required, while retrieval stays fast and deployment stays lightweight.
Skills are stored separately as reusable procedural entries with intent, Markdown docs, and dense embeddings. Retrieval uses HyDE so the system can match a task against an idealized answer before it searches the skill library.
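The HyDE step can be sketched as: draft an idealized answer for the task, embed it, then rank skills by similarity to that embedding. Everything here is a stand-in — the toy character-frequency `embed` and the `hyde_search` helper are illustrations, not the project's retrieval code.

```python
# Minimal HyDE sketch: embed a hypothetical ideal answer, then match it
# against the skill library by dense similarity. embed() is a toy stand-in.
import math

def embed(text: str) -> list[float]:
    # toy character-frequency embedding, for illustration only
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - 97] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def hyde_search(task: str, skills: dict[str, str], llm=None) -> str:
    # Step 1: draft an idealized answer (a real system would call an LLM here)
    hypothetical = llm(task) if llm else task
    q = embed(hypothetical)
    # Step 2: return the best-matching skill by dense similarity
    return max(skills, key=lambda name: cosine(q, embed(skills[name])))

skills = {
    "rotate_pdf": "rotate pages of a pdf file",
    "resize_image": "resize an image to a new width",
}
print(hyde_search("how do I rotate a pdf page", skills))  # → rotate_pdf
```

The point of HyDE is that an idealized answer usually sits closer to the skill documentation in embedding space than the raw question does.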
Semantic memory now revolves around MemoryRecord and CompositeRecord:
compact typed entries first, denser fused entries later. That keeps ingestion cheap while still
giving retrieval something more structured than raw conversation logs.
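One plausible shape for the two record types is sketched below. Only the names `MemoryRecord` and `CompositeRecord` come from the text above; every field is an assumption about the schema, not the shipped one.

```python
# Illustrative shapes for the two record types. Field names are assumptions,
# not LycheeMem's actual schema.
from dataclasses import dataclass, field

@dataclass
class MemoryRecord:
    text: str                  # compact typed statement
    record_type: str           # e.g. "preference", "fact", "failure"
    action_tags: list[str] = field(default_factory=list)  # tool/constraint hints
    created_at: float = 0.0

@dataclass
class CompositeRecord:
    summary: str               # denser fused statement
    sources: list[MemoryRecord] = field(default_factory=list)

# Ingestion stays cheap: raw typed records first, fusion later.
r1 = MemoryRecord("user prefers metric units", "preference")
r2 = MemoryRecord("user lives in Berlin", "fact")
fused = CompositeRecord("Berlin-based user who prefers metric units", [r1, r2])
print(len(fused.sources))  # → 2
```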
The request path is still explicit and inspectable: manage working memory, plan retrieval, synthesize background context, answer, then consolidate in the background.
Appends the user turn, enforces the token budget, and produces compressed history made of summary anchors plus recent raw turns.
Builds an action-aware search plan, then queries semantic memory through FTS, vectors, tags, and temporal filters while also searching skills with HyDE.
Acts as LLM-as-Judge, scores every fragment on a 0-1 scale, drops low-signal results, and assembles a dense background_context.
Uses compressed history, fused memory context, and the skill reuse plan to generate the final answer and write it back to the session.
Runs after the reply returns. It performs novelty checking, ingests typed semantic records into SQLite + LanceDB, triggers Record Fusion, and extracts new skill entries in parallel.
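The five stages above can be sketched as a sequential request path with consolidation pushed off to the background. All stage functions here are placeholders standing in for the real components, and the threading approach is only one way to model "runs after the reply returns".

```python
# Sketch of the request path: four synchronous stages, then background
# consolidation. Stage implementations are placeholders, not the real code.
import threading

def run_turn(session, user_turn, stages):
    history = stages["working_memory"](session, user_turn)  # budget + compression
    candidates = stages["retrieve"](user_turn)              # FTS + vectors + HyDE
    context = stages["synthesize"](candidates)              # LLM-as-Judge filtering
    reply = stages["answer"](history, context)              # final generation
    # Consolidation runs after the reply returns, off the request path
    t = threading.Thread(target=stages["consolidate"],
                         args=(session, user_turn, reply))
    t.start()
    return reply, t

log = []
stages = {
    "working_memory": lambda s, u: f"history+{u}",
    "retrieve": lambda u: ["frag1", "frag2"],
    "synthesize": lambda c: " ".join(c),
    "answer": lambda h, c: f"answer({h}|{c})",
    "consolidate": lambda s, u, r: log.append("consolidated"),
}
reply, t = run_turn({}, "hi", stages)
t.join()
print(reply, log)  # → answer(history+hi|frag1 frag2) ['consolidated']
```

Keeping each stage explicit is what makes the path inspectable: every intermediate value can be logged or surfaced in the dashboard.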
The latest semantic retrieval no longer depends on graph traversal. It starts from a SearchPlan
with semantic queries, pragmatic queries, tool hints, constraints, and time filters.
SQLite FTS5 handles exact lexical recall while LanceDB runs semantic and normalized ANN search over compact memory entries.
Tool hints, constraint tags, failure tags, and affordance tags let the planner surface action-relevant memory rather than relying on generic semantic similarity alone.
Temporal filters narrow semantic memory by time window, while HyDE retrieves complementary skill documents for executable how-to context.
Planner output includes mode, semantic and pragmatic queries, likely tools, required constraints, and missing slots.
Candidates are ranked with a richer scorer before synthesis, giving more weight to action fit and reusable composite records.
Related fragments can be fused online into denser records, and those fused entries are ranked above their raw source fragments.
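The ranking behavior described above can be sketched as rank fusion over the lexical and vector channels, plus additive boosts for action fit and fused composite records. The weights, the reciprocal-rank-fusion base, and the candidate fields are all illustrative, not the shipped scorer.

```python
# Hedged sketch of the ranking described above. Weights and field names
# are assumptions, not LycheeMem's actual scorer.
def score(candidate, fts_rank, vec_rank, plan_tools,
          w_action=0.3, w_composite=0.2):
    # reciprocal-rank fusion over the two retrieval channels
    base = 1.0 / (60 + fts_rank) + 1.0 / (60 + vec_rank)
    # action fit: overlap between the candidate's tool hints and the plan
    action_fit = (len(set(candidate["tool_hints"]) & set(plan_tools))
                  / max(len(plan_tools), 1))
    # fused CompositeRecords outrank their raw source fragments
    composite = 1.0 if candidate["is_composite"] else 0.0
    return base + w_action * action_fit + w_composite * composite

raw = {"tool_hints": ["browser"], "is_composite": False}
fused = {"tool_hints": ["browser"], "is_composite": True}
s_raw = score(raw, fts_rank=1, vec_rank=3, plan_tools=["browser"])
s_fused = score(fused, fts_rank=1, vec_rank=3, plan_tools=["browser"])
print(s_fused > s_raw)  # → True
```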
The FastAPI backend exposes a small set of composable endpoints. Use them directly, or go through the MCP server and OpenClaw plugin layers.
Search: queries semantic memory and the skill store together. Accepts top_k plus semantic-memory and skill toggles, and returns graph_results and skill_results — graph-style memory results, skill hits, and ranked provenance.
/memory/synthesize: takes raw retrieval output and produces fused memory context with provenance and an explicit skill reuse plan, returned as background_context and skill_reuse_plan.
Answer: runs the final reasoning stage with pre-synthesized background_context and skill_reuse_plan, and can optionally append the reply into session history.
Consolidation: manually persists long-term memory for a session. Normally this happens automatically after each chat turn or at session boundaries.
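A request to the synthesis endpoint might look like the sketch below. Only /memory/synthesize, graph_results, skill_results, background_context, and skill_reuse_plan appear in the text above; the exact body shape and the example records are assumptions about the schema.

```python
# Hypothetical request body for /memory/synthesize. Field layout is an
# assumption; only the names come from the endpoint description above.
import json

def build_synthesize_request(memory_hits, skill_hits):
    return {"graph_results": memory_hits, "skill_results": skill_hits}

body = build_synthesize_request(
    [{"text": "user prefers metric units"}],
    [{"name": "unit_convert"}],
)
print(json.dumps(body))
# A real call would be something like:
#   requests.post("http://localhost:8000/memory/synthesize", json=body)
# and the response would carry background_context and skill_reuse_plan.
```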
The latest setup is lighter than before: Python, an LLM key, and local semantic storage paths.
Python 3.11+ and an LLM API key. Supported models follow litellm format, including OpenAI, Gemini, Ollama chat, and OpenAI-compatible endpoints.
Clone the repo, install dependencies, then copy .env.example to .env and fill in your LLM and embedding config plus the compact storage paths.
Run python main.py or python main.py --reload. The API is served at http://localhost:8000, with docs at /docs and MCP at /mcp.
```shell
# Clone & install
git clone https://github.com/LycheeMem/LycheeMem.git
cd LycheeMem
pip install -e ".[dev]"

# Configure .env
LLM_MODEL=openai/gpt-4o-mini
LLM_API_KEY=sk-...
LLM_API_BASE=
EMBEDDING_MODEL=openai/text-embedding-3-small
EMBEDDING_DIM=1536
COMPACT_MEMORY_DB_PATH=data/compact_memory.db
COMPACT_VECTOR_DB_PATH=data/compact_vector

# Start server
python main.py
python main.py --reload

# Run the demo script
python examples/api_pipeline_demo.py
```
The bundled frontend shows chat, semantic memory, skills, and working memory side by side.
Chat with the backend and inspect retrieval traces, background context, and the effect of memory on final answers.
Browse compact memory entries instead of a graph database, reflecting the latest SQLite + LanceDB design.
Inspect the skill library, including reusable procedural docs that can be pulled into reasoning through HyDE retrieval.
See token usage, recent raw turns, and summary anchors for the active session while the dual-threshold policy is running.
```shell
cd web-demo
npm install
npm run dev   # → http://localhost:5173
```
The latest README adds an OpenClaw plugin so LycheeMem can become long-term memory for OpenClaw sessions with minimal manual wiring.
lychee_memory_smart_search is the default long-term recall tool for the model during normal operation.
User and assistant turns are mirrored automatically via hooks, so the model does not need to call append-turn APIs manually.
/new, /reset, /stop, and session_end can trigger consolidation, with proactive persistence on strong long-term signals.
```shell
openclaw plugins install "/path/to/LycheeMem/openclaw-plugin"
openclaw gateway restart
```
LycheeMem now exposes an MCP endpoint at http://localhost:8000/mcp for remote clients that support HTTP transport.
Handles MCP JSON-RPC requests including initialize, tools/list, and tools/call.
Returns an Mcp-Session-Id after initialize, and supports clients that expect the MCP event stream over server-sent events on the same route.
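A client's first message to the endpoint is a JSON-RPC initialize request like the sketch below. The envelope follows the MCP spec; the protocolVersion string and clientInfo values shown are illustrative.

```python
# Sketch of an MCP initialize request for the HTTP endpoint. The
# protocolVersion and clientInfo values are illustrative.
import json

initialize = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "demo-client", "version": "0.1"},
    },
}
print(json.dumps(initialize))
# POST this to http://localhost:8000/mcp; subsequent tools/list and
# tools/call requests should carry the returned Mcp-Session-Id.
```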
LycheeMem is open source, lighter to deploy than the older graph-based design, and now ready for direct API, MCP, and OpenClaw workflows.