Lossless Context Management
Infinite conversation history. Bounded context. Zero information loss.
LLMs can only process so many tokens at once. As conversations grow, older context vanishes. LCM changes that.
Even 128K context windows fill up. When they do, the model forgets, permanently.
Chats accumulate indefinitely. A year of daily use = millions of tokens of history.
Without context, models hallucinate or contradict past decisions and facts.
More history = smarter responses, but costs more tokens and risks overflow.
A Directed Acyclic Graph of compressed conversation history. Each node knows its lineage: where it came from and what it contains.
| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier (e.g. sum_01a, msg_003) |
| content | string | Summary text or original message content |
| lineage | string[] | Parent summary IDs, enabling DAG traversal |
| descendant_count | number | Total messages in this subtree |
| earliest_at | timestamp | When the oldest message in this node was created |
| latest_at | timestamp | When the newest message in this node was created |
| depth | number | DAG depth (0 = leaf message, higher = more compressed) |
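The schema above can be sketched as a Python dataclass. This is a sketch: the `Node` class name and the sample values are assumptions, but the fields follow the table.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass
class Node:
    """One node in the LCM summary DAG (sketch; fields follow the schema table)."""
    id: str                  # e.g. "sum_01a" (summary) or "msg_003" (message)
    content: str             # summary text or original message content
    lineage: List[str]       # parent summary IDs, enabling DAG traversal
    descendant_count: int    # total messages in this subtree
    earliest_at: datetime    # when the oldest message in this node was created
    latest_at: datetime      # when the newest message in this node was created
    depth: int               # 0 = leaf message, higher = more compressed

# A leaf message sits at depth 0 and counts only itself.
msg = Node("msg_003", "We chose level-wise summarization.", ["sum_01a"],
           1, datetime(2024, 5, 1), datetime(2024, 5, 1), 0)
```

A summary node would carry a higher `depth`, a larger `descendant_count`, and a time range spanning everything beneath it.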
LCM tools let you move between compressed summaries and expanded detail on demand.
The user asks something that references old conversation. The system needs to find relevant context without loading everything into the context window.
"What did we decide about the LCM compression strategy?"lcm_grep Across Compressed HistoryInstead of scanning all messages, it searches summaries. The DAG structure means we only traverse relevant branches.
lcm_grep("LCM compression strategy")lcm_expandFound sum_03b (LCM System) and sum_02b (Perf Tuning). The system expands these specific branches to recover the full message subtree.
The expanded messages are injected into the prompt. The model sees the full relevant conversation history β exactly what it needs, nothing more.
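The steps above can be sketched against a toy in-memory DAG. Everything here (the `DAG` dict, the sample contents, and the simplified `lcm_grep`/`lcm_expand` stand-ins) is illustrative, not the real tool implementation:

```python
import re

# Toy store: id -> (content, child_ids). Leaves have no children.
DAG = {
    "sum_03b": ("LCM System: decided on level-wise compression", ["msg_101", "msg_102"]),
    "sum_02b": ("Perf Tuning: batching embeds", ["msg_050"]),
    "msg_101": ("We will compress level by level.", []),
    "msg_102": ("Ratio target is 10:1 per level.", []),
    "msg_050": ("Batch embedding calls to cut latency.", []),
}

def lcm_grep(pattern):
    """Step 1: search summary/message content, return matching IDs."""
    return [i for i, (text, _) in DAG.items() if re.search(pattern, text, re.I)]

def lcm_expand(ids):
    """Step 2: recover the full subtree under each matching node."""
    out = []
    for i in ids:
        text, children = DAG[i]
        out.append(text)
        out.extend(lcm_expand(children))
    return out

# Step 3: inject the expanded messages into the prompt.
context = "\n".join(lcm_expand(lcm_grep("compression")))
```

Only the matching branch is traversed: the grep hits `sum_03b`, and expansion pulls in just its two child messages, leaving the unrelated Perf Tuning branch untouched.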
Result: 12 messages expanded from 2 summaries → model context.

lcm_grep: full-text or regex search across compacted summaries and messages. Returns matching snippets with IDs.
Example: `lcm_grep("compression", scope="both")`

lcm_expand: traverses the DAG from summary IDs downward, recovering the full subtree of messages and child summaries. Example: `lcm_expand(["sum_03b"], maxDepth=3)`

lcm_describe: returns metadata for any LCM item: token counts, depth, compression ratio, lineage, timestamps. Example: `lcm_describe("sum_ROOT")`

lcm_expand_query: delegated search and expand in one call. Greps for matching summaries, then expands the top results to answer a specific question. Example: `lcm_expand_query(query, prompt)`

LCM achieves dramatic compression while preserving every piece of information. Here's the math.
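A rough illustration of that math, with hypothetical numbers (the per-message token count and the 10:1 fan-out below are assumptions, not measured figures): with a fixed fan-out, the tokens needed at the top of the DAG stay roughly constant no matter how long the history grows.

```python
# Hypothetical figures: ~500 tokens per raw message, each summary
# condensing about 10 children, and a summary costing about as many
# tokens as one message.
TOKENS_PER_MESSAGE = 500
FANOUT = 10

def summarized_tokens(n_messages):
    """Tokens the model sees if only the top-level summaries are loaded."""
    summaries = n_messages
    while summaries > 1:
        summaries = -(-summaries // FANOUT)  # ceil division: one level up
    return summaries * TOKENS_PER_MESSAGE

raw = 100_000 * TOKENS_PER_MESSAGE   # 100k messages -> 50M raw tokens
top = summarized_tokens(100_000)     # a single root summary remains
ratio = raw / top
```

Under these assumptions, 100,000 messages (50M raw tokens) compress to one ~500-token root summary, a 100,000:1 ratio, and every underlying message remains recoverable by expanding back down the DAG.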
LCM doesn't work alone. It's one half of a two-layer memory system combining semantic search with historical context.
Every conversation chunk and summary is embedded into a vector. This enables semantic similarity search β finding topics by meaning, not keywords.
Hierarchical summarization preserves the full conversation tree. When relevant summaries are found, they can be fully expanded to recover exact context.
Vector search finds the right summaries fast. LCM expansion restores the full detail. Together: semantic recall + perfect fidelity.
The store holds raw messages, summary DAG nodes, vector embeddings, and metadata (timestamps, lineage, token counts). Old messages can be evicted from the vector store while remaining accessible via LCM.
On recall, the system first runs vector similarity search across embedded summaries, returning the most semantically relevant nodes. Then lcm_expand is called on those nodes to pull the full conversation history. The result: the model gets exactly the context it needs, with perfect accuracy, regardless of how old the conversation is.
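A minimal sketch of that two-step recall, using a toy bag-of-words `embed()` in place of a real embedding model. The corpus, helper names, and scoring are all illustrative assumptions:

```python
import math
from collections import Counter

# Toy corpus: summary id -> text, plus a tiny DAG below each summary.
SUMMARIES = {
    "sum_03b": "LCM system design and compression strategy decisions",
    "sum_02b": "performance tuning and embedding batch sizes",
}
CHILDREN = {"sum_03b": ["msg_101"], "sum_02b": []}
MESSAGES = {"msg_101": "We decided to compress level by level at 10:1."}

def embed(text):
    """Hypothetical embedder: word counts (a real system uses a model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recall(query, k=1):
    # Layer 1: vector similarity search over embedded summaries.
    q = embed(query)
    ranked = sorted(SUMMARIES, key=lambda i: cosine(q, embed(SUMMARIES[i])),
                    reverse=True)
    # Layer 2: LCM expansion of the top hits recovers the exact messages.
    return [MESSAGES[m] for s in ranked[:k] for m in CHILDREN[s]]

recall("what compression strategy did we decide on")
```

Vector search picks the right summary by meaning; expansion then restores the verbatim messages beneath it, which is the semantic-recall-plus-perfect-fidelity split described above.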