Lossless Context Management
Infinite conversation history. Bounded context. Zero information loss.
LLMs can only process so many tokens at once. As conversations grow, older context vanishes. LCM changes that.
Even 128K context windows fill up. When they do, the model forgets, permanently.
Chats accumulate indefinitely. A year of daily use = millions of tokens of history.
Without context, models hallucinate or contradict past decisions and facts.
More history = smarter responses, but costs more tokens and risks overflow.
A Directed Acyclic Graph of compressed conversation history. Each node knows its lineage: where it came from and what it contains.
| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier (e.g. sum_01a, msg_003) |
| content | string | Summary text or original message content |
| lineage | string[] | Parent summary IDs, enabling DAG traversal |
| descendant_count | number | Total messages in this subtree |
| earliest_at | timestamp | When the oldest message in this node was created |
| latest_at | timestamp | When the newest message in this node was created |
| depth | number | DAG depth (0 = leaf message, higher = more compressed) |
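The schema above can be sketched as a Python dataclass. This is a sketch: the `Node` class name and the sample values are assumptions, but the fields follow the table.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass
class Node:
    """One node in the LCM summary DAG (sketch; fields follow the schema table)."""
    id: str                  # e.g. "sum_01a" (summary) or "msg_003" (message)
    content: str             # summary text or original message content
    lineage: List[str]       # parent summary IDs, enabling DAG traversal
    descendant_count: int    # total messages in this subtree
    earliest_at: datetime    # when the oldest message in this node was created
    latest_at: datetime      # when the newest message in this node was created
    depth: int               # 0 = leaf message, higher = more compressed

# A leaf message sits at depth 0 and counts only itself.
msg = Node("msg_003", "We chose level-wise summarization.", ["sum_01a"],
           1, datetime(2024, 5, 1), datetime(2024, 5, 1), 0)
```

A summary node would carry a higher `depth`, a larger `descendant_count`, and a time range spanning everything beneath it.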
LCM tools let you move between compressed summaries and expanded detail on demand.
The user asks something that references old conversation. The system needs to find relevant context without loading everything into the context window.
"What did we decide about the LCM compression strategy?"lcm_grep Across Compressed HistoryInstead of scanning all messages, it searches summaries. The DAG structure means we only traverse relevant branches.
lcm_grep("LCM compression strategy")lcm_expandFound sum_03b (LCM System) and sum_02b (Perf Tuning). The system expands these specific branches to recover the full message subtree.
The expanded messages are injected into the prompt. The model sees the full relevant conversation history β exactly what it needs, nothing more.
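The steps above can be sketched against a toy in-memory DAG. Everything here (the `DAG` dict, the sample contents, and the simplified `lcm_grep`/`lcm_expand` stand-ins) is illustrative, not the real tool implementation:

```python
import re

# Toy store: id -> (content, child_ids). Leaves have no children.
DAG = {
    "sum_03b": ("LCM System: decided on level-wise compression", ["msg_101", "msg_102"]),
    "sum_02b": ("Perf Tuning: batching embeds", ["msg_050"]),
    "msg_101": ("We will compress level by level.", []),
    "msg_102": ("Ratio target is 10:1 per level.", []),
    "msg_050": ("Batch embedding calls to cut latency.", []),
}

def lcm_grep(pattern):
    """Step 1: search summary/message content, return matching IDs."""
    return [i for i, (text, _) in DAG.items() if re.search(pattern, text, re.I)]

def lcm_expand(ids):
    """Step 2: recover the full subtree under each matching node."""
    out = []
    for i in ids:
        text, children = DAG[i]
        out.append(text)
        out.extend(lcm_expand(children))
    return out

# Step 3: inject the expanded messages into the prompt.
context = "\n".join(lcm_expand(lcm_grep("compression")))
```

Only the matching branch is traversed: the grep hits `sum_03b`, and expansion pulls in just its two child messages, leaving the unrelated Perf Tuning branch untouched.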
Result: 12 messages expanded from 2 summaries → model context.

lcm_grep: full-text or regex search across compacted summaries and messages. Returns matching snippets with IDs.
Example: `lcm_grep("compression", scope="both")`

lcm_expand: traverses the DAG from summary IDs downward, recovering the full subtree of messages and child summaries. Example: `lcm_expand(["sum_03b"], maxDepth=3)`

lcm_describe: returns metadata for any LCM item: token counts, depth, compression ratio, lineage, timestamps. Example: `lcm_describe("sum_ROOT")`

lcm_expand_query: delegated search and expand in one call. Greps for matching summaries, then expands the top results to answer a specific question. Example: `lcm_expand_query(query, prompt)`

LCM achieves dramatic compression while preserving every piece of information. Here's the math.
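A rough illustration of that math, with hypothetical numbers (the per-message token count and the 10:1 fan-out below are assumptions, not measured figures): with a fixed fan-out, the tokens needed at the top of the DAG stay roughly constant no matter how long the history grows.

```python
# Hypothetical figures: ~500 tokens per raw message, each summary
# condensing about 10 children, and a summary costing about as many
# tokens as one message.
TOKENS_PER_MESSAGE = 500
FANOUT = 10

def summarized_tokens(n_messages):
    """Tokens the model sees if only the top-level summaries are loaded."""
    summaries = n_messages
    while summaries > 1:
        summaries = -(-summaries // FANOUT)  # ceil division: one level up
    return summaries * TOKENS_PER_MESSAGE

raw = 100_000 * TOKENS_PER_MESSAGE   # 100k messages -> 50M raw tokens
top = summarized_tokens(100_000)     # a single root summary remains
ratio = raw / top
```

Under these assumptions, 100,000 messages (50M raw tokens) compress to one ~500-token root summary, a 100,000:1 ratio, and every underlying message remains recoverable by expanding back down the DAG.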
LCM doesn't work alone. It's one half of a two-layer memory system combining semantic search with historical context.
Every conversation chunk and summary is embedded into a vector. This enables semantic similarity search β finding topics by meaning, not keywords.
Hierarchical summarization preserves the full conversation tree. When relevant summaries are found, they can be fully expanded to recover exact context.
Vector search finds the right summaries fast. LCM expansion restores the full detail. Together: semantic recall + perfect fidelity.
The store holds raw messages, summary DAG nodes, vector embeddings, and metadata (timestamps, lineage, token counts). Old messages can be evicted from the vector store while remaining accessible via LCM.
On recall, the system first runs vector similarity search across embedded summaries, returning the most semantically relevant nodes. Then lcm_expand is called on those nodes to pull the full conversation history. The result: the model gets exactly the context it needs, with perfect accuracy, regardless of how old the conversation is.
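A minimal sketch of that two-step recall, using a toy bag-of-words `embed()` in place of a real embedding model. The corpus, helper names, and scoring are all illustrative assumptions:

```python
import math
from collections import Counter

# Toy corpus: summary id -> text, plus a tiny DAG below each summary.
SUMMARIES = {
    "sum_03b": "LCM system design and compression strategy decisions",
    "sum_02b": "performance tuning and embedding batch sizes",
}
CHILDREN = {"sum_03b": ["msg_101"], "sum_02b": []}
MESSAGES = {"msg_101": "We decided to compress level by level at 10:1."}

def embed(text):
    """Hypothetical embedder: word counts (a real system uses a model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recall(query, k=1):
    # Layer 1: vector similarity search over embedded summaries.
    q = embed(query)
    ranked = sorted(SUMMARIES, key=lambda i: cosine(q, embed(SUMMARIES[i])),
                    reverse=True)
    # Layer 2: LCM expansion of the top hits recovers the exact messages.
    return [MESSAGES[m] for s in ranked[:k] for m in CHILDREN[s]]

recall("what compression strategy did we decide on")
```

Vector search picks the right summary by meaning; expansion then restores the verbatim messages beneath it, which is the semantic-recall-plus-perfect-fidelity split described above.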