How to Build AI Agent Memory With Claude Code Plugins


You build AI agent memory by hooking into Claude Code's plugin system to intercept conversation events, compressing them into structured summaries, and reinjecting that compressed context at the start of new sessions through CLAUDE.md files or system prompt injection. The core loop works like this: capture tool calls and outputs during a session, run a compression pass that extracts decisions, file paths, and intent, then persist the result to a local store that gets loaded as context on the next invocation. This pattern works with both Claude Code's native hook system and the broader Agent SDK for custom agent builds.

Setting Up the Capture Hook

Claude Code supports lifecycle hooks that fire on events like PreToolUse, PostToolUse, and Notification. You register them in your project's .claude/settings.json. Each hook receives a JSON payload on stdin containing the session ID, the tool name, its input, and its response. That's everything you need to build a memory trace.

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": ".*",
        "hooks": [
          {
            "type": "command",
            "command": "python3 .claude/hooks/capture_memory.py"
          }
        ]
      }
    ]
  }
}
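Before writing the capture script, it helps to see the shape of what arrives on stdin. The payload below is illustrative: the field names follow Claude Code's hook documentation, but verify them against your installed version, and the values are invented for the demo.

```python
import json

# A representative PostToolUse payload as delivered on stdin.
# Field names per the hook docs; values are made up for illustration.
sample = json.dumps({
    "session_id": "abc123",
    "hook_event_name": "PostToolUse",
    "tool_name": "Write",
    "tool_input": {"file_path": "src/app.py", "content": "print('hi')"},
    "tool_response": {"success": True},
})

payload = json.loads(sample)
print(payload["tool_name"])
print(payload["tool_input"]["file_path"])
```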

Capturing and Persisting Tool Events

The capture script reads the hook payload from stdin, extracts the relevant fields, and appends a structured event to a session log file. This gives you a full trace of what the agent did, which files it touched, and what it decided.

import json
import sys
from datetime import datetime, timezone
from pathlib import Path

# PostToolUse payloads expose the tool's result as "tool_response";
# fall back to "tool_output" in case your version names it differently.
payload = json.loads(sys.stdin.read())
event = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "tool": payload.get("tool_name", "unknown"),
    "input_summary": str(payload.get("tool_input", ""))[:500],
    "output_summary": str(payload.get("tool_response", payload.get("tool_output", "")))[:1000],
    "session_id": payload.get("session_id", "default")
}

# Append as a single JSON line so the log stays parseable line by line.
log_path = Path(".claude/memory/session_log.jsonl")
log_path.parent.mkdir(parents=True, exist_ok=True)
with open(log_path, "a") as f:
    f.write(json.dumps(event) + "\n")

Compressing Session Logs Into Memory

Raw event logs bloat fast. A 30-minute coding session can produce over 50 tool calls, which exceeds useful context length. The compression step is where the real value lives: you distill the log into a structured summary of decisions made, files modified, patterns established, and open questions. You can run this compression with Claude itself through the SDK, or use a simpler heuristic pass for speed.

import json
from pathlib import Path
from collections import defaultdict

def compress_session(log_path: str) -> dict:
    events = [json.loads(l) for l in Path(log_path).read_text().splitlines()]
    files_touched = set()
    decisions = []
    tool_counts = defaultdict(int)

    for event in events:
        tool_counts[event["tool"]] += 1
        # The input summary is a truncated repr of the tool input; its first
        # line is the closest thing to a file path this heuristic has.
        first_line = event["input_summary"].splitlines()[0] if event["input_summary"] else ""
        if event["tool"] in ("Write", "Edit", "Read"):
            files_touched.add(first_line[:200])
        if event["tool"] == "Write":
            decisions.append(f"Created/wrote {first_line[:100]}")

    return {
        "files_touched": list(files_touched)[:50],
        "decisions": decisions[:20],
        "tool_usage": dict(tool_counts),
        "event_count": len(events)
    }
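A quick sanity check of the summary shape, run against a tiny synthetic log. The events and paths here are invented, and the aggregation is inlined so the snippet stands alone:

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path

# Write a three-event synthetic session log, then aggregate it the same
# way the heuristic compressor does.
events = [
    {"tool": "Read", "input_summary": "src/app.py"},
    {"tool": "Write", "input_summary": "src/app.py"},
    {"tool": "Bash", "input_summary": "pytest -q"},
]
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
    f.write("".join(json.dumps(e) + "\n" for e in events))

loaded = [json.loads(l) for l in Path(f.name).read_text().splitlines()]
tool_counts = defaultdict(int)
files_touched = set()
for e in loaded:
    tool_counts[e["tool"]] += 1
    if e["tool"] in ("Write", "Edit", "Read"):
        files_touched.add(e["input_summary"].splitlines()[0])

print(sorted(files_touched))   # ['src/app.py']
print(dict(tool_counts))
```

Note that Read and Write of the same file collapse into one entry, which is exactly the deduplication you want in a memory summary.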

Deep Compression With Claude Via the Agent SDK

For richer memory, use Claude itself to summarize the session into natural language that captures intent and reasoning, not file lists alone. You can do this through the Agent SDK, or more simply by shelling out to the Claude Code CLI in print mode, which runs a single non-interactive prompt and returns the result. This is the idea behind the "skills" pattern Andrej Karpathy has popularized: extract reusable knowledge from sessions and persist it as instructions for future runs.

import subprocess
import json

def compress_with_claude(session_log: str) -> str:
    prompt = (
        "Summarize this coding session log into a concise memory document. "
        "Include: key decisions, architectural patterns chosen, files created "
        "or modified, unresolved issues, and reusable instructions for "
        "continuing this work. Be specific about file paths and function names."
    )
    # Run Claude Code in print mode for non-interactive summarization.
    # Pipe the log over stdin rather than argv, which has an OS-level size limit.
    result = subprocess.run(
        ["claude", "-p", prompt, "--output-format", "text"],
        input=session_log[:80000],
        capture_output=True, text=True, timeout=120
    )
    result.check_returncode()
    return result.stdout.strip()

Reinjecting Memory Into New Sessions

The injection point is the CLAUDE.md file at your project root. Claude Code reads this file at session start and treats its contents as persistent instructions. Your memory system writes compressed context into this file (or a file it imports) before each session. This is the most reliable injection method: no custom system prompts, no SDK wrappers, just a file on disk.

import re
from datetime import datetime, timezone
from pathlib import Path

def inject_memory(memory_text: str, project_root: str = "."):
    claude_md = Path(project_root) / "CLAUDE.md"
    marker_start = "<!-- AGENT_MEMORY_START -->"
    marker_end = "<!-- AGENT_MEMORY_END -->"
    timestamp = datetime.now(timezone.utc).isoformat()

    memory_block = f"{marker_start}\n## Session Memory (updated {timestamp})\n\n{memory_text}\n{marker_end}"

    if claude_md.exists():
        existing = claude_md.read_text()
        # Replace the previous memory block if one exists. Escape the markers
        # so regex metacharacters in them can never corrupt the match.
        if marker_start in existing:
            existing = re.sub(
                f"{re.escape(marker_start)}.*?{re.escape(marker_end)}",
                "", existing, flags=re.DOTALL
            )
        claude_md.write_text(existing.strip() + "\n\n" + memory_block + "\n")
    else:
        claude_md.write_text(memory_block + "\n")

Wiring It All Together as a Session Lifecycle Script

This orchestrator runs after a Claude Code session ends. It reads the raw log, compresses it, and injects the result into CLAUDE.md so the next session starts with full context. You can trigger this manually or wire it into a post-session hook.

from pathlib import Path

# compress_with_claude and inject_memory are the helpers defined above;
# import them here if they live in separate modules.

def end_of_session_pipeline(project_root: str = "."):
    log_path = Path(project_root) / ".claude/memory/session_log.jsonl"
    if not log_path.exists() or log_path.stat().st_size == 0:
        return

    raw_log = log_path.read_text()
    # Use Claude for deep compression.
    memory = compress_with_claude(raw_log)
    inject_memory(memory, project_root)

    # Rotate the log so the next session starts fresh.
    archive = log_path.with_suffix(f".{int(log_path.stat().st_mtime)}.jsonl")
    log_path.rename(archive)
    print(f"Memory injected. Log archived to {archive}")

if __name__ == "__main__":
    end_of_session_pipeline()
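One way to trigger the pipeline automatically is a Stop hook, which fires when Claude finishes responding (the script path here assumes you saved the orchestrator as .claude/hooks/end_of_session.py). Keep in mind that Stop fires at the end of every response, not only when you close the session, so either keep the pipeline cheap or debounce it; if your Claude Code version exposes a SessionEnd event, that is a better fit.

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "python3 .claude/hooks/end_of_session.py"
          }
        ]
      }
    ]
  }
}
```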

Gotchas and Production Pitfalls

CLAUDE.md size limits matter. Claude Code reads the entire file into context. If your memory section grows past roughly 4,000 tokens, you're burning context window on history instead of the current task. Set a hard cap on the memory block size and compress aggressively. A good rule: keep the memory section under 2,000 words.
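The hard cap can be as simple as a word-budget truncation applied to the memory text before injection. The function name and default budget below are arbitrary choices, not anything Claude Code defines:

```python
def cap_memory_block(text: str, max_words: int = 2000) -> str:
    # Keep the earliest content, which tends to hold the load-bearing
    # decisions, and mark the cut explicitly so readers know it happened.
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words]) + "\n\n[memory truncated at word cap]"

print(cap_memory_block("alpha bravo charlie delta echo", max_words=3))
# alpha bravo charlie
#
# [memory truncated at word cap]
```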

Hook scripts must exit quickly. Claude Code waits for hook processes to complete before continuing. If your capture script makes network calls or runs expensive compression inline, every tool call feels sluggish. The capture hook should be a fast append-to-file operation. Run compression as a separate post-session step, not inside the hook.

Don't store secrets in memory. Your session logs contain file contents, environment variable references, and potentially API keys that appeared in tool outputs. Scrub sensitive patterns before persisting. A regex pass for common key formats (sk-, ghp_, AKIA) catches most leaks.
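A minimal scrubber for the key prefixes mentioned above might look like this. The patterns are deliberately loose sketches; tune the character classes and lengths to the credential formats your stack actually uses:

```python
import re

# Rough patterns for common credential prefixes. These will occasionally
# over-match (e.g. "sk-" inside an unrelated token), which is the safe
# direction to err for a scrubber.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{10,}"),   # OpenAI/Anthropic-style keys
    re.compile(r"ghp_[A-Za-z0-9]{20,}"),    # GitHub personal access tokens
    re.compile(r"AKIA[0-9A-Z]{16}"),        # AWS access key IDs
]

def scrub(text: str) -> str:
    for pat in SECRET_PATTERNS:
        text = pat.sub("[REDACTED]", text)
    return text

print(scrub("token=ghp_abcdefghij0123456789 done"))
# token=[REDACTED] done
```

Run scrub() on the event dict inside the capture hook, before the append, so secrets never touch disk.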

Heuristic versus LLM compression is a real tradeoff. The heuristic approach (counting files, listing decisions) is free and instant but loses nuance. LLM compression captures reasoning and intent but costs a small API call and adds about 10 seconds of latency at session end. For solo development, heuristic compression works fine. For team setups where multiple people resume each other's sessions, LLM compression pays for itself immediately because it captures why decisions were made, not what happened.

The "skills" pattern. Karpathy's approach takes this further: instead of injecting raw session memory, you extract reusable skills (parameterized instructions like "when modifying the auth module, always run the integration tests in tests/auth/ first") and persist those as permanent CLAUDE.md entries. Over time, the agent accumulates project-specific expertise that survives across sessions and across developers. You implement this by adding a second compression pass that filters for generalizable patterns versus session-specific state.
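The second pass can be as simple as asking the summarizer to tag each output line and then splitting on the tags. This sketch assumes a skill:/state: prefix convention you would instruct the compressor to follow; the prefixes are an invented convention, not anything Claude Code defines:

```python
def split_skills(memory_lines):
    # Lines tagged "skill:" are generalizable and belong in permanent
    # CLAUDE.md entries; "state:" lines are session-specific and expire
    # with the next log rotation.
    skills = [l[len("skill:"):].strip() for l in memory_lines if l.startswith("skill:")]
    state = [l[len("state:"):].strip() for l in memory_lines if l.startswith("state:")]
    return skills, state

skills, state = split_skills([
    "skill: run tests in tests/auth/ before modifying the auth module",
    "state: refactor of login.py is half finished",
])
print(skills)
print(state)
```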
