how modules work

A module installs by creating tables with convention prefixes. The system discovers what exists. No registration, no base class, no interface.

_raw_*      immutable     content, embeddings   written by compile
_edges_*    append-only   relationships         written by compile or modules
_types_*    immutable     classification        written by compile
_enrich_*   mutable       graph scores          written by manage

The prefix IS the lifecycle declaration. An AI seeing _enrich_source_graph knows: mutable, safe to delete, will be recomputed. An AI seeing _raw_chunks knows: immutable, never wipe.

Views rebuild automatically when tables are added. Presets ship with the module. A cell without a module has full retrieval — those edge columns are simply absent.
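Convention-based discovery can be sketched in a few lines of stdlib Python — the table names are from this document, but the `discover` helper and `LIFECYCLE` map are illustrative, not the system's actual API:

```python
import sqlite3

# In-memory cell with a few convention-prefixed tables (illustrative names).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE _raw_chunks (id TEXT, content TEXT)")
conn.execute("CREATE TABLE _edges_source (chunk_id TEXT, source_id TEXT)")
conn.execute("CREATE TABLE _enrich_source_graph (source_id TEXT PRIMARY KEY, centrality REAL)")

LIFECYCLE = {
    "_raw_": "immutable",
    "_edges_": "append-only",
    "_types_": "immutable",
    "_enrich_": "mutable",
}

def discover(conn):
    """Map each table to its lifecycle, derived purely from its prefix."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    out = {}
    for (name,) in rows:
        for prefix, lifecycle in LIFECYCLE.items():
            if name.startswith(prefix):
                out[name] = lifecycle
    return out

print(discover(conn))
```

No registry is consulted: the catalog itself is the registry.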

Two kinds of modules:

Type        What it does                               Example
Source      Compiles raw artifacts into chunks         Claude Code, Cursor, Codex
Extension   Attaches intelligence to existing chunks   SOMA (file/repo/content/URL identity)

source modules

A source module compiles raw artifacts into chunks. One adapter per format. Different tools, same output shape.

module structure

compile/    parse format → chunk-atom tables
manage/     offline enrichment → _enrich_* columns
stock/      presets and views shipped with the module

the contract

Write two tables. Embed. Views rebuild automatically.

# minimal adapter
_raw_chunks   (id, content, embedding, timestamp)
_edges_source (chunk_id, source_id)

Everything else is additive. Add _types_message for classification. Add _edges_tool_ops if your format has file operations. Call soma_enrich(chunk) inline and four identity edge tables appear automatically.

The compile contract is intentionally minimal. Two tables get you retrieval, graph intelligence, presets, and MCP access. Every other table is optional enrichment.
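The whole contract fits in plain SQLite — a minimal sketch, where the table shapes follow the adapter above and the `write_chunk` helper is illustrative (embedding left NULL; a real adapter would call the cell's embedder):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE _raw_chunks   (id TEXT PRIMARY KEY, content TEXT,
                                embedding BLOB, timestamp REAL);
    CREATE TABLE _edges_source (chunk_id TEXT, source_id TEXT);
""")

def write_chunk(conn, chunk_id, source_id, content):
    # Two inserts satisfy the entire compile contract.
    conn.execute(
        "INSERT INTO _raw_chunks (id, content, embedding, timestamp) "
        "VALUES (?, ?, NULL, ?)",
        (chunk_id, content, time.time()),
    )
    conn.execute(
        "INSERT INTO _edges_source VALUES (?, ?)",
        (chunk_id, source_id),
    )

write_chunk(conn, "chunk-1", "session-abc", "user: fix the auth bug")
```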

claude code

The reference source module. Indexes your Claude Code session history on first run (Claude retains ~30 days of JSONLs), then captures everything going forward via hooks and a background worker.

what gets captured

Layer              What                                                                         Table
Messages           Every prompt, response, tool call — full fidelity, no truncation             _raw_chunks
Tool operations    Every Read, Write, Edit, Bash — tool name, target file, cwd                  _edges_tool_ops
Delegations        Parent → child agent tree with agent type — the most-used advanced feature   _edges_delegations
Classification     Message type (user_prompt, assistant, tool_call), role, threading            _types_message
File identity      SOMA UUIDs for every file touch — survives renames                           _edges_file_identity
Repo identity      Git root commit hash — survives repo moves                                   _edges_repo_identity
Content identity   SHA-256 of file content at capture time                                      _edges_content_identity
URL identity       Stable UUID for WebFetch operations                                          _edges_url_identity

live capture

Claude Code tool use
       ↓
  [hooks]  write session_id to ~/.flex/queue.db
       ↓
  [worker]  polls every 2s, reads JSONL, embeds, writes to cell
       ↓
  [MCP]  exposes cell as read-only SQL

Hooks are notification-only — they write a session ID and timestamp. The worker reads the actual JSONL for data. Crash-safe: idempotent inserts, startup backfill recovers missed sessions.
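Crash-safety here reduces to idempotent writes. A sketch of the idea using SQLite's INSERT OR IGNORE, so replaying a JSONL after a crash cannot duplicate chunks (the `sync_session` helper and schema are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE _raw_chunks (id TEXT PRIMARY KEY, content TEXT)")

def sync_session(conn, records):
    # Replaying the same JSONL after a crash is a no-op:
    # the primary key makes every insert idempotent.
    for rec in records:
        conn.execute(
            "INSERT OR IGNORE INTO _raw_chunks VALUES (?, ?)",
            (rec["id"], rec["content"]),
        )

records = [{"id": "c1", "content": "hello"}, {"id": "c2", "content": "world"}]
sync_session(conn, records)
sync_session(conn, records)  # simulated crash-recovery replay

count = conn.execute("SELECT COUNT(*) FROM _raw_chunks").fetchone()[0]
```

The second replay inserts nothing, which is why startup backfill can safely re-read any session the worker might have missed.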

enrichment

The worker runs a full enrichment cycle every 30 minutes:

Layer                 What it produces
Source graph          Centrality, hub status, community membership, community labels
File graph            File co-edit relationships across sessions
Delegation graph      Parent → child agent topology with betweenness
Fingerprints          Navigational index per session — key decisions, tool patterns
Project attribution   Maps sessions to repos via 5-tier resolution

All enrichment is in _enrich_* tables. Safe to wipe. Recomputed automatically.

query surface

Two curated views — messages (17 columns, chunk-level) and sessions (15 columns, session-level) — compose all raw tables into a flat surface. The AI queries views, never raw tables. See Views for the full breakdown.

Multi-cell queries. If you also index documentation corpora, Claude can query both cells in the same conversation — session history from claude_code alongside design docs from a documentation cell. It runs them in parallel automatically.

presets

@orient          schema, views, presets, graph topology
@health          pipeline freshness, queue depth, embedding coverage
@digest          multi-day activity summary
@sprints         work periods detected by 6h gaps
@story           session narrative — timeline, artifacts, agents
@file            every session that touched a file, across renames
@genealogy       concept lineage — hubs, key excerpts
@delegation-tree recursive sub-agent tree
@bridges         cross-community connector sessions
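Presets are plain SQL shipped with the module. As a sketch of the idea behind @sprints, the query below starts a new work period whenever more than six hours pass between sessions (the schema and query are illustrative, not the shipped preset):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (session_id TEXT, started_at REAL)")
hour = 3600.0
conn.executemany("INSERT INTO sessions VALUES (?, ?)", [
    ("s1", 0 * hour), ("s2", 2 * hour),   # 2h apart: same sprint
    ("s3", 10 * hour),                    # 8h gap: new sprint
])

rows = conn.execute("""
    SELECT session_id,
           SUM(is_new_sprint) OVER (ORDER BY started_at) AS sprint_no
    FROM (
        SELECT session_id, started_at,
               CASE WHEN LAG(started_at) OVER (ORDER BY started_at) IS NULL
                      OR started_at - LAG(started_at) OVER (ORDER BY started_at)
                         > 6 * 3600
                    THEN 1 ELSE 0 END AS is_new_sprint
        FROM sessions
    )
    ORDER BY started_at
""").fetchall()
# → [('s1', 1), ('s2', 1), ('s3', 2)]
```

The running SUM over the gap flags turns a timestamp column into numbered work periods in a single pass.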

extension modules

An extension module attaches intelligence to existing chunks. It doesn't parse — it enriches. Install by creating tables with convention prefixes. The view generator discovers them. The AI can JOIN on them immediately.

-- create a table, it appears in views
CREATE TABLE _edges_my_module (
  chunk_id  TEXT,
  my_field  TEXT NOT NULL
);

-- drop it, it disappears
DROP TABLE _edges_my_module;

No registration. No coupling. A cell without the module has full retrieval — those columns are simply absent.

SOMA

Architecturally, SOMA is an extension module — it installs by creating tables, uninstalls by dropping them. But for agentic coding, it's foundational. Git is technically optional for writing code. In practice, you'd never ship without it. SOMA is the same — that's why it ships with the Claude Code module.

SOMA provides stable identity for files, repos, content, and URLs. It's a standalone system (~/.soma/) with its own databases, shared across all cells and projects on the machine. When the sessions view shows project = 'myapp', that's SOMA — a 5-tier resolution stack traced the session back through repo identity. When @file tracks a file across renames, that's SOMA — a UUID assigned once and persisted in xattr.

why it exists

Paths are fragile. A session from six months ago worked in a worktree deleted the next day. The cwd is dead. The git root is dead. But the file exists in main, and the repo is still on disk. Without content-addressed identity, that session is an orphan. With SOMA, you trace every session that touched the file — across renames, repo moves, and deleted worktrees.

four identity layers

Layer     Table                     What it tracks                            Survives
File      _edges_file_identity      Stable UUID per file                      Renames, moves, repo migrations
Repo      _edges_repo_identity      Git root commit hash                      Repo moves, worktree deletion
Content   _edges_content_identity   SHA-256 of file content + git blob hash   Path changes, branch switches
URL       _edges_url_identity       Stable UUID per URL                       Normalization differences

All four are written at compile time. Identity is resolved once, written into edge tables, and persists forever.

how it works

JSONL sync (worker reads session file)
    ↓
    soma.compile.enrich(chunk)
        ↓
        FileIdentity.get_or_create(path)  → file_uuid
        git rev-parse --show-toplevel     → repo_root
        git hash-object {file}            → blob_hash
        sha256(file_content)              → content_hash
        URLIdentity.get_or_create(url)    → url_uuid
        ↓
    insert_edges(conn, chunk)
        → _edges_file_identity
        → _edges_repo_identity
        → _edges_content_identity
        → _edges_url_identity
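The identity primitives above can be approximated with stdlib Python — content hashes via SHA-256 and deterministic UUIDs via uuid5. The namespace and helper names here are illustrative, not SOMA's actual API:

```python
import hashlib
import uuid

# Illustrative namespace; SOMA's real UUID scheme may differ.
SOMA_NS = uuid.uuid5(uuid.NAMESPACE_DNS, "soma.example")

def content_hash(data: bytes) -> str:
    """SHA-256 of file content — stable across path changes."""
    return hashlib.sha256(data).hexdigest()

def url_uuid(url: str) -> str:
    """Deterministic UUID per normalized URL (trailing slash stripped)."""
    return str(uuid.uuid5(SOMA_NS, url.rstrip("/")))

# Same content at two different paths resolves to one identity.
h1 = content_hash(b"def login(): ...")
h2 = content_hash(b"def login(): ...")
assert h1 == h2

# Trailing-slash variants normalize to the same UUID.
assert url_uuid("https://example.com/docs/") == url_uuid("https://example.com/docs")
```

Because both functions are pure, the same input always yields the same identity — which is what lets the edges be written once at compile time and trusted forever.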

what it enables

Most of SOMA's value is ambient — you benefit without knowing it's there:

You see                             SOMA does
project = 'myapp' on sessions       5-tier repo attribution from git root commit hash
@file path=auth.py tracks renames   Stable UUID per file, fan-out across all historical paths
file_uuids in messages view         1:N identity collapsed to JSON array per chunk

The @file preset is the primary interface — it resolves identity, fans out across renames, and returns a unified history:

# every session that touched a file, across all renames
$ flex search "@file path=src/auth.py"

# or just ask Claude
"Use flex: what's the history of auth.py?"

self-healing

SOMA runs a 4-pass heal cycle every 24 hours: file UUIDs, content hashes, URL UUIDs, and pre-edit blob hashes from Claude Code's ~/.claude/file-history/ backup files. The forward path captures identity at sync time. The heal pass backfills gaps from capture failures.

SOMA is architecturally optional — the worker imports it with try/except, and a cell without it has full retrieval and graph. But the Claude Code module ships with SOMA because agentic coding without stable file identity is like coding without git — technically possible, practically unthinkable.

views

Modules define what goes into a cell. Views define what comes out.

raw vs views

Raw tables are what gets compiled — immutable facts written once and never modified. A cell has normalized relationships, identity edges, and enrichment scores spread across many tables.

Views compose those tables into a flat surface the AI queries directly. The @orient preset exposes this view-level schema, a curated surface rather than the raw tables. The AI sees two views:

View       Level     What it shows
messages   Chunk     Every message, tool call, and file operation — with session context, file identity, delegation edges, and full file content pre-joined into flat columns
sessions   Session   Every session with project, graph intelligence (centrality, hubs, communities), fingerprints, and warmup noise already filtered out

The views handle the joins, the 1:N collapse, and the noise filtering. The AI writes WHERE is_hub = 1 like it's a column that always existed.

why it matters

Without views:

-- 5 tables, 4 JOINs, noise filter, GROUP BY
SELECT src.source_id, src.project, g.centrality
FROM _raw_sources src
LEFT JOIN _types_source_warmup w ON src.source_id = w.source_id
LEFT JOIN _enrich_source_graph g ON src.source_id = g.source_id
WHERE COALESCE(w.is_warmup_only, 0) = 0
  AND g.is_hub = 1

With views:

SELECT session_id, project, centrality
FROM sessions
WHERE is_hub = 1
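The collapse is runnable end to end — the view pre-joins the enrichment and noise tables once, so queries read like the short form (table and column names follow the examples in this section; the data is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE _raw_sources         (source_id TEXT PRIMARY KEY, project TEXT);
    CREATE TABLE _types_source_warmup (source_id TEXT PRIMARY KEY, is_warmup_only INTEGER);
    CREATE TABLE _enrich_source_graph (source_id TEXT PRIMARY KEY, centrality REAL, is_hub INTEGER);

    INSERT INTO _raw_sources VALUES ('s1', 'myapp'), ('s2', 'myapp');
    INSERT INTO _types_source_warmup VALUES ('s2', 1);   -- warmup noise
    INSERT INTO _enrich_source_graph VALUES ('s1', 0.91, 1), ('s2', 0.02, 0);

    -- The generated view hides the joins and the noise filter.
    CREATE VIEW sessions AS
    SELECT src.source_id AS session_id, src.project, g.centrality, g.is_hub
    FROM _raw_sources src
    LEFT JOIN _types_source_warmup w ON src.source_id = w.source_id
    LEFT JOIN _enrich_source_graph g ON src.source_id = g.source_id
    WHERE COALESCE(w.is_warmup_only, 0) = 0;
""")

rows = conn.execute("SELECT session_id FROM sessions WHERE is_hub = 1").fetchall()
# Only the hub session survives the warmup filter.
```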

user-editable

Views are plain .sql files at ~/.flex/views/claude_code/. Edit them to change what the AI sees. Your copy takes precedence over module defaults. Run flex sync to install.

Views are what make @orient work. The AI reads view columns to learn what to filter on. The cell describes itself through its views.

writing a module

source module

To index a new tool (Cursor, Codex, or anything that produces session artifacts):

flex/modules/your_tool/
├─ compile/
│   ├─ worker.py          parse your format → _raw_chunks + _edges_source
│   └─ skip.py            noise filtering (optional)
├─ manage/
│   └─ noise.py           graph filter config (optional)
└─ stock/
    ├─ presets/           .sql files shipped with your module
    └─ views/             curated view .sql files (optional)

The minimal implementation:

# 1. Parse your format into chunks
for chunk in parse_your_format(source_file):
    insert_chunk_atom(conn, chunk_id, content, timestamp)
    insert_edge(conn, chunk_id, source_id)

# 2. Embed
embedder = get_model()
for chunk in get_unembedded(conn):
    embedding = embedder.encode(chunk.content)
    update_embedding(conn, chunk.id, embedding)

# 3. Views rebuild automatically
regenerate_views(conn)

That's it. You now have retrieval, graph intelligence, presets, and MCP access. The rest is additive — classification, tool ops, identity edges, enrichment scripts.
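The three steps can be made self-contained with a stub embedder — hash-based, for illustration only; a real module would use the cell's embedding model, and the parser here stands in for your format:

```python
import hashlib
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE _raw_chunks   (id TEXT PRIMARY KEY, content TEXT,
                                embedding BLOB, timestamp REAL);
    CREATE TABLE _edges_source (chunk_id TEXT, source_id TEXT);
""")

def parse_your_format(lines):
    # Stand-in parser: one chunk per line of your tool's artifact.
    for i, line in enumerate(lines):
        yield {"id": f"chunk-{i}", "source_id": "session-1", "content": line}

def stub_embed(text):
    # Placeholder "embedding": a real module calls its model here.
    return hashlib.sha256(text.encode()).digest()

# 1. Parse the format into chunks
for chunk in parse_your_format(["user: hello", "assistant: hi"]):
    conn.execute("INSERT INTO _raw_chunks VALUES (?, ?, NULL, ?)",
                 (chunk["id"], chunk["content"], time.time()))
    conn.execute("INSERT INTO _edges_source VALUES (?, ?)",
                 (chunk["id"], chunk["source_id"]))

# 2. Embed anything not yet embedded
pending = conn.execute(
    "SELECT id, content FROM _raw_chunks WHERE embedding IS NULL").fetchall()
for cid, content in pending:
    conn.execute("UPDATE _raw_chunks SET embedding = ? WHERE id = ?",
                 (stub_embed(content), cid))
```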

extension module

To attach new intelligence to existing chunks:

# 1. Create tables with convention prefixes
conn.execute("""
    CREATE TABLE IF NOT EXISTS _edges_your_module (
        chunk_id TEXT,
        your_field TEXT NOT NULL
    )
""")

# 2. Populate from existing data
for chunk_id, value in compute_your_intelligence(conn):
    conn.execute(
        "INSERT INTO _edges_your_module VALUES (?, ?)",
        (chunk_id, value)
    )

# 3. Views rebuild automatically
regenerate_views(conn)

Prefix rules:

_edges_*    relationships — 1:N, no PK on chunk_id. Excluded from auto-generated views; query via explicit JOIN.
_enrich_*   mutable scores — 1:1 with PK. Auto-joins into views; safe to wipe.
_types_*    immutable classification.

The prefix declares the lifecycle. The PK declares view inclusion.