A module installs by creating tables with convention prefixes. The system discovers what exists. No registration, no base class, no interface.
_raw_*      immutable content and embeddings, written by compile
_edges_*    append-only relationships, written by compile or modules
_types_*    immutable classification, written by compile
_enrich_*   mutable graph scores, written by manage
The prefix IS the lifecycle declaration. An AI seeing _enrich_source_graph knows: mutable, safe to delete, will be recomputed. An AI seeing _raw_chunks knows: immutable, never wipe.
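As an illustration of that lifecycle-from-prefix rule, here is a minimal sketch of how a tool (or an AI) might derive the rules from a table name. The rule table is illustrative, not the actual implementation:

```python
# Lifecycle rules implied by the convention prefixes (illustrative sketch).
LIFECYCLES = {
    "_raw_":    {"mutable": False, "safe_to_wipe": False},  # immutable content
    "_edges_":  {"mutable": False, "safe_to_wipe": False},  # append-only relationships
    "_types_":  {"mutable": False, "safe_to_wipe": False},  # immutable classification
    "_enrich_": {"mutable": True,  "safe_to_wipe": True},   # recomputed by manage
}

def lifecycle(table_name: str) -> dict:
    """Infer a table's lifecycle from its convention prefix."""
    for prefix, rules in LIFECYCLES.items():
        if table_name.startswith(prefix):
            return rules
    raise ValueError(f"no convention prefix: {table_name}")

print(lifecycle("_enrich_source_graph"))  # mutable, safe to wipe
print(lifecycle("_raw_chunks"))           # immutable, never wipe
```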
Views rebuild automatically when tables are added. Presets ship with the module. A cell without a module has full retrieval — those edge columns are simply absent.
Two kinds of modules:
| Type | What it does | Example |
|---|---|---|
| Source | Compiles raw artifacts into chunks | Claude Code, Cursor, Codex |
| Extension | Attaches intelligence to existing chunks | SOMA (file/repo/content/URL identity) |
A source module compiles raw artifacts into chunks. One adapter per format. Different tools, same output shape.
compile/ parse format → chunk-atom tables
manage/ offline enrichment → _enrich_* columns
stock/ presets and views shipped with the module
Write two tables. Embed. Views rebuild automatically.
# minimal adapter
_raw_chunks (id, content, embedding, timestamp)
_edges_source (chunk_id, source_id)
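The minimal adapter really is just those two tables. A hedged sketch in runnable form, using in-memory SQLite and hypothetical values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE _raw_chunks (
    id TEXT PRIMARY KEY, content TEXT, embedding BLOB, timestamp TEXT)""")
conn.execute("CREATE TABLE _edges_source (chunk_id TEXT, source_id TEXT)")

# one chunk from a parsed session (hypothetical data)
conn.execute("INSERT INTO _raw_chunks VALUES (?, ?, ?, ?)",
             ("c1", "fix auth bug", None, "2024-01-01T00:00:00Z"))
conn.execute("INSERT INTO _edges_source VALUES (?, ?)", ("c1", "session-42"))
```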
Everything else is additive. Add _types_message for classification. Add _edges_tool_ops if your format has file operations. Call soma_enrich(chunk) inline and four identity edge tables appear automatically.
Claude Code is the reference source module. It indexes your Claude Code session history on first run (Claude retains ~30 days of JSONLs), then captures everything going forward via hooks and a background worker.
| Layer | What | Table |
|---|---|---|
| Messages | Every prompt, response, tool call — full fidelity, no truncation | _raw_chunks |
| Tool operations | Every Read, Write, Edit, Bash — tool name, target file, cwd | _edges_tool_ops |
| Delegations | Parent → child agent tree with agent type — the most-used advanced feature | _edges_delegations |
| Classification | Message type (user_prompt, assistant, tool_call), role, threading | _types_message |
| File identity | SOMA UUIDs for every file touch — survives renames | _edges_file_identity |
| Repo identity | Git root commit hash — survives repo moves | _edges_repo_identity |
| Content identity | SHA-256 of file content at capture time | _edges_content_identity |
| URL identity | Stable UUID for WebFetch operations | _edges_url_identity |
Claude Code tool use
↓
[hooks] write session_id to ~/.flex/queue.db
↓
[worker] polls every 2s, reads JSONL, embeds, writes to cell
↓
[MCP] exposes cell as read-only SQL
Hooks are notification-only — they write a session ID and timestamp. The worker reads the actual JSONL for data. Crash-safe: idempotent inserts, startup backfill recovers missed sessions.
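The idempotence the crash-safety depends on can be as simple as keying inserts on the chunk id. A minimal sketch with a hypothetical schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE _raw_chunks (id TEXT PRIMARY KEY, content TEXT)")

def insert_chunk(conn, chunk_id, content):
    # INSERT OR IGNORE keyed on the chunk id: replaying a JSONL after a
    # crash re-inserts the same rows without duplicating them.
    conn.execute("INSERT OR IGNORE INTO _raw_chunks VALUES (?, ?)",
                 (chunk_id, content))

for _ in range(2):  # simulate a crash plus a full re-read of the session file
    insert_chunk(conn, "c1", "hello")

count = conn.execute("SELECT COUNT(*) FROM _raw_chunks").fetchone()[0]
print(count)  # 1: the replay is a no-op
```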
The worker runs a full enrichment cycle every 30 minutes:
| Layer | What it produces |
|---|---|
| Source graph | Centrality, hub status, community membership, community labels |
| File graph | File co-edit relationships across sessions |
| Delegation graph | Parent → child agent topology with betweenness |
| Fingerprints | Navigational index per session — key decisions, tool patterns |
| Project attribution | Maps sessions to repos via 5-tier resolution |
All enrichment is in _enrich_* tables. Safe to wipe. Recomputed automatically.
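Because the lifecycle lives in the prefix, "safe to wipe" is mechanically checkable. A sketch of what a wipe might look like (illustrative, not the real command):

```python
import sqlite3

def wipe_enrichment(conn):
    """Drop every _enrich_* table; the next manage cycle recomputes them."""
    # GLOB treats underscores literally, unlike LIKE.
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name GLOB '_enrich_*'")]
    for t in tables:
        conn.execute(f"DROP TABLE {t}")
    return tables

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE _raw_chunks (id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE _enrich_source_graph (chunk_id TEXT PRIMARY KEY, centrality REAL)")
print(wipe_enrichment(conn))  # only the _enrich_ table is dropped; _raw_chunks is untouched
```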
Two curated views — messages (17 columns, chunk-level) and sessions (15 columns, session-level) — compose all raw tables into a flat surface. The AI queries views, never raw tables. See Views for the full breakdown.
A query can span claude_code alongside design docs from a documentation cell. The system runs them in parallel automatically.
@orient schema, views, presets, graph topology
@health pipeline freshness, queue depth, embedding coverage
@digest multi-day activity summary
@sprints work periods detected by 6h gaps
@story session narrative — timeline, artifacts, agents
@file every session that touched a file, across renames
@genealogy concept lineage — hubs, key excerpts
@delegation-tree recursive sub-agent tree
@bridges cross-community connector sessions
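The gap-based detection @sprints describes can be sketched in a few lines. This is illustrative only; the real detector may differ:

```python
from datetime import datetime, timedelta

GAP = timedelta(hours=6)

def detect_sprints(timestamps):
    """Group sorted session timestamps into work periods split at >6h gaps."""
    sprints = []
    for ts in sorted(timestamps):
        if sprints and ts - sprints[-1][-1] <= GAP:
            sprints[-1].append(ts)   # continue the current sprint
        else:
            sprints.append([ts])     # gap exceeded: start a new sprint
    return sprints

ts = [datetime(2024, 1, 1, h) for h in (9, 11, 13)] + [datetime(2024, 1, 2, 8)]
print(len(detect_sprints(ts)))  # 2: the overnight gap splits the work into two sprints
```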
An extension module attaches intelligence to existing chunks. It doesn't parse — it enriches. Install by creating tables with convention prefixes. The view generator discovers them. The AI can JOIN on them immediately.
-- create a table, it appears in views
CREATE TABLE _edges_my_module (
chunk_id TEXT,
my_field TEXT NOT NULL
);
-- drop it, it disappears
DROP TABLE _edges_my_module;
No registration. No coupling. A cell without the module has full retrieval — those columns are simply absent.
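A view generator in this spirit can discover convention tables from sqlite_master and compose the flat surface. This is an illustrative sketch, not the module's actual generator:

```python
import sqlite3

def regenerate_views(conn):
    # Sketch: LEFT JOIN every _enrich_* table (1:1 on chunk_id) into a
    # flat view over _raw_chunks, pulling in each table's extra columns.
    enrich = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name GLOB '_enrich_*'")]
    select, joins = ["c.*"], []
    for t in enrich:
        cols = [r[1] for r in conn.execute(f"PRAGMA table_info({t})")]
        select += [f"{t}.{col}" for col in cols if col != "chunk_id"]
        joins.append(f"LEFT JOIN {t} ON {t}.chunk_id = c.id")
    conn.execute("DROP VIEW IF EXISTS messages")
    conn.execute("CREATE VIEW messages AS SELECT " + ", ".join(select) +
                 " FROM _raw_chunks c " + " ".join(joins))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE _raw_chunks (id TEXT PRIMARY KEY, content TEXT)")
conn.execute("CREATE TABLE _enrich_source_graph "
             "(chunk_id TEXT PRIMARY KEY, centrality REAL, is_hub INTEGER)")
regenerate_views(conn)
# the view now exposes centrality and is_hub as if they were columns of _raw_chunks
```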
Architecturally, SOMA is an extension module — it installs by creating tables, uninstalls by dropping them. But for agentic coding, it's foundational. Git is technically optional for writing code. In practice, you'd never ship without it. SOMA is the same — that's why it ships with the Claude Code module.
SOMA provides stable identity for files, repos, content, and URLs. It's a standalone system (~/.soma/) with its own databases, shared across all cells and projects on the machine. When the sessions view shows project = 'myapp', that's SOMA — a 5-tier resolution stack traced the session back through repo identity. When @file tracks a file across renames, that's SOMA — a UUID assigned once and persisted in xattr.
Paths are fragile. A session from six months ago worked in a worktree deleted the next day. The cwd is dead. The git root is dead. But the file exists in main, and the repo is still on disk. Without content-addressed identity, that session is an orphan. With SOMA, you trace every session that touched the file — across renames, repo moves, and deleted worktrees.
| Layer | Table | What it tracks | Survives |
|---|---|---|---|
| File | _edges_file_identity | Stable UUID per file | Renames, moves, repo migrations |
| Repo | _edges_repo_identity | Git root commit hash | Repo moves, worktree deletion |
| Content | _edges_content_identity | SHA-256 of file content + git blob hash | Path changes, branch switches |
| URL | _edges_url_identity | Stable UUID per URL | Normalization differences |
All four are written at compile time. Identity is resolved once, written into edge tables, and persists forever.
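Two of those identities are plain hashes anyone can recompute. The content hash is SHA-256 over the raw bytes; the git blob hash is what `git hash-object` produces, a SHA-1 over a "blob <len>\0" header plus the content:

```python
import hashlib

def content_hash(data: bytes) -> str:
    """SHA-256 of raw file content; survives path changes."""
    return hashlib.sha256(data).hexdigest()

def git_blob_hash(data: bytes) -> str:
    """What `git hash-object` computes: SHA-1 over 'blob <len>\\0' + content."""
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()

data = b"hello\n"
print(content_hash(data))
print(git_blob_hash(data))  # ce013625030ba8dba906f756967f9e9ca394464a
```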
JSONL sync (worker reads session file)
↓
soma.compile.enrich(chunk)
↓
FileIdentity.get_or_create(path) → file_uuid
git rev-parse --show-toplevel → repo_root
git hash-object {file} → blob_hash
sha256(file_content) → content_hash
URLIdentity.get_or_create(url) → url_uuid
↓
insert_edges(conn, chunk)
→ _edges_file_identity
→ _edges_repo_identity
→ _edges_content_identity
→ _edges_url_identity
Most of SOMA's value is ambient — you benefit without knowing it's there:
| You see | SOMA does |
|---|---|
| project = 'myapp' on sessions | 5-tier repo attribution from git root commit hash |
| @file path=auth.py tracks renames | Stable UUID per file, fan-out across all historical paths |
| file_uuids in messages view | 1:N identity collapsed to JSON array per chunk |
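The 1:N collapse in the last row can be reproduced with SQLite's json_group_array. A small sketch with hypothetical data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE _raw_chunks (id TEXT PRIMARY KEY, content TEXT);
CREATE TABLE _edges_file_identity (chunk_id TEXT, file_uuid TEXT);
INSERT INTO _raw_chunks VALUES ('c1', 'edited two files');
INSERT INTO _edges_file_identity VALUES ('c1', 'uuid-a'), ('c1', 'uuid-b');
""")

# 1:N identity edges collapsed to one JSON array per chunk, as a view might do
row = conn.execute("""
    SELECT c.id, json_group_array(e.file_uuid) AS file_uuids
    FROM _raw_chunks c JOIN _edges_file_identity e ON e.chunk_id = c.id
    GROUP BY c.id
""").fetchone()
print(row)  # one row per chunk, file_uuids as a JSON array string
```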
The @file preset is the primary interface — it resolves identity, fans out across renames, and returns a unified history:
# every session that touched a file, across all renames
$ flex search "@file path=src/auth.py"
# or just ask Claude
"Use flex: what's the history of auth.py?"
SOMA runs a 4-pass heal cycle every 24 hours: file UUIDs, content hashes, URL UUIDs, and pre-edit blob hashes from Claude Code's ~/.claude/file-history/ backup files. The forward path captures identity at sync time. The heal pass backfills gaps from capture failures.
SOMA is optional: every SOMA call is wrapped in try/except, and a cell without it has full retrieval and graph. But the Claude Code module ships with SOMA because agentic coding without stable file identity is like coding without git — technically possible, practically unthinkable.
Modules define what goes into a cell. Views define what comes out.
Raw tables are what gets compiled — immutable facts written once and never modified. A cell has normalized relationships, identity edges, and enrichment scores spread across many tables.
Views compose those tables into a flat surface the AI queries directly. Invoking the @orient preset exposes the view-level schema, a curated surface over the data. The AI sees two views:
| View | Level | What it shows |
|---|---|---|
| messages | Chunk | Every message, tool call, and file operation — with session context, file identity, delegation edges, and full file content pre-joined into flat columns |
| sessions | Session | Every session with project, graph intelligence (centrality, hubs, communities), fingerprints, and warmup noise already filtered out |
The views handle the joins, the 1:N collapse, and the noise filtering. The AI writes WHERE is_hub = 1 like it's a column that always existed.
Without views:
-- 5 tables, 4 JOINs, noise filter, GROUP BY
SELECT src.source_id, src.project, g.centrality
FROM _raw_sources src
LEFT JOIN _types_source_warmup w ON src.source_id = w.source_id
LEFT JOIN _enrich_source_graph g ON src.source_id = g.source_id
WHERE COALESCE(w.is_warmup_only, 0) = 0
AND g.is_hub = 1
With views:
SELECT session_id, project, centrality
FROM sessions
WHERE is_hub = 1
Views are plain .sql files at ~/.flex/views/claude_code/. Edit them to change what the AI sees. Your copy takes precedence over module defaults. Run flex sync to install.
This is how presets like @orient work. The AI reads view columns to learn what to filter on. The cell describes itself through its views.
To index a new tool (Cursor, Codex, or anything that produces session artifacts):
flex/modules/your_tool/
├─ compile/
│  ├─ worker.py    parse your format → _raw_chunks + _edges_source
│  └─ skip.py      noise filtering (optional)
├─ manage/
│  └─ noise.py     graph filter config (optional)
└─ stock/
   ├─ presets/     .sql files shipped with your module
   └─ views/       curated view .sql files (optional)
The minimal implementation:
# 1. Parse your format into chunks
for chunk in parse_your_format(source_file):
insert_chunk_atom(conn, chunk_id, content, timestamp)
insert_edge(conn, chunk_id, source_id)
# 2. Embed
embedder = get_model()
for chunk in get_unembedded(conn):
embedding = embedder.encode(chunk.content)
update_embedding(conn, chunk.id, embedding)
# 3. Views rebuild automatically
regenerate_views(conn)
That's it. You now have retrieval, graph intelligence, presets, and MCP access. The rest is additive — classification, tool ops, identity edges, enrichment scripts.
To attach new intelligence to existing chunks:
# 1. Create tables with convention prefixes
conn.execute("""
CREATE TABLE IF NOT EXISTS _edges_your_module (
chunk_id TEXT,
your_field TEXT NOT NULL
)
""")
# 2. Populate from existing data
for chunk_id, value in compute_your_intelligence(conn):
conn.execute(
"INSERT INTO _edges_your_module VALUES (?, ?)",
(chunk_id, value)
)
# 3. Views rebuild automatically
regenerate_views(conn)
_edges_*    relationships (1:N, no PK on chunk_id — excluded from auto-generated views, query via explicit JOIN)
_enrich_*   mutable scores (1:1 with PK — auto-joins into views, safe to wipe)
_types_*    immutable classification
The prefix declares the lifecycle. The PK declares view inclusion.
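The PK rule is mechanically checkable with PRAGMA table_info. A sketch with hypothetical table names:

```python
import sqlite3

def auto_joins_into_views(conn, table):
    """A table auto-joins only if chunk_id is its PRIMARY KEY (1:1 shape)."""
    for cid, name, typ, notnull, dflt, pk in conn.execute(f"PRAGMA table_info({table})"):
        if name == "chunk_id":
            return pk == 1
    return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE _enrich_scores (chunk_id TEXT PRIMARY KEY, score REAL)")
conn.execute("CREATE TABLE _edges_my_module (chunk_id TEXT, my_field TEXT)")
print(auto_joins_into_views(conn, "_enrich_scores"))    # True: 1:1, auto-joins
print(auto_joins_into_views(conn, "_edges_my_module"))  # False: 1:N, explicit JOIN
```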