Architecture Overview
msgvault syncs your Gmail or IMAP accounts to a local SQLite database and can import local MBOX archives, Apple Mail exports, and text messages from WhatsApp, iMessage, and Google Voice. Keyword search, analytics, the TUI, and the MCP server run against SQLite, Parquet, and local attachment files. Optional vector search calls the embedding endpoint configured in [vector.embeddings] to build/query semantic vectors, then stores those vectors locally in vectors.db.
Package Structure

    msgvault/
    ├── cmd/msgvault/          # CLI entrypoint
    │   └── cmd/               # Cobra commands
    ├── internal/              # Core packages
    │   ├── tui/               # Bubble Tea TUI
    │   ├── query/             # DuckDB query engine over Parquet
    │   ├── store/             # SQLite database access
    │   ├── deletion/          # Deletion staging and manifest
    │   ├── gmail/             # Gmail API client
    │   ├── sync/              # Sync orchestration
    │   ├── imap/              # IMAP client (go-imap/v2)
    │   ├── importer/          # MBOX and EMLX import logic
    │   ├── mbox/              # MBOX format parser
    │   ├── emlx/              # Apple Mail .emlx parser
    │   ├── applemail/         # Apple Mail account discovery
    │   ├── remote/            # Remote API engine for TUI
    │   ├── vector/            # Local vector index, embedding worker, hybrid search
    │   ├── whatsapp/          # WhatsApp backup import
    │   ├── imessage/          # iMessage import
    │   ├── gvoice/            # Google Voice Takeout import
    │   ├── microsoft/         # Microsoft 365 OAuth
    │   ├── oauth/             # OAuth2 flows (browser + device)
    │   └── mime/              # MIME parsing
    ├── go.mod
    └── Makefile

Key Packages
| Package | Responsibility |
|---|---|
| cmd/ | Cobra CLI commands, config loading |
| internal/store | SQLite database operations, schema management |
| internal/sync | Sync orchestration, MIME parsing, checkpoint management |
| internal/imap | IMAP client, connection management, credential storage |
| internal/importer | MBOX and EMLX import orchestration, message ingestion |
| internal/mbox | MBOX format reader (mboxo/mboxrd) |
| internal/emlx | Apple Mail .emlx parser and mailbox discovery |
| internal/applemail | Apple Mail account discovery via Accounts4.sqlite |
| internal/remote | HTTP engine implementing query.Engine for remote TUI |
| internal/vector | Vector index backend, embedding client/worker, and semantic/hybrid search |
| internal/gmail | Gmail API client with token bucket rate limiting |
| internal/oauth | OAuth2 browser and device authorization flows |
| internal/query | DuckDB engine over Parquet files, SQLite fallback |
| internal/tui | Bubble Tea model, lipgloss-styled views |
| internal/deletion | Deletion staging, manifest generation |
| internal/whatsapp | WhatsApp backup parsing and import |
| internal/imessage | iMessage database import |
| internal/gvoice | Google Voice Takeout parsing and import |
| internal/microsoft | Microsoft 365 OAuth flow |
| internal/mime | MIME message parsing, charset detection |
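
The table notes that internal/query provides a DuckDB engine with a SQLite fallback, and that internal/remote implements the same query.Engine interface over HTTP. The sketch below illustrates that general pattern only: one interface, interchangeable backends, and a wrapper that falls back when the fast path is unavailable. Every name in it (`Engine`, `CountMessages`, `parquetEngine`, `sqliteEngine`, `fallbackEngine`, `errNoCache`) is hypothetical and is not taken from msgvault's actual query.Engine, whose methods are not shown here.

```go
package main

import (
	"errors"
	"fmt"
)

// Engine is a hypothetical, heavily reduced stand-in for an interface like
// query.Engine. The real interface's methods are an unknown here.
type Engine interface {
	Name() string
	CountMessages() (int, error)
}

// errNoCache signals that the derived Parquet cache has not been built yet.
var errNoCache = errors.New("parquet cache not built")

// parquetEngine stands in for a DuckDB-over-Parquet backend.
type parquetEngine struct {
	cacheBuilt bool
	rows       int
}

func (e parquetEngine) Name() string { return "duckdb+parquet" }

func (e parquetEngine) CountMessages() (int, error) {
	if !e.cacheBuilt {
		return 0, errNoCache
	}
	return e.rows, nil
}

// sqliteEngine stands in for a backend that queries the system of record.
type sqliteEngine struct{ rows int }

func (e sqliteEngine) Name() string                { return "sqlite" }
func (e sqliteEngine) CountMessages() (int, error) { return e.rows, nil }

// fallbackEngine tries primary first and uses secondary only when the
// primary reports that its cache is missing. Other errors are returned as-is.
type fallbackEngine struct {
	primary, secondary Engine
}

func (f fallbackEngine) Name() string {
	return f.primary.Name() + " -> " + f.secondary.Name()
}

func (f fallbackEngine) CountMessages() (int, error) {
	n, err := f.primary.CountMessages()
	if errors.Is(err, errNoCache) {
		return f.secondary.CountMessages()
	}
	return n, err
}

func main() {
	var e Engine = fallbackEngine{
		primary:   parquetEngine{cacheBuilt: false},
		secondary: sqliteEngine{rows: 42},
	}
	n, err := e.CountMessages()
	fmt.Println(e.Name(), n, err)
}
```

Because callers depend only on the interface, a consumer such as a TUI could be handed a local, fallback, or remote engine without changing its own code.
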
Design Decisions
- Local-first by design: The Gmail API and IMAP servers are only contacted during explicit sync-full, sync, and deletion commands. Keyword search, analytics, TUI views, and ordinary MCP reads run against local data with no mailbox network access. Optional vector search additionally calls only the embedding endpoint you configure, so a local/self-hosted endpoint keeps semantic search on your own machine or network.
- SQLite as system of record: All message data lives in SQLite. Parquet is a derived cache.
- DuckDB + Parquet for analytics: The TUI runs an embedded DuckDB engine over Parquet metadata exports, delivering aggregate queries hundreds of times faster than SQLite JOINs. The entire analytics cache for hundreds of thousands of messages fits in a few megabytes, making drill-down and re-aggregation feel instant.
- Content-addressed attachments: Deduplicated by SHA-256 hash, stored on disk.
- Resumable sync: Checkpoints allow interrupted syncs to resume without re-downloading.
- Token bucket rate limiting: Respects Gmail API quotas without manual throttling.