Architecture Overview

msgvault syncs your Gmail or IMAP accounts to a local SQLite database. It can also import local MBOX archives, Apple Mail exports, and text messages from WhatsApp, iMessage, and Google Voice. Keyword search, analytics, the TUI, and the MCP server run against SQLite, Parquet, and local attachment files. Optional vector search calls the embedding endpoint configured in [vector.embeddings] to build and query semantic vectors, and stores those vectors locally in vectors.db.

[Figure: msgvault architecture. The Gmail API syncs to SQLite, followed by offline Parquet analytics, FTS5 search, the TUI, and the MCP server.]

Package Structure

msgvault/
├── cmd/msgvault/        # CLI entrypoint
│   └── cmd/             # Cobra commands
├── internal/            # Core packages
│   ├── tui/             # Bubble Tea TUI
│   ├── query/           # DuckDB query engine over Parquet
│   ├── store/           # SQLite database access
│   ├── deletion/        # Deletion staging and manifest
│   ├── gmail/           # Gmail API client
│   ├── sync/            # Sync orchestration
│   ├── imap/            # IMAP client (go-imap/v2)
│   ├── importer/        # MBOX and EMLX import logic
│   ├── mbox/            # MBOX format parser
│   ├── emlx/            # Apple Mail .emlx parser
│   ├── applemail/       # Apple Mail account discovery
│   ├── remote/          # Remote API engine for TUI
│   ├── vector/          # Local vector index, embedding worker, hybrid search
│   ├── whatsapp/        # WhatsApp backup import
│   ├── imessage/        # iMessage import
│   ├── gvoice/          # Google Voice Takeout import
│   ├── microsoft/       # Microsoft 365 OAuth
│   ├── oauth/           # OAuth2 flows (browser + device)
│   └── mime/            # MIME parsing
├── go.mod
└── Makefile

Key Packages

Package             Responsibility
------------------  ----------------------------------------------------------------------
cmd/                Cobra CLI commands, config loading
internal/store      SQLite database operations, schema management
internal/sync       Sync orchestration, MIME parsing, checkpoint management
internal/imap       IMAP client, connection management, credential storage
internal/importer   MBOX and EMLX import orchestration, message ingestion
internal/mbox       MBOX format reader (mboxo/mboxrd)
internal/emlx       Apple Mail .emlx parser and mailbox discovery
internal/applemail  Apple Mail account discovery via Accounts4.sqlite
internal/remote     HTTP engine implementing query.Engine for remote TUI
internal/vector     Vector index backend, embedding client/worker, and semantic/hybrid search
internal/gmail      Gmail API client with token bucket rate limiting
internal/oauth      OAuth2 browser and device authorization flows
internal/query      DuckDB engine over Parquet files, SQLite fallback
internal/tui        Bubble Tea model, lipgloss-styled views
internal/deletion   Deletion staging, manifest generation
internal/whatsapp   WhatsApp backup parsing and import
internal/imessage   iMessage database import
internal/gvoice     Google Voice Takeout parsing and import
internal/microsoft  Microsoft 365 OAuth flow
internal/mime       MIME message parsing, charset detection

Design Decisions

  • Local-first by design: The Gmail API and IMAP servers are only contacted during explicit sync-full, sync, and deletion commands. Keyword search, analytics, TUI views, and ordinary MCP reads run against local data with no mailbox network access. Optional vector search additionally calls only the embedding endpoint you configure, so a local/self-hosted endpoint keeps semantic search on your own machine or network.
  • SQLite as system of record: All message data lives in SQLite. Parquet is a derived cache.
  • DuckDB + Parquet for analytics: The TUI runs an embedded DuckDB engine over Parquet metadata exports, which answers aggregate queries hundreds of times faster than SQLite JOINs. The entire analytics cache for hundreds of thousands of messages fits in a few megabytes, so drill-down and re-aggregation feel instant.
  • Content-addressed attachments: Deduplicated by SHA-256 hash, stored on disk.
  • Resumable sync: Checkpoints allow interrupted syncs to resume without re-downloading.
  • Token bucket rate limiting: Respects Gmail API quotas without manual throttling.