Architecture Overview

msgvault syncs your Gmail to a local SQLite database. Syncing and explicit deletion commands are the only operations that contact the Gmail API. Everything else (search, analytics, the TUI, and the MCP server) runs entirely offline against SQLite, Parquet, and local attachment files.

[Figure: msgvault architecture. The Gmail API syncs to SQLite, which then feeds offline Parquet analytics, FTS5 search, the TUI, and the MCP server.]

Package Structure

msgvault/
├── cmd/msgvault/         # CLI entrypoint
│   └── cmd/              # Cobra commands
├── internal/             # Core packages
│   ├── tui/              # Bubble Tea TUI
│   ├── query/            # DuckDB query engine over Parquet
│   ├── store/            # SQLite database access
│   ├── deletion/         # Deletion staging and manifest
│   ├── gmail/            # Gmail API client
│   ├── sync/             # Sync orchestration
│   ├── oauth/            # OAuth2 flows (browser + device)
│   └── mime/             # MIME parsing
├── go.mod
└── Makefile

Key Packages

| Package | Responsibility |
|---|---|
| cmd/ | Cobra CLI commands, config loading |
| internal/store | SQLite database operations, schema management |
| internal/sync | Sync orchestration, MIME parsing, checkpoint management |
| internal/gmail | Gmail API client with token bucket rate limiting |
| internal/oauth | OAuth2 browser and device authorization flows |
| internal/query | DuckDB engine over Parquet files, SQLite fallback |
| internal/tui | Bubble Tea model, lipgloss-styled views |
| internal/deletion | Deletion staging, manifest generation |
| internal/mime | MIME message parsing, charset detection |
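
The internal/gmail entry mentions token bucket rate limiting. As a rough illustration of the technique (not msgvault's actual code), a token bucket can be sketched in Go with the standard library alone. The type name `TokenBucket`, the helper `demoWaits`, and the numbers (a capacity of 10 tokens, 5 tokens refilled per second, 5 tokens per request) are all hypothetical and chosen only to keep the arithmetic easy to follow.

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"time"
)

// TokenBucket is a hypothetical limiter used for illustration only;
// it is not the type used by msgvault's internal/gmail package.
type TokenBucket struct {
	mu         sync.Mutex
	capacity   float64   // maximum burst size, in tokens
	tokens     float64   // tokens currently available
	refillRate float64   // tokens added per second
	last       time.Time // time of the most recent refill
}

// NewTokenBucket returns a bucket that starts full at time now.
func NewTokenBucket(capacity, refillRate float64, now time.Time) *TokenBucket {
	return &TokenBucket{capacity: capacity, tokens: capacity, refillRate: refillRate, last: now}
}

// Take spends cost tokens and returns 0 if enough are available.
// Otherwise it spends nothing and returns how long the caller should
// wait before retrying.
func (b *TokenBucket) Take(cost float64, now time.Time) time.Duration {
	b.mu.Lock()
	defer b.mu.Unlock()

	// Refill in proportion to elapsed time, capped at capacity.
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens = math.Min(b.capacity, b.tokens+elapsed*b.refillRate)
		b.last = now
	}
	if b.tokens >= cost {
		b.tokens -= cost
		return 0
	}
	return time.Duration((cost - b.tokens) / b.refillRate * float64(time.Second))
}

// demoWaits replays four requests against a fixed clock and returns
// the wait reported for each one, in milliseconds.
func demoWaits() []int64 {
	start := time.Unix(0, 0)
	b := NewTokenBucket(10, 5, start) // illustrative numbers, not real quotas
	times := []time.Time{start, start, start, start.Add(time.Second)}
	waits := make([]int64, 0, len(times))
	for _, t := range times {
		waits = append(waits, b.Take(5, t).Milliseconds())
	}
	return waits
}

func main() {
	fmt.Println(demoWaits()) // prints [0 0 1000 0]
}
```

Passing the clock in as a parameter keeps the sketch deterministic. A real client would more likely call `time.Now()` and sleep for the returned duration before retrying the request.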

Design Decisions

  • Offline by design: The Gmail API is only contacted during explicit sync-full, sync, and deletion commands. Every other operation (search, analytics, TUI, MCP) runs entirely against local data with no network access required. This means no background OAuth sessions, no persistent API connections, and no possibility of an external tool or AI assistant reaching your live mailbox.
  • SQLite as system of record: All message data lives in SQLite. Parquet is a derived cache.
  • DuckDB + Parquet for analytics: The TUI runs an embedded DuckDB engine over Parquet metadata exports, delivering aggregate queries hundreds of times faster than SQLite JOINs. The entire analytics cache for hundreds of thousands of messages fits in a few megabytes, making drill-down and re-aggregation feel instant.
  • Content-addressed attachments: Deduplicated by SHA-256 hash, stored on disk.
  • Resumable sync: Checkpoints allow interrupted syncs to resume without re-downloading.
  • Token bucket rate limiting: Respects Gmail API quotas without manual throttling.
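
To make the content-addressed attachments decision more concrete, here is a minimal Go sketch of SHA-256 deduplication on disk. It is an illustration under stated assumptions, not msgvault's implementation: the function names `storeAttachment` and `runDemo`, and the `root/<first two hex chars>/<full hash>` directory layout, are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
)

// storeAttachment writes data under root at a path derived from its
// SHA-256 hash. It returns the hex hash and whether a new file was
// written. The two-character fan-out directory is an assumed layout.
func storeAttachment(root string, data []byte) (string, bool, error) {
	sum := sha256.Sum256(data)
	hash := hex.EncodeToString(sum[:])
	dir := filepath.Join(root, hash[:2])
	path := filepath.Join(dir, hash)

	if _, err := os.Stat(path); err == nil {
		return hash, false, nil // identical content is already stored
	} else if !os.IsNotExist(err) {
		return "", false, err
	}
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return "", false, err
	}

	// Write to a temporary file and rename it into place, so a reader
	// never observes a partially written attachment.
	tmp, err := os.CreateTemp(dir, "tmp-*")
	if err != nil {
		return "", false, err
	}
	defer os.Remove(tmp.Name()) // fails harmlessly after a successful rename
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return "", false, err
	}
	if err := tmp.Close(); err != nil {
		return "", false, err
	}
	if err := os.Rename(tmp.Name(), path); err != nil {
		return "", false, err
	}
	return hash, true, nil
}

// runDemo stores the same bytes twice in a throwaway directory and
// reports whether each call wrote a new file.
func runDemo() (string, bool, bool, error) {
	root, err := os.MkdirTemp("", "attachments-")
	if err != nil {
		return "", false, false, err
	}
	defer os.RemoveAll(root)

	data := []byte("same bytes attached to two different messages")
	hash, first, err := storeAttachment(root, data)
	if err != nil {
		return "", false, false, err
	}
	_, second, err := storeAttachment(root, data)
	return hash, first, second, err
}

func main() {
	hash, first, second, err := runDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(hash), first, second) // prints 64 true false
}
```

Because the path is a pure function of the content, an attachment that appears in many messages occupies disk space once, and the database only needs to record the hash for each message.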