Web Server

Overview

msgvault serve starts an HTTP server that exposes your local email archive over a REST API. It optionally runs a background sync scheduler to keep accounts up to date on a cron-based schedule.

The API queries the same SQLite database and attachment store as the CLI and TUI. Keyword search and ordinary archive reads stay local. If vector search is enabled, semantic and hybrid search also call the embedding endpoint configured in [vector.embeddings]. The server is designed for local integrations, dashboards, and automation scripts.

Quick Start

Add a [server] section to your config.toml:

[server]
api_port = 8080
api_key = "your-secret-key"

Start the server:

msgvault serve

Test connectivity:

# Health check (no auth required)
curl http://localhost:8080/health
# Archive stats (auth required)
curl -H "Authorization: Bearer your-secret-key" http://localhost:8080/api/v1/stats

Authentication

All endpoints except /health require authentication when api_key is set in your config. Three authentication methods are supported:

| Method | Header | Example |
| --- | --- | --- |
| Bearer token | Authorization: Bearer <key> | Authorization: Bearer my-secret |
| API key header | X-API-Key: <key> | X-API-Key: my-secret |
| Plain auth header | Authorization: <key> | Authorization: my-secret |

If no api_key is configured, requests are accepted without authentication regardless of bind address. A separate startup check refuses to bind to a non-loopback address without an API key unless allow_insecure = true is set (see Security Model).
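
The three header forms in the table can be produced by a small helper. This is a minimal sketch for client scripts; the function name auth_headers and the method labels are illustrative and not part of msgvault.

```python
def auth_headers(api_key: str, method: str = "bearer") -> dict[str, str]:
    """Return the request header for one of the three documented auth methods."""
    if method == "bearer":
        return {"Authorization": f"Bearer {api_key}"}
    if method == "x-api-key":
        return {"X-API-Key": api_key}
    if method == "plain":
        return {"Authorization": api_key}
    raise ValueError(f"unknown auth method: {method!r}")


# Print the header each method would send for the key "my-secret".
for name in ("bearer", "x-api-key", "plain"):
    print(name, auth_headers("my-secret", name))
```

Any one of the three is enough; a client only needs to send a single header per request.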

API Endpoints

GET /health

Health check endpoint. Does not require authentication.

Response:

{"status": "ok"}

GET /api/v1/stats

Archive statistics. When vector search is configured on the server, the response also includes a vector_search sub-object describing the state of the index.

Response (vector search disabled):

{
  "total_messages": 142857,
  "total_threads": 48293,
  "total_accounts": 2,
  "total_labels": 47,
  "total_attachments": 31204,
  "database_size_bytes": 8589934592
}

Response (vector search enabled):

{
  "total_messages": 142857,
  "total_threads": 48293,
  "total_accounts": 2,
  "total_labels": 47,
  "total_attachments": 31204,
  "database_size_bytes": 8589934592,
  "vector_search": {
    "enabled": true,
    "active_generation": {
      "id": 3,
      "model": "nomic-embed-text-v1.5",
      "dimension": 768,
      "fingerprint": "nomic-embed-text-v1.5:768",
      "state": "active",
      "activated_at": "2026-04-18T15:12:33Z",
      "message_count": 142820
    },
    "building_generation": {
      "id": 4,
      "model": "nomic-embed-text-v2",
      "dimension": 768,
      "started_at": "2026-04-19T09:02:10Z",
      "progress": { "done": 8200, "total": 142857 }
    },
    "pending_embeddings_total": 134657
  }
}

active_generation is always present in the object (null until the first build completes). building_generation is omitted when no rebuild is in flight. pending_embeddings_total is the sum of rows still pending across the active and building generations. See Vector Search for the end-to-end workflow.
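
A client reading this response has to handle three cases: no vector_search object at all, a null active_generation, and a missing building_generation. The sketch below summarizes index state from an already-decoded stats response; the function name and the summary wording are illustrative only.

```python
def vector_index_summary(stats: dict) -> str:
    """Summarize vector index state from a decoded /api/v1/stats response."""
    vs = stats.get("vector_search")
    if vs is None:
        return "vector search not configured"

    active = vs.get("active_generation")      # null until the first build completes
    building = vs.get("building_generation")  # omitted when no rebuild is in flight

    parts = []
    if active is None:
        parts.append("no active generation yet")
    else:
        parts.append(f"active: generation {active['id']} ({active['model']})")
    if building is not None:
        done = building["progress"]["done"]
        total = building["progress"]["total"]
        pct = 100.0 * done / total if total else 0.0
        parts.append(f"building: generation {building['id']} at {pct:.1f}%")
    parts.append(f"pending: {vs.get('pending_embeddings_total', 0)}")
    return "; ".join(parts)
```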


GET /api/v1/messages

Paginated message list.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| page | int | 1 | Page number |
| page_size | int | 20 | Results per page |

Response:

{
  "total": 142857,
  "page": 1,
  "page_size": 20,
  "messages": [
    {
      "id": 12345,
      "subject": "Q4 Planning",
      "from": "alice@example.com",
      "to": ["bob@example.com"],
      "cc": ["carol@example.com"],
      "sent_at": "2024-10-15T09:30:00Z",
      "snippet": "Here's the draft for Q4...",
      "labels": ["INBOX", "IMPORTANT"],
      "has_attachments": true,
      "size_bytes": 52480
    }
  ]
}
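
To walk the whole list, a client can keep requesting pages until it has seen total messages. The following is a sketch, not an official client: fetch_page is a hypothetical callable that performs the HTTP request for a given page and page_size and returns the decoded JSON body, and fake_fetch stands in for it so the example runs without a server.

```python
from typing import Callable, Iterator


def iter_messages(fetch_page: Callable[[int, int], dict],
                  page_size: int = 20) -> Iterator[dict]:
    """Yield every message by following page numbers until `total` is reached."""
    page = 1
    seen = 0
    while True:
        body = fetch_page(page, page_size)
        messages = body["messages"]
        if not messages:          # defensive stop if the server returns an empty page
            return
        yield from messages
        seen += len(messages)
        if seen >= body["total"]:
            return
        page += 1


def fake_fetch(page: int, page_size: int) -> dict:
    """Stand-in for the real request: a 45-message archive served in pages."""
    archive = [{"id": i} for i in range(1, 46)]
    start = (page - 1) * page_size
    return {"total": len(archive), "page": page, "page_size": page_size,
            "messages": archive[start:start + page_size]}


print(sum(1 for _ in iter_messages(fake_fetch)))
```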

GET /api/v1/messages/{id}

Full message details including body and attachment metadata.

Response:

{
  "id": 12345,
  "subject": "Q4 Planning",
  "from": "alice@example.com",
  "to": ["bob@example.com"],
  "cc": ["carol@example.com"],
  "bcc": ["dave@example.com"],
  "sent_at": "2024-10-15T09:30:00Z",
  "snippet": "Here's the draft for Q4...",
  "labels": ["INBOX", "IMPORTANT"],
  "has_attachments": true,
  "size_bytes": 52480,
  "body": "<full message body>",
  "attachments": [
    {
      "filename": "q4-plan.pdf",
      "mime_type": "application/pdf",
      "size_bytes": 204800
    }
  ]
}

The cc and bcc fields are included when the message has recipients of that type. They are omitted from the JSON response when empty.
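
Because cc and bcc may be absent, client code should not index them directly. A minimal sketch of reading all recipients from a decoded message (the helper name is illustrative):

```python
def all_recipients(message: dict) -> list[str]:
    """Collect to, cc, and bcc addresses, treating omitted fields as empty."""
    recipients: list[str] = []
    for field in ("to", "cc", "bcc"):
        # .get() covers fields omitted from the JSON; `or []` also covers null.
        recipients.extend(message.get(field) or [])
    return recipients
```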


GET /api/v1/search

Search messages. The default mode is full-text search (FTS5 with LIKE fallback). When the server is configured for vector search, mode=vector runs semantic-only search and mode=hybrid fuses BM25 and vector ranking via Reciprocal Rank Fusion.

mode=vector and mode=hybrid both require at least one free-text term in q, because the free text is what gets embedded as the query vector. Operator-only queries such as q=from:alice have nothing to embed and return 400 missing_free_text; route filter-only requests to mode=fts instead.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| q | string | (required) | Search query |
| mode | enum | fts | fts, vector, or hybrid |
| page | int | 1 | Page number (FTS only; vector/hybrid reject page>1) |
| page_size | int | 20 | Results per page (max 100 for FTS, max [vector].search.max_page_size_hybrid for vector/hybrid) |
| explain | 0/1 | 0 | When 1 and `mode=vector |
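
A client can avoid the 400 missing_free_text error by falling back to mode=fts when a query has no free text. The sketch below builds the request URL. It assumes, for illustration only, that every operator has the form name:value with no spaces, so any token without a colon counts as free text; the real query grammar is described on the Searching page and may be richer than this heuristic.

```python
from urllib.parse import urlencode


def has_free_text(query: str) -> bool:
    """Heuristic (assumption): tokens without ':' are free text, not operators."""
    return any(":" not in token for token in query.split())


def search_url(base_url: str, query: str, mode: str = "fts",
               page_size: int = 20) -> str:
    """Build a /api/v1/search URL, routing operator-only queries to FTS."""
    if mode in ("vector", "hybrid") and not has_free_text(query):
        mode = "fts"  # nothing to embed, so semantic modes would be rejected
    params = {"q": query, "mode": mode, "page_size": page_size}
    return f"{base_url}/api/v1/search?{urlencode(params)}"


print(search_url("http://localhost:8080", "from:alice", mode="hybrid"))
print(search_url("http://localhost:8080", "from:alice planning offsite", mode="hybrid"))
```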

Response (mode=fts, default):

{
  "query": "quarterly report",
  "total": 23,
  "page": 1,
  "page_size": 20,
  "messages": [
    {
      "id": 12345,
      "subject": "Q4 Planning",
      "from": "alice@example.com",
      "to": ["bob@example.com"],
      "cc": ["carol@example.com"],
      "sent_at": "2024-10-15T09:30:00Z",
      "snippet": "Here's the draft for Q4...",
      "labels": ["INBOX", "IMPORTANT"],
      "has_attachments": true,
      "size_bytes": 52480
    }
  ]
}

Response (mode=vector or mode=hybrid):

{
  "query": "when is the planning offsite",
  "mode": "hybrid",
  "returned": 12,
  "pool_saturated": false,
  "generation": {
    "id": 3,
    "model": "nomic-embed-text-v1.5",
    "dimension": 768,
    "fingerprint": "nomic-embed-text-v1.5:768",
    "state": "active"
  },
  "took_ms": 84,
  "results": [
    {
      "id": 12345,
      "subject": "Q2 planning offsite agenda",
      "from": "alice@example.com",
      "to": ["team@example.com"],
      "sent_at": "2024-01-15T10:30:00Z",
      "snippet": "Proposed agenda for the offsite on...",
      "labels": ["INBOX"],
      "has_attachments": false,
      "size_bytes": 2048
    }
  ]
}

Vector and hybrid responses expose returned instead of total (ANN search does not have a meaningful total count), add a generation sub-object naming the index generation that answered the query, and include took_ms. The top-level results array replaces messages. pool_saturated is true when a vector or BM25 candidate pool hit its configured cap (or pure vector search returned as many hits as requested), hinting that increasing the limit or narrowing the query may expose more relevant results.
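
Since the two response shapes use different keys, code that supports every mode may want to normalize them first. This is one possible sketch; the output keys (items, count, count_is_total, maybe_truncated) are made up for the example.

```python
def normalize_search(body: dict) -> dict:
    """Map FTS and vector/hybrid search responses onto one common shape."""
    if "results" in body:
        # Vector / hybrid: `returned` is the number of hits in this response,
        # not a total across the archive.
        return {
            "mode": body["mode"],
            "items": body["results"],
            "count": body["returned"],
            "count_is_total": False,
            "maybe_truncated": body.get("pool_saturated", False),
        }
    # FTS (default): `total` counts all matches across pages.
    return {
        "mode": "fts",
        "items": body["messages"],
        "count": body["total"],
        "count_is_total": True,
        "maybe_truncated": False,
    }
```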

When explain=1, each element of results carries an extra score object exposing the fused-score components:

{
  "id": 12345,
  "subject": "...",
  "score": {
    "rrf": 0.032,
    "bm25": 7.4,
    "vector": 0.82,
    "subject_boosted": true
  }
}

bm25 and vector are omitted when the message did not appear in that signal (BM25 missed it or the ANN pool did not include it). rrf is omitted in mode=vector, where there is only one signal and nothing to fuse. subject_boosted is true when the subject-line boost was applied.
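
Since a missing key means "this signal did not match", a client can tell which signals contributed to a hit by checking which keys are present. A small sketch with an illustrative helper name:

```python
def matched_signals(score: dict) -> list[str]:
    """List which ranking signals contributed to a result's explain score."""
    # A signal's key is only present when the message appeared in that signal.
    signals = [name for name in ("bm25", "vector") if name in score]
    if score.get("subject_boosted"):
        signals.append("subject_boost")
    return signals
```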

See Searching for the full query syntax reference and Vector Search for vector / hybrid setup.


GET /api/v1/accounts

List configured accounts with sync status.

Response:

{
  "accounts": [
    {
      "email": "you@gmail.com",
      "display_name": "Your Name",
      "last_sync_at": "2024-10-15T08:00:00Z",
      "next_sync_at": "2024-10-15T09:00:00Z",
      "schedule": "0 * * * *",
      "enabled": true
    }
  ]
}

POST /api/v1/auth/token/{email}

Upload an OAuth token JSON file generated by a local msgvault client.

This endpoint is used by msgvault export-token during remote/headless deployment workflows.

Request headers:

  • X-API-Key: <api-key> (or any supported auth header)
  • Content-Type: application/json

Example request body (/api/v1/auth/token/you@gmail.com):

{
  "access_token": "ya29...",
  "token_type": "Bearer",
  "refresh_token": "1//0g...",
  "expiry": "2024-12-31T23:59:59Z",
  "scopes": ["https://www.googleapis.com/auth/gmail.modify"]
}

Successful response (201 Created):

{
  "status": "created",
  "message": "Token saved for you@gmail.com"
}

POST /api/v1/accounts

Register or ensure an account is scheduled for sync on the remote server.

msgvault export-token posts to this endpoint automatically after uploading a token.

Request body:

{
  "email": "you@gmail.com",
  "schedule": "0 2 * * *"
}

The enabled field is always set to true server-side.

If the account already exists (200 OK):

{
  "status": "exists",
  "message": "Account already configured for you@gmail.com"
}

On success (201 Created):

{
  "status": "created",
  "message": "Account added for you@gmail.com"
}

POST /api/v1/sync/{account}

Trigger a manual sync for an account. Returns immediately with a 202 status while the sync runs in the background.

Response (202 Accepted):

{
  "status": "accepted",
  "message": "Sync started for you@gmail.com"
}
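
The three POST endpoints above are typically called in order during remote setup: upload the token, register the account, then optionally trigger a first sync. msgvault export-token already performs the first two steps; the sketch below only shows what equivalent requests could look like from a script. It builds urllib Request objects without sending them, and the helper names are illustrative.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import Request


def post_json(base_url: str, path: str, api_key: str,
              payload: Optional[dict] = None) -> Request:
    """Build an authenticated POST request, with a JSON body when one is given."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"X-API-Key": api_key}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return Request(f"{base_url}{path}", data=data, headers=headers, method="POST")


def remote_setup_requests(base_url: str, api_key: str, email: str,
                          token: dict, schedule: str) -> list[Request]:
    """Return the token upload, account registration, and sync trigger requests."""
    who = quote(email, safe="@")  # keep '@' literal, as in the documented paths
    return [
        post_json(base_url, f"/api/v1/auth/token/{who}", api_key, token),
        post_json(base_url, "/api/v1/accounts", api_key,
                  {"email": email, "schedule": schedule}),
        post_json(base_url, f"/api/v1/sync/{who}", api_key),
    ]
```

Each request can be sent with urllib.request.urlopen. Going by the responses documented above, the expected statuses are 201 for the token upload, 201 or 200 for the account, and 202 for the sync trigger.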

GET /api/v1/scheduler/status

Scheduler state and per-account schedule details.

Response:

{
  "running": true,
  "accounts": [
    {
      "email": "you@gmail.com",
      "running": false,
      "last_run": "2024-10-15T08:00:00Z",
      "next_run": "2024-10-15T09:00:00Z",
      "schedule": "0 * * * *"
    }
  ]
}
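
A monitoring script might turn next_run into a countdown per account. The sketch below works on an already-decoded response and assumes the timestamps are RFC 3339 strings in UTC with a trailing Z, as in the example above; the helper names are illustrative.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse a timestamp like 2024-10-15T09:00:00Z into an aware datetime."""
    # Older Python versions do not accept a trailing 'Z' in fromisoformat().
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def seconds_until_next_run(status: dict, now: datetime) -> dict[str, float]:
    """Map each account email to the seconds remaining until its next run."""
    waits: dict[str, float] = {}
    for account in status.get("accounts", []):
        next_run = account.get("next_run")
        if next_run:  # skip accounts without a scheduled run
            waits[account["email"]] = (parse_ts(next_run) - now).total_seconds()
    return waits


status = {"running": True, "accounts": [
    {"email": "you@gmail.com", "running": False,
     "next_run": "2024-10-15T09:00:00Z", "schedule": "0 * * * *"}]}
print(seconds_until_next_run(status, datetime(2024, 10, 15, 8, 30, tzinfo=timezone.utc)))
```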

Rate Limiting

The API enforces rate limiting of 10 requests per second per client IP, with a burst allowance of 20 requests. When the limit is exceeded, the server responds with HTTP 429 and includes a Retry-After header indicating how long to wait before retrying.
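
Automation scripts that issue many requests should be prepared to back off on a 429. The following sketch retries after the delay given in Retry-After. It assumes the header carries a number of seconds and falls back to a default delay otherwise. send is a hypothetical callable that performs one request and returns a (status, headers, body) tuple.

```python
import time
from typing import Callable


def retry_after_seconds(headers: dict, default: float = 1.0) -> float:
    """Read Retry-After as seconds; use `default` if it is missing or not numeric."""
    try:
        return max(0.0, float(headers.get("Retry-After", "")))
    except ValueError:
        return default


def call_with_retry(send: Callable[[], tuple], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> tuple:
    """Call send() until it returns a non-429 status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts:
            break
        sleep(retry_after_seconds(headers))
    return status, headers, body
```

Passing sleep as a parameter keeps the helper easy to test without real delays.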

CORS

Cross-Origin Resource Sharing is disabled by default. To allow browser-based clients, configure allowed origins in your config.toml:

[server]
cors_origins = ["http://localhost:3000", "https://my-dashboard.example.com"]
cors_credentials = true
cors_max_age = 3600

Scheduled Sync

The server can automatically sync accounts on a cron-based schedule. Add [[accounts]] sections to your config:

[[accounts]]
email = "you@gmail.com"
schedule = "0 * * * *" # every hour
enabled = true

[[accounts]]
email = "work@gmail.com"
schedule = "*/15 * * * *" # every 15 minutes
enabled = true

The scheduler starts automatically with msgvault serve when account schedules are configured. Use the /api/v1/scheduler/status endpoint to monitor schedule state, and /api/v1/sync/{account} to trigger a manual sync outside the schedule.

Security Model

The server is designed for local use:

  • Loopback-only by default. The default bind address is 127.0.0.1, restricting access to the local machine.
  • API key required for non-loopback. If you bind to a non-loopback address (e.g., 0.0.0.0), the server requires api_key to be set and will refuse to start without it.
  • Opt-in for insecure binding. To bind to a non-loopback address without an API key (not recommended), set allow_insecure = true.
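
The three rules above amount to a single startup check. The sketch below restates them in Python purely to illustrate the documented behavior. It is not msgvault's actual validation code, and treating the hostname localhost as loopback is an assumption.

```python
import ipaddress


def is_loopback(bind_addr: str) -> bool:
    """Return True for loopback IP addresses such as 127.0.0.1 or ::1."""
    if bind_addr == "localhost":  # assumption: hostname treated as loopback
        return True
    try:
        return ipaddress.ip_address(bind_addr).is_loopback
    except ValueError:
        return False  # not an IP literal; treat as non-loopback


def may_start(bind_addr: str = "127.0.0.1", api_key: str = "",
              allow_insecure: bool = False) -> bool:
    """Loopback always starts; otherwise require api_key or allow_insecure."""
    return is_loopback(bind_addr) or bool(api_key) or allow_insecure


print(may_start("0.0.0.0"))                       # no key, no opt-in
print(may_start("0.0.0.0", api_key="secret"))     # key set
print(may_start("0.0.0.0", allow_insecure=True))  # explicit opt-in
```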

Configuration Reference

All server settings go in the [server] section of config.toml. Account schedules use [[accounts]] sections.

[server]

| Key | Default | Description |
| --- | --- | --- |
| api_port | 8080 | Port the server listens on |
| bind_addr | 127.0.0.1 | Bind address |
| api_key | | API key for authentication |
| allow_insecure | false | Allow non-loopback binding without api_key |
| cors_origins | [] | Allowed CORS origins |
| cors_credentials | false | Allow credentials in CORS requests |
| cors_max_age | 0 | CORS preflight cache duration in seconds (defaults to 86400 when cors_origins is set) |
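
The defaults in the table can be expressed as a small merge over a parsed [server] section. This is an illustrative sketch, not msgvault's loader. It assumes an unset api_key is an empty string and that the 86400 default applies only when cors_max_age is absent from the config.

```python
SERVER_DEFAULTS = {
    "api_port": 8080,
    "bind_addr": "127.0.0.1",
    "api_key": "",            # assumption: unset means empty
    "allow_insecure": False,
    "cors_origins": [],
    "cors_credentials": False,
    "cors_max_age": 0,
}


def effective_server_config(raw: dict) -> dict:
    """Overlay user-provided [server] keys on the documented defaults."""
    config = {**SERVER_DEFAULTS, **raw}
    # Documented special case: the preflight cache defaults to 86400 seconds
    # once CORS origins are configured (assumed: only if not set explicitly).
    if config["cors_origins"] and "cors_max_age" not in raw:
        config["cors_max_age"] = 86400
    return config
```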

[[accounts]]

| Key | Default | Description |
| --- | --- | --- |
| email | (required) | Gmail account email address |
| schedule | | Cron expression for sync schedule |
| enabled | true | Whether scheduled sync is active |

See the Configuration page for the full config file reference.