Research

Digital Surface Labs

The Privacy-Availability Tradeoff in Memex

How to keep user data sovereign while maintaining uptime across networks

The Core Tension

Memex captures everything on your screen. That data is the most sensitive thing a person has — more intimate than email, more comprehensive than browsing history. It contains passwords, private messages, financial information, medical records, everything.

The current architecture keeps this data local: screenshots on your machine, ChromaDB on your machine (or your Jetson on your LAN), MCP server on your machine. When your laptop sleeps, your Memex goes dark.

This creates a fundamental tension:

  • Privacy demands locality. The safest place for your data is your own hardware.
  • Availability demands redundancy. The only way to be always-on is to have copies somewhere else.
  • Growth demands simplicity. If setup takes more than one command, most people won't do it.

Every solution is a different point on this triangle. There is no free lunch — but some points are much better than others.

Where We Are Today

The current Memex hosting modes occupy three points on that triangle:

Mode                   | Privacy                                                           | Availability                                    | Setup Effort
-----------------------|-------------------------------------------------------------------|-------------------------------------------------|--------------------------------------------------
Local (laptop only)    | Maximum: never leaves your machine                                | Poor: dies when you close your laptop           | Trivial
Jetson (LAN server)    | High: stays on your network                                       | Good: Jetson runs 24/7, but only if you own one | Significant: buy hardware, configure networking
Remote self-host (VPS) | Medium: data transits the internet, lives on a server you rent    | High: VPS uptime is 99.9%+                      | Moderate: SSH, tunnel config

For Joe's personal setup, the Jetson + Cloudflare Tunnel is excellent: always-on, LAN-local data, exposed via tunnel when needed. But this isn't something you can ask a random developer to replicate. They'd need to buy a Jetson, configure it, set up Cloudflare, and maintain it.

The question is: what's the one-command version of the Jetson setup?

Strategy 1: Encrypted Relay (The Realistic Answer)

The most practical approach for growth. Your data leaves your machine, but it's encrypted before it does.

How It Works

Your laptop                          Relay server (cloud)
┌─────────────┐                      ┌─────────────────┐
│ Memex       │    encrypted blob    │ Encrypted store  │
│ ChromaDB    │ ──────────────────>  │ (R2, S3, etc.)  │
│ OCR text    │                      │                  │
│             │  encrypt with your   │ Cannot decrypt.  │
│ age keypair │  key before upload   │ Just stores blobs│
└─────────────┘                      └─────────────────┘
                                            │
                                     ┌──────┴──────┐
                                     │ MCP Proxy   │
                                     │ (Worker)    │
                                     │             │
                                     │ Receives    │
                                     │ query,      │
                                     │ forwards to │
                                     │ your laptop │
                                     │ when online │
                                     │             │
                                     │ OR serves   │
                                     │ cached      │
                                     │ encrypted   │
                                     │ results     │
                                     └─────────────┘

The Key Insight: Separate Search Index from Raw Data

You don't need to sync everything. The relay only needs enough to answer queries when you're offline:

  1. Raw screenshots — never leave your machine. Ever. These are too sensitive and too large.
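  2. ChromaDB embeddings: can be synced encrypted. Embeddings are lossy, so the original text is not directly readable from a vector. That helps, but it is not a guarantee: embedding-inversion attacks can recover a good deal of the source text, so embeddings should still be treated as sensitive and encrypted in transit and at rest.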
  3. OCR text snippets — the sensitive part. Encrypt with age before syncing. The relay stores ciphertext.
  4. Metadata — timestamps, window titles, application names. Lower sensitivity, higher utility for routing queries.
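
The four categories above can be written down as a small policy table. The following is a minimal Python sketch, not Memex's actual code: the names `DataClass`, `SyncRule`, `SYNC_POLICY`, and `prepare_for_sync` are hypothetical, and `encrypt` stands in for whatever wraps age. The source does not say whether metadata is encrypted, so leaving it in the clear here is an assumption made only so the relay could route on it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class DataClass(Enum):
    SCREENSHOT = "screenshot"
    EMBEDDING = "embedding"
    OCR_TEXT = "ocr_text"
    METADATA = "metadata"


@dataclass(frozen=True)
class SyncRule:
    leaves_machine: bool         # may this class ever be uploaded to the relay?
    encrypt_before_upload: bool  # must it be encrypted client-side first?


# Hypothetical policy mirroring the list above.
SYNC_POLICY = {
    DataClass.SCREENSHOT: SyncRule(leaves_machine=False, encrypt_before_upload=False),
    DataClass.EMBEDDING: SyncRule(leaves_machine=True, encrypt_before_upload=True),
    DataClass.OCR_TEXT: SyncRule(leaves_machine=True, encrypt_before_upload=True),
    # Assumption: metadata is synced unencrypted so the relay can route on it.
    DataClass.METADATA: SyncRule(leaves_machine=True, encrypt_before_upload=False),
}


def prepare_for_sync(
    data_class: DataClass,
    payload: bytes,
    encrypt: Callable[[bytes], bytes],
) -> Optional[bytes]:
    """Return the bytes to upload, or None if this class must stay local."""
    rule = SYNC_POLICY[data_class]
    if not rule.leaves_machine:
        return None
    return encrypt(payload) if rule.encrypt_before_upload else payload
```

Keeping the policy in one table means a stricter mode (for example, encrypting metadata too) is a one-line change rather than a code path to audit.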

The One-Command Setup

memex sync enable
# → Generates age keypair (stored at ~/.memex/key.txt)
# → Creates encrypted R2 bucket via Cloudflare API
# → Starts background sync daemon
# → Prints: "Your Memex is now available at memex.digitalsurfacelabs.com/@yourhandle"
# → When your laptop sleeps, cached embeddings + encrypted snippets still serve queries
# → When your laptop wakes, fresh data syncs automatically

What happens under the hood:

  1. age-keygen creates a keypair. Private key stays on your machine in ~/.memex/key.txt.
  2. A Cloudflare Worker is provisioned (via Wrangler, the Cloudflare API, or a shared deployment).
  3. ChromaDB embeddings are serialized, encrypted with your public key, and uploaded to R2.
  4. OCR text is chunked, encrypted per-chunk, and uploaded to R2.
  5. A sync daemon watches for new captures and uploads incrementally.
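  6. The Worker serves as an MCP proxy: when your laptop is online, it forwards queries to it. When your laptop is offline, it runs vector search over the synced embeddings, which it decrypts in memory using a session key that you grant at setup and rotate each time you come back online. This implies the embeddings are also encrypted to that session key, not only to your age public key.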
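
Steps 4 and 5 (chunk, encrypt per chunk, upload only what is new) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the Memex sync daemon: `chunk_text`, `object_key`, and `plan_uploads` are hypothetical names, the 512-character chunk size is arbitrary, and `encrypt` is a placeholder for a function that encrypts to the user's age public key. Object names are derived with a keyed HMAC rather than a plain hash so that the relay cannot confirm guesses about chunk contents; that choice is also an assumption.

```python
import hashlib
import hmac
from typing import Callable, Dict, Iterator, Set


def chunk_text(text: str, size: int = 512) -> Iterator[str]:
    """Split OCR text into fixed-size character chunks."""
    for start in range(0, len(text), size):
        yield text[start:start + size]


def object_key(chunk: bytes, local_secret: bytes) -> str:
    """Derive an opaque, deterministic object name for a chunk."""
    return "ocr/" + hmac.new(local_secret, chunk, hashlib.sha256).hexdigest()


def plan_uploads(
    text: str,
    encrypt: Callable[[bytes], bytes],
    local_secret: bytes,
    already_uploaded: Set[str],
    size: int = 512,
) -> Dict[str, bytes]:
    """Return {object name: ciphertext} for chunks the relay does not hold yet."""
    uploads: Dict[str, bytes] = {}
    for chunk in chunk_text(text, size):
        raw = chunk.encode("utf-8")
        key = object_key(raw, local_secret)
        if key not in already_uploaded:
            uploads[key] = encrypt(raw)  # plaintext never leaves this function
    return uploads
```

Because object names are deterministic, a second pass over text that has only grown at the end re-uploads just the final chunk; earlier chunks are skipped.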

Wait — If the Worker Can Decrypt, Is It Zero-Knowledge?

No. And this is the honest tradeoff.

True zero-knowledge means the server can never see your data, even in memory. This is what Proton Mail and Tresorit do. But it means the server can't search your data either — the client must do all the work, which means the client must be online.

The compromise: The Cloudflare Worker holds a session key that can decrypt embeddings (not raw text). This key is rotated every time you come online. The Worker can do vector similarity search over embeddings, but:

  • It never sees raw OCR text (only embeddings, which are lossy)
  • It never sees screenshots
  • The session key is ephemeral — if you revoke it, the Worker can't decrypt anything
  • Audit log records every query the Worker serves while you're offline

This is weaker than true zero-knowledge but dramatically better than "your data is on someone else's server in plaintext."
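
The offline path described above (search over decrypted embeddings, refuse once the key is revoked, log every query) can be sketched as follows. This is a hedged illustration in Python, not the Worker's code: `OfflineSearcher` and its fields are hypothetical, the embeddings are assumed to be already decrypted into memory with the session key, and the search returns chunk identifiers only, never text.

```python
import math
import time
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class OfflineSearcher:
    # chunk id -> embedding, decrypted in memory with the session key
    embeddings: Dict[str, Sequence[float]]
    session_active: bool = True
    audit_log: List[Tuple[float, str]] = field(default_factory=list)

    def revoke(self) -> None:
        """Simulate key revocation: drop decrypted vectors and refuse queries."""
        self.session_active = False
        self.embeddings = {}

    def search(self, query_vec: Sequence[float], querier: str, top_k: int = 3) -> List[str]:
        if not self.session_active:
            raise PermissionError("session key revoked; offline search unavailable")
        self.audit_log.append((time.time(), querier))  # record every offline query
        ranked = sorted(
            self.embeddings,
            key=lambda cid: cosine(query_vec, self.embeddings[cid]),
            reverse=True,
        )
        return ranked[:top_k]
```

Returning identifiers rather than text keeps the sketch consistent with the claim that the Worker never handles raw OCR text; the matching snippets would still be ciphertext that only the owner's key can open.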

Alternatives Within This Strategy

Approach                                | Privacy Level                        | Complexity           | Availability
----------------------------------------|--------------------------------------|----------------------|------------------------
Encrypted embeddings + Worker search    | Medium-high (embeddings only, lossy) | Low (one command)    | High (always answers)
Full zero-knowledge, client-only search | Maximum                              | Medium               | Poor (only when online)
Encrypted sync + Nitro Enclave          | Very high (hardware attestation)     | High (AWS infra)     | High
Encrypted sync + your own relay         | High (you control the server)        | Medium (need a VPS)  | Depends on VPS

Strategy 2: Personal Edge Device as a Product (The Jetson Model, Productized)

Turn what you're already doing into a product. Ship a pre-configured device.

How It Works

What you do today (manually):                What the product does (one box):
──────────────────────────────────           ──────────────────────────────────
1. Buy Jetson Orin Nano                       1. Buy "Memex Box" ($149)
2. Flash JetPack                              2. Plug in, connect to WiFi
3. Install ChromaDB                           3. Open browser, claim your handle
4. Install Memex server                       4. Install Memex on laptop
5. Configure systemd services                 5. Done
6. Set up Cloudflare tunnel
7. Configure DNS

The "Memex Box" is a Raspberry Pi 5 (or Jetson Orin Nano for power users) that ships with:

  • Memex server pre-installed
  • ChromaDB pre-configured
  • Cloudflare Tunnel pre-provisioned (user claims their subdomain on first boot)
  • Auto-updates via apt or container pulls

Why This Is Interesting

  • Data literally stays in your house. Not encrypted-on-someone-else's-server. In your house.
  • Always-on. The box runs 24/7 on ~15W.
  • No cloud dependency. If Cloudflare goes down, the box still works on your LAN.
  • Tangible. People understand "my data is on that box" in a way they don't understand "my data is encrypted in R2."

Why This Is Hard

  • Hardware is hard. Shipping physical products, managing inventory, handling returns.
  • Support burden. "My box won't connect to WiFi" is a different support problem than "my Docker container won't start."
  • Scale. You can't ship 10,000 boxes overnight.

The Middle Ground: A Pre-Built Docker Image

Don't ship hardware. Ship a Docker image that runs on any always-on machine the user already has.

# User has an old laptop, a NAS, a Raspberry Pi, anything always-on
docker run -d --name memex-relay \
  -p 8082:8082 \
  -v memex-data:/data \
  ghcr.io/joenewbry/memex-relay:latest

# First boot: auto-provisions Cloudflare tunnel, prints URL
# Laptop Memex points at this relay
memex config set hosting relay --url http://192.168.1.50:8082

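This keeps most of the Jetson setup's benefits (always-on, data on hardware you control) with a fraction of the setup friction. The 80/20 split is a rule of thumb, not a measurement.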

Strategy 3: Peer-to-Peer with Encrypted Relay Fallback

Combine Syncthing-style P2P sync with an encrypted cloud fallback.

How It Works

Your laptop ◄──── Syncthing (P2P, encrypted) ────► Your always-on device
     │                                                      │
     │ (when both online: P2P sync, zero cloud)             │
     │                                                      │
     └── when offline ──► Encrypted relay (R2) ◄── when offline ──┘
                              │
                         MCP Proxy (Worker)
                         serves queries when
                         both devices are offline

Three tiers of availability:

  1. Laptop online → queries go directly to laptop. Maximum privacy.
  2. Laptop offline, relay device online → queries go to your always-on device on your LAN. High privacy.
  3. Both offline → queries go to encrypted cache in R2. Medium-high privacy.

One-Command Setup

memex sync enable --mode p2p
# → Installs Syncthing integration
# → If it detects another Memex device on LAN, pairs automatically
# → If no LAN device, falls back to encrypted cloud relay
# → Prints availability tier: "Tier 1: laptop, Tier 2: [none detected], Tier 3: cloud relay"

If the user later sets up a Raspberry Pi:

# On the Pi
memex relay install
# → Joins the Syncthing cluster automatically (mDNS discovery)
# → User's availability upgrades from Tier 3 to Tier 2

Strategy 4: Team Deployment Behind a Firewall

For companies. This is the enterprise version.

The Setup

The company runs one Memex relay server inside their firewall. Every employee's Memex syncs to it.

Corporate Network
┌──────────────────────────────────────────────┐
│                                              │
│  Employee A ──► ┌──────────────────┐         │
│  Employee B ──► │ Memex Relay      │         │
│  Employee C ──► │ (company server) │         │
│                 │                  │         │
│                 │ Each employee's  │         │
│                 │ data in separate │         │
│                 │ encrypted silo   │         │
│                 └────────┬─────────┘         │
│                          │                   │
│              Cloudflare Tunnel (optional)     │
│              for remote employees             │
└──────────────────────────┼───────────────────┘
                           │
                    ┌──────┴──────┐
                    │ Remote      │
                    │ employees   │
                    │ via tunnel  │
                    └─────────────┘

Key Properties

  • Data stays inside the company network. No cloud storage. IT is happy.
  • Per-employee encryption. Even the relay admin can't read Employee A's data without A's key. Employees are happy.
  • Cloudflare Tunnel for remote access. Remote employees connect through the tunnel with Zero Trust authentication. Security team is happy.
  • Org Registry runs on the relay. Team queries, topology, standup automation — all inside the firewall.

One-Command Setup (for IT)

# On company server
memex enterprise install --domain memex.company.internal
# → Provisions relay server
# → Creates org registry
# → Generates invite codes for employees
# → Optionally sets up Cloudflare Tunnel for remote access

# On each employee's machine
memex join <invite-code>
# → Connects to company relay
# → Generates personal keypair
# → Starts syncing (encrypted)
# → Employee's Memex is now available 24/7 via the company relay
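
The invite-code handshake in the commands above could work roughly as follows. This is a minimal Python sketch under assumptions, not the relay's implementation: `InviteRegistry`, `issue`, and `redeem` are hypothetical names, and storing only a hash of each code (so a leaked relay database does not reveal usable codes) is a design choice the source does not specify.

```python
import hashlib
import secrets
from typing import Dict, Optional


class InviteRegistry:
    """Single-use invite codes; the relay stores only their SHA-256 digests."""

    def __init__(self) -> None:
        self._pending: Dict[str, str] = {}  # digest of code -> employee label

    @staticmethod
    def _digest(code: str) -> str:
        return hashlib.sha256(code.encode("utf-8")).hexdigest()

    def issue(self, employee: str) -> str:
        """Create a random code for one employee and remember its digest."""
        code = secrets.token_urlsafe(16)
        self._pending[self._digest(code)] = employee
        return code

    def redeem(self, code: str) -> Optional[str]:
        """Return the employee label once; later attempts return None."""
        return self._pending.pop(self._digest(code), None)
```

In this sketch the employee's keypair is generated on their own machine after `redeem` succeeds, so the relay never holds a private key, which matches the per-employee encryption property described earlier.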

The Growth Path: Which Strategy When

Phase 1 (now, 1-50 users):
  Strategy 1 — Encrypted relay via Cloudflare
  One command. No hardware. Works immediately.
  Privacy: medium-high (encrypted embeddings in R2)

Phase 2 (50-500 users, power users emerge):
  Strategy 2 — Docker relay image for always-on devices
  Users who care about privacy run their own relay.
  Strategy 3 — P2P sync for multi-device users
  Privacy: high (data on your hardware)

Phase 3 (500+ users, companies adopt):
  Strategy 4 — Enterprise deployment behind firewall
  IT installs one relay, employees join with invite codes.
  Privacy: maximum (data never leaves company network)

All phases coexist. A user on Phase 1 can upgrade to Phase 2
by running a Docker container. A company can start directly at Phase 3.

The One URL Problem

You mentioned wanting "one URL to go to, one place that people can look for this."

This is the registry: memex.digitalsurfacelabs.com

  • memex.digitalsurfacelabs.com/@joe → Joe's Memex (routes to wherever Joe's data lives — laptop, relay, cloud)
  • memex.digitalsurfacelabs.com/discover → search for people by skills
  • memex.digitalsurfacelabs.com/orgs/alaska-eng → Alaska Engineering team

The registry is a thin routing layer. It knows where everyone's MCP endpoint is. It doesn't store any Memex data. When you query @joe, the registry checks:

  1. Is Joe's laptop online? → Route there (fastest, most private)
  2. Is Joe's relay online? → Route there (still private, slightly slower)
  3. Is Joe's cloud cache available? → Route there (encrypted, always-on)

The user never needs to know which tier is serving the response.
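
The three checks above amount to a first-healthy-wins walk over an ordered list. A minimal Python sketch, assuming hypothetical names (`pick_endpoint`, the tier labels, and the example URLs are illustrative, and `is_online` stands in for whatever health check the registry would really use):

```python
from typing import Callable, List, Optional, Tuple

Endpoint = Tuple[str, str]  # (tier name, MCP endpoint URL)


def pick_endpoint(
    endpoints: List[Endpoint],
    is_online: Callable[[str], bool],
) -> Optional[Endpoint]:
    """Return the first reachable endpoint, in the caller's priority order."""
    for tier, url in endpoints:
        if is_online(url):
            return (tier, url)
    return None  # nothing reachable: the registry has nowhere to route


# Priority order from the list above: laptop, then relay, then cloud cache.
# These URLs are placeholders, not real Memex endpoints.
JOE_ENDPOINTS: List[Endpoint] = [
    ("laptop", "https://laptop.example.invalid/mcp"),
    ("relay", "https://relay.example.invalid/mcp"),
    ("cloud-cache", "https://cache.example.invalid/mcp"),
]
```

For example, with only the relay and cloud cache reachable, `pick_endpoint(JOE_ENDPOINTS, is_online)` returns the relay entry; the caller sees the same `@joe` URL either way.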

What Actually Exists Today That Solves Pieces of This

Tool                | What It Solves                                 | What It Doesn't
--------------------|------------------------------------------------|------------------------------------------
Cloudflare Tunnel   | Expose local services without port forwarding  | Doesn't help when laptop is off
Syncthing           | P2P encrypted sync between devices             | Needs at least one device online
age                 | Client-side encryption, simple key management  | Just encryption, no sync/search
Cloudflare R2       | Cheap encrypted-at-rest object storage         | Not zero-knowledge (CF holds keys)
Cloudflare Workers  | Edge compute, can run MCP proxy                | Not confidential computing
restic              | Encrypted incremental backup to any backend    | Backup, not live sync
Tailscale           | Mesh VPN, makes all your devices reachable     | Doesn't help when all devices are off
1Password key model | Per-user encryption with org recovery          | Designed for secrets, not large datasets

None of these solve the whole problem alone. The Memex solution will compose them.

Recommendation

Start with Strategy 1 (Encrypted Relay) for growth, but design the architecture so Strategy 2/3/4 are just configuration changes, not rewrites.

The key architectural decision: the MCP proxy is the universal front door. Whether it forwards to your laptop, your relay, or your encrypted cloud cache is an implementation detail behind a stable URL.

memex.digitalsurfacelabs.com/@handle
         │
         ▼
   ┌─────────────┐
   │  MCP Proxy   │ ← This is the product. Everything else is plumbing.
   │  (Worker)    │
   └──────┬──────┘
          │
    ┌─────┼──────────────┐
    │     │              │
    ▼     ▼              ▼
 Laptop  Relay       Cloud cache
 (best)  (good)      (acceptable)

Build the proxy. Make it route intelligently. Then let users choose their own privacy/availability tradeoff by picking where their data lives. The URL stays the same regardless.

Open Questions

  1. Embedding search over encrypted data: Can we do useful vector search without decrypting? Fully homomorphic encryption is, as far as we can tell, still too slow for interactive search over a corpus this size. Encrypted search indexes (like what CipherStash does) might work for exact match but not vector similarity. This might be the hardest technical problem.

  2. Key recovery — If a user loses their age key, their cloud-synced data is gone forever. Do we offer recovery (at the cost of some privacy) or let it be truly zero-knowledge (at the cost of potential data loss)?

  3. Pricing model for the relay — The encrypted relay has real costs (R2 storage, Worker invocations). Who pays? The user (subscription)? The querier (per-query fee)? Free tier with limits?

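  4. Legal exposure: If law enforcement asks for a user's data and we hold encrypted blobs, what's our obligation? True zero-knowledge means we literally can't comply. Is that the right position? Note that Strategy 1 is not fully zero-knowledge: while a session key is live, the Worker can decrypt embeddings, so the answer there is less clean.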

  5. Cloudflare dependency — Heavy reliance on Cloudflare (Tunnels, Workers, R2). What's the exit strategy if CF changes terms or pricing?