"Show HN draft — Terms of Service rater on a DGX Spark"

2026-06-03

Show HN draft

Title options (pick one)

Show HN: I rate every Terms of Service with a local LLM on a box in my house
Show HN: Rate any Terms of Service, privacy, or cookie policy 1–5 stars (local LLM)
Show HN: A Terms of Service rater that runs on one DGX Spark on my desk

(#1 is the strongest: it leads with the person who built it and the local, runs-at-home angle HN likes.)

URL to submit

https://digitalsurfacelabs.com/u/joe/pages/tos-checker

Body text (first comment, posted by OP immediately)

Nobody reads the Terms of Service. Or the privacy policy, or the cookie banner. Not because they're hard, but because reading them is never worth the minutes, so we click "Accept" and move on. That made sense when the only thing that could read them was a person. It stopped making sense the moment a model could read one in under a minute for a fraction of a cent.

So I built a thing that pays the reading tax for me. Paste a policy URL (or install the extension and it rates policies as you browse) and it scores the document 1–5 stars on a fixed, published rubric: data collection, retention, third-party sharing, user rights, and dispute resolution for terms and privacy policies; consent quality, tracking scope, third-party trackers, retention, and banner dark patterns for cookie policies. Every axis comes back with a verbatim evidence quote and a citation into the rubric, so you can audit the reasoning instead of trusting a black box. The rubric files are rendered live on the methodology pages, so you can read the exact checks each rating was scored against.
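Because every axis carries a "verbatim" quote, one cheap guard is to reject any axis whose quote does not literally occur in the policy text, which catches a model that paraphrased or invented its evidence. A minimal sketch, assuming hypothetical type and field names (`AxisRating`, `evidenceQuote`, `rubricCitation`) rather than the service's real schema:

```typescript
// Hypothetical shapes for illustration; not the service's published schema.
type TermsAxis =
  | "data_collection"
  | "retention"
  | "third_party_sharing"
  | "user_rights"
  | "dispute_resolution";

type CookieAxis =
  | "consent_quality"
  | "tracking_scope"
  | "third_party_trackers"
  | "retention"
  | "banner_dark_patterns";

interface AxisRating<A extends string> {
  axis: A;
  stars: 1 | 2 | 3 | 4 | 5;
  evidenceQuote: string; // supposed to be copied verbatim from the policy
  rubricCitation: string; // pointer into the rubric file (format assumed)
}

// Collapse whitespace runs so line breaks in the extracted policy text
// don't make a real quote look missing. Case and punctuation must still match.
function normalize(s: string): string {
  return s.replace(/\s+/g, " ").trim();
}

// Axes whose "verbatim" quote is empty or doesn't occur in the policy text:
// a sign the model paraphrased or invented evidence for that axis.
function unsupportedAxes<A extends string>(
  policyText: string,
  ratings: AxisRating<A>[],
): A[] {
  const haystack = normalize(policyText);
  return ratings
    .filter((r) => {
      const quote = normalize(r.evidenceQuote);
      return quote.length === 0 || !haystack.includes(quote);
    })
    .map((r) => r.axis);
}
```

A check like this does not prove a score is right, only that the cited text exists; that is still the part a reader can verify for themselves.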

The part I think is interesting: the model runs on a single NVIDIA DGX Spark on my desk, not an API and not a data center (the one exception is the cloud fallback described under Stack). One GB10 box, ~160 W at the wall while it's thinking. That has consequences I decided to be honest about rather than hide:

A policy the system has never seen takes ~50s to rate (an 80B model, one generation at a time). The page shows your place in the queue.
Ratings are cached and shared by everyone, so popular policies (Google, Apple, etc.) come back instantly: it's a DB lookup, no model.
Under load (e.g. this post), novel-URL scans line up behind each other. That's the honest tradeoff of running it on hardware I own instead of renting GPUs.
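The cache-then-queue behavior above can be sketched as a small class: a shared cache answers repeat URLs with no model call, concurrent requests for the same new URL share one run, and everything else joins a FIFO that a single worker drains one generation at a time, which is what makes a queue position meaningful. This is a minimal sketch, assuming a hypothetical `rateWithModel` function, an in-memory `Map` standing in for the SQLite cache, and trailing-slash trimming as the only URL normalization:

```typescript
type Rating = { url: string; stars: number };
type RateFn = (url: string) => Promise<Rating>;

interface Job {
  key: string;
  resolve: (r: Rating) => void;
  reject: (e: unknown) => void;
}

class SingleBoxRater {
  private readonly cache = new Map<string, Rating>(); // stand-in for the SQLite cache
  private readonly inflight = new Map<string, Promise<Rating>>();
  private readonly queue: Job[] = [];
  private busy = false;
  private readonly rateWithModel: RateFn; // hypothetical call into the local model

  constructor(rateWithModel: RateFn) {
    this.rateWithModel = rateWithModel;
  }

  // `ahead` = how many jobs run before this one; 0 means cached or running now.
  request(url: string): { ahead: number; result: Promise<Rating> } {
    const key = url.replace(/\/+$/, ""); // hypothetical normalization
    const hit = this.cache.get(key);
    if (hit) return { ahead: 0, result: Promise.resolve(hit) }; // DB lookup, no model

    // Two people pasting the same new URL share one model run.
    const pending = this.inflight.get(key);
    if (pending) return { ahead: this.aheadOf(key), result: pending };

    const ahead = this.queue.length + (this.busy ? 1 : 0);
    const result = new Promise<Rating>((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
    });
    this.inflight.set(key, result);
    void this.drain();
    return { ahead, result };
  }

  private aheadOf(key: string): number {
    const idx = this.queue.findIndex((j) => j.key === key);
    return idx === -1 ? 0 : idx + (this.busy ? 1 : 0); // -1: already running
  }

  // Drains the FIFO one job at a time, so the box only ever runs one generation.
  private async drain(): Promise<void> {
    if (this.busy) return;
    this.busy = true;
    while (this.queue.length > 0) {
      const job = this.queue.shift()!;
      try {
        const rating = await this.rateWithModel(job.key);
        this.cache.set(job.key, rating); // later requests for this URL are instant
        job.resolve(rating);
      } catch (e) {
        job.reject(e);
      } finally {
        this.inflight.delete(job.key);
      }
    }
    this.busy = false;
  }
}
```

Sharing the in-flight promise matters most under load: when a popular-but-uncached policy gets pasted by many people at once, it costs one ~50 s run instead of one per person.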

By my measurement it costs about 0.04 cents in electricity and ~2.2 watt-hours to read one full policy, less energy than charging a phone for an hour. There's a companion post that pulls my own numbers (how many policies I hit last week, the best and worst of them, the live Spark stats): https://digitalsurfacelabs.com/u/joe/pages/blog-i-agree
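The ~2.2 Wh figure follows from the other numbers in the post: roughly 160 W at the wall for roughly 50 s. A quick check, where the electricity price of $0.18/kWh is an assumption back-solved from the 0.04-cent figure, not a measured tariff:

```typescript
// Energy for one rating from average wall power and wall-clock time (J -> Wh).
function energyWh(watts: number, seconds: number): number {
  return (watts * seconds) / 3600;
}

// Electricity cost in US cents for a given tariff in dollars per kWh.
function costCents(wattHours: number, dollarsPerKWh: number): number {
  return (wattHours / 1000) * dollarsPerKWh * 100;
}

const wh = energyWh(160, 50); // ~160 W at the wall for ~50 s of generation
const cents = costCents(wh, 0.18); // ASSUMED tariff, back-solved, not measured
console.log(`${wh.toFixed(2)} Wh, ${cents.toFixed(3)} cents`);
```

This works out to about 2.22 Wh and 0.04 cents. It counts only active generation time; idle draw between scans and the cost of the hardware itself are outside the calculation.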

Stack: Node/TypeScript; a small priority queue that serializes access to the single Spark and falls back to a cloud model when a human is waiting and the box is busy; SQLite for the rating cache; and a Chrome extension that shows a star badge on policy pages. If you connect an account, the extension keeps a private, clearable history of every policy you've agreed to (off by default).
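The routing rule in the stack description can be sketched as a pure scheduler over a small state object. The post says a waiting human falls back to the cloud when the box is busy, yet also that novel scans line up under load; this sketch squares the two with a hypothetical cap on concurrent cloud calls (`MAX_CLOUD_IN_FLIGHT`). That cap, the two priority classes, and the rule that interactive jobs jump ahead of background (extension or seeding) work are all assumptions, not the actual implementation:

```typescript
type Priority = "interactive" | "background"; // person on the page vs. extension/seed work

interface RateJob {
  url: string;
  priority: Priority;
}

interface SchedulerState {
  sparkBusy: boolean;
  sparkQueue: RateJob[]; // waiting for the single Spark, interactive jobs first
  cloudInFlight: number;
}

// Hypothetical cap; the post does not say how cloud use is bounded.
const MAX_CLOUD_IN_FLIGHT = 2;

// Insert so interactive jobs sit ahead of background ones, FIFO within each class.
function enqueueSpark(queue: RateJob[], job: RateJob): number {
  let at = queue.length;
  if (job.priority === "interactive") {
    const firstBackground = queue.findIndex((j) => j.priority === "background");
    if (firstBackground !== -1) at = firstBackground;
  }
  queue.splice(at, 0, job);
  return at;
}

// Decide where a cache miss runs. `position` counts jobs ahead on the Spark
// (0 = starts now). Mutates the state it is given.
function route(
  job: RateJob,
  s: SchedulerState,
): { target: "spark" | "cloud"; position?: number } {
  const humanWaitingOnBusyBox = job.priority === "interactive" && s.sparkBusy;
  if (humanWaitingOnBusyBox && s.cloudInFlight < MAX_CLOUD_IN_FLIGHT) {
    s.cloudInFlight += 1;
    return { target: "cloud" };
  }
  if (!s.sparkBusy) {
    s.sparkBusy = true;
    return { target: "spark", position: 0 };
  }
  // +1 for the generation already running on the box.
  return { target: "spark", position: enqueueSpark(s.sparkQueue, job) + 1 };
}

// When the Spark finishes, start the next waiting job (if any).
function onSparkDone(s: SchedulerState): RateJob | undefined {
  const next = s.sparkQueue.shift();
  s.sparkBusy = next !== undefined;
  return next;
}

// When a cloud call returns, free its slot.
function onCloudDone(s: SchedulerState): void {
  s.cloudInFlight = Math.max(0, s.cloudInFlight - 1);
}
```

With the cap at 2, a third person who arrives while the box is busy and both cloud slots are taken waits on the Spark, but still ahead of any background work.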

Big caveats, stated up front: it's an LLM, so it misreads and misses things; it is not legal advice; and the rubric encodes one opinionated view of "good." It's a reading aid that tells you what to read more carefully, nothing more. Happy to talk about the rubric, the queue design, the Spark, or where it's wrong.

Pre-post checklist

[ ] Warm the cache broadly (sequential, to avoid the burst-timeout): SEED_BASE=https://digitalsurfacelabs.com npx tsx scripts/seed-tos-ratings.ts
[ ] Open the page in an incognito window: confirm it works with no login and that the default URL comes back instantly (cached).
[ ] Run one fresh scan of an obscure ToS: confirm the queue-position UI shows up and the rating lands in ~50 s.
[ ] Skim the three methodology pages: confirm they render the live corpus and that "view raw markdown" works.
[ ] Confirm the disclaimer reads well (top + bottom).
[ ] Confirm the blog post (/blog-i-agree) shows real "week in fine print" numbers (browse a few policy pages with the extension connected first so it isn't empty).
[ ] Have the extension .zip / Web Store listing status ready to answer "can I install it?" (Web Store review takes 1–3 days — submit before launch day).
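The "sequential" note on the seeding step matters because most seed URLs are uncached and the box rates one at a time: firing them all at once would stack up ~50 s jobs and presumably push later requests past a timeout. A minimal sketch of the difference, assuming a hypothetical `rate(url)` that requests one rating; this is not the contents of `scripts/seed-tos-ratings.ts`:

```typescript
type Rate = (url: string) => Promise<void>;

// Warm the cache one URL at a time: each request waits for the previous one,
// so the single box never sees a burst of cold scans at once.
async function seedSequentially(
  urls: string[],
  rate: Rate,
): Promise<{ ok: string[]; failed: string[] }> {
  const ok: string[] = [];
  const failed: string[] = [];
  for (const url of urls) {
    try {
      await rate(url); // unlike Promise.all(urls.map(rate)), which fires every request together
      ok.push(url);
    } catch {
      failed.push(url); // keep going; cached successes make a rerun cheap
    }
  }
  return { ok, failed };
}
```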

Posting notes

Post Tue–Thu, ~8–10am ET, as a Show HN. Be in the thread for the first 2–3 hours.
Lead replies with substance (the queue/latency design, the rubric, the Spark) — HN rewards the honest engineering story over the pitch.
Expect pushback on: "an LLM rating legal docs is irresponsible." The disclaimer + "reading aid, not advice" + the auditable evidence quotes are the answer.
Expect "why a DGX Spark and not just an API?" — the honest answer (I have the box, it's idle, the data stays home, and it's a fun constraint) plays well.