Neural Net Landing Page Optimization for OpenArcade
The Problem
OpenArcade is a collection of browser arcade games where every play session generates training data for a vision model learning to play from raw pixels. The landing page shows game cards and live stats. The goal: maximize the percentage of visitors who click into a game and actually play.
The question is whether a neural network — potentially running client-side in the browser — can observe user behavior in real time and dynamically adapt the page to increase click-throughs. This article surveys the state of the art, evaluates what's practical at different traffic levels, and proposes a phased implementation plan.
Bottom line: Full deep RL for landing page optimization is overkill and sample-starved at any realistic OpenArcade traffic level. The right approach is a phased progression: analytics-driven card reordering first, then Thompson Sampling bandits, then contextual bandits as traffic grows. Client-side inference is technically feasible with sub-millisecond latency for small behavioral models, but the bottleneck is training data, not inference speed. Start with instrumentation.
1. What Actually Works: RL and Bandits for Landing Pages
The Academic Landscape
True deep RL (DQN, PPO, SAC) applied to live landing page optimization is virtually nonexistent in production. The reasons are fundamental:
- Sample complexity: Deep RL needs orders of magnitude more interactions than website traffic can provide
- Sparse, delayed rewards: Conversions happen minutes to days after page load — temporal credit assignment is brutal
- Non-stationary environment: User populations shift; policies overfit to historical distributions
- Enormous action space: All possible page configurations form a mostly-discrete, combinatorial action space
What does work falls into two categories:
Evolutionary Computation (Evolv AI). The most rigorous production system, based on Risto Miikkulainen's work at UT Austin (arXiv:1703.00556, AI Magazine 2020). A human defines the search space (which elements, which variants), and an evolutionary algorithm breeds candidate designs across generations, evaluated on real users. Multi-armed bandits handle traffic allocation. Reported results: 20-200% improvements over human design. One case study showed 43.5% lift after 60 days with ~600K interactions.
Contextual Bandits. The practical RL variant that dominates production. Technically a single-step RL problem: the state is visitor context (device, referrer, time of day), the action is which page variant to serve, the reward is binary conversion. This sidesteps temporal credit assignment entirely. Statsig, Optimizely, Kameleoon, and the now-merged VWO/AB Tasty all implement this.
Multi-Armed Bandits vs. A/B Testing
| Dimension | A/B Test | MAB (Thompson Sampling) |
|---|---|---|
| Traffic allocation | Fixed 50/50 split | Dynamic, shifts to winner |
| Statistical rigor | Frequentist, controlled error | Bayesian, probability-of-best |
| Sample size needed | Pre-calculated, fixed | Equal or greater for same guarantees |
| Opportunity cost | High (50% traffic on loser) | Lower (shifts away from losers) |
| Best for | Causal knowledge | Revenue optimization during test |
Critical insight: MABs don't need less data to reach statistical certainty — they reduce regret (lost conversions) during the learning phase by directing traffic away from losers faster. Thompson Sampling is the best algorithm for this: it concentrates exploration where uncertainty is highest, handles delayed feedback well, and has logarithmic cumulative regret vs. linear for epsilon-greedy.
For detecting a 10% relative lift from a 30% baseline at 95% confidence / 80% power, a standard two-proportion power calculation gives roughly 3,800 visitors per variant regardless of method. Thompson Sampling just wastes fewer of those visitors on the loser.
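A sketch of that power calculation using the standard two-proportion z-test formula (online calculators differ slightly in the exact variant they use):

```python
from math import sqrt
from statistics import NormalDist

def sample_size_per_variant(p_base, rel_lift, alpha=0.05, power=0.80):
    """Per-variant n for a two-sided two-proportion z-test."""
    p1 = p_base
    p2 = p_base * (1 + rel_lift)
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # critical value, two-sided
    z_b = NormalDist().inv_cdf(power)           # power term
    p_bar = (p1 + p2) / 2
    numerator = (z_a * sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return numerator / (p2 - p1) ** 2

n = sample_size_per_variant(0.30, 0.10)   # 30% baseline, 10% relative lift
```

The result is in the high-3,000s per variant; halving the baseline rate or the lift roughly quadruples the requirement.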
Existing Tools and Platforms
Google Optimize is dead (sunset September 2023). Here's what replaced it:
| Tool | Type | Under the Hood | Pricing | Best For |
|---|---|---|---|---|
| Statsig | MAB + contextual bandits | Thompson Sampling, automatic attribute encoding | Free: 2M events/mo. Pro: $0.05/1K above 5M | Best free tier |
| GrowthBook | A/B + MAB | Bayesian stats, self-hostable Docker | Open source (MIT). Bandits in Pro | Self-hosted |
| Evolv AI | Evolutionary + MAB | Population-based search with crossover/mutation | Enterprise (~$50-150K/yr) | Multivariate at scale |
| Webflow Optimize | ML multivariate | Continuous learning, per-segment optimization | Webflow add-on | Webflow users |
| VWO + AB Tasty | Full-stack experimentation | MAB + "Evi" AI agent for test setup | Enterprise ($10-50K/yr) | Full-stack |
| Vowpal Wabbit | Contextual bandits | IPS, direct method, doubly robust evaluation | Open source (Microsoft) | Build-your-own |
For OpenArcade's scale and self-hosted constraint (Jetson Orin Nano), the practical options are GrowthBook (Docker, MIT license) or a homegrown Thompson Sampling implementation (~50 lines of JS).
2. Browser-Side Neural Net Inference: Current State
Can You Run a Model in the Browser?
Yes, and for behavioral prediction models, it's fast enough to be imperceptible.
Framework benchmarks (from ACM TOSEM 2024 study across 50 PCs + 20 mobile devices):
| Model | Size | WebGL (ms) | WASM (ms) |
|---|---|---|---|
| MobileNetV2 | 14MB | 20 | 89 |
| ResNet-50 | 98MB | 80 | 172 |
| Small MLP (behavioral) | 50KB | <1 | <1 |
For behavioral prediction (15-30 input features → intent score), you need a 2-3 layer MLP with 32-64 neurons. Model size: 10-50KB. Inference: <1ms via WASM in a Web Worker. You don't even need TensorFlow.js or ONNX — a 3-layer MLP is ~50 lines of typed-array JS.
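To make the scale concrete, here is a pure-Python sketch of the forward pass such a model computes; a typed-array JS version is a line-for-line translation. The weight shapes below are placeholders, not trained values:

```python
import math

def dense(x, weights, biases):
    """One fully connected layer. weights: one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def relu(v):
    return [max(0.0, x) for x in v]

def intent_score(features, params):
    """Small MLP (two hidden layers) -> sigmoid intent score in (0, 1)."""
    h1 = relu(dense(features, params["W1"], params["b1"]))
    h2 = relu(dense(h1, params["W2"], params["b2"]))
    logit = dense(h2, params["W3"], params["b3"])[0]
    return 1.0 / (1.0 + math.exp(-logit))
```

With 30 inputs and two 64-unit hidden layers, the parameter count is about 6K floats, roughly 25KB as float32, consistent with the 10-50KB budget above.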
Backend Comparison
| Backend | Best For | Watch Out |
|---|---|---|
| WebGPU | Large transformers, matmul-heavy | Chrome/Edge only, ~85-90% coverage |
| WebGL | Medium models, broad compat | Shader warmup: up to 64x first-inference penalty. GPU contention degrades UI by up to 62.7% |
| WASM (SIMD+threads) | Small models, CPU-only, mobile | 4GB memory ceiling. Perfect for behavioral classifiers |
Recommendation for OpenArcade: WASM backend in a Web Worker. The model is tiny, inference is sub-millisecond, and there's zero risk of UI jank from GPU contention.
Model Size Guidelines
- Under 5MB: Loads instantly, negligible page-load impact. Ideal for behavioral classifiers.
- 5-50MB: Acceptable if lazy-loaded after interactive. MobileNet-class.
- 50-200MB: Must be cached in IndexedDB. Not suitable for embedded personalization.
3. What Signals to Capture
High-Predictive Signals (Research-Backed)
Based on the Smashing Magazine analysis by Eduard Kuric and SIGIR 2020 research on mouse movement representations:
Mouse/Cursor (highest signal-to-noise):
- Cursor velocity (mean/max) — browsing style indicator; max velocity flags frustration
- Hesitation time — average time from hover to click; measures decision difficulty
- Cursor-to-CTA trajectory — directional movement toward the call-to-action; predictive 3-5s before click
- Direction changes — trajectory reversals indicate confusion or comparison
- Path straightness ratio — direct distance / actual path; straighter = more decisive

Scroll:
- Scroll depth — single most-tracked engagement metric
- Scroll velocity — speed indicates skimming vs reading; pauses indicate interest
- Reverse scrolls — scrolling back up = high engagement

Temporal:
- Time-to-first-interaction — correlates with intent strength
- Viewport dwell time per section (via IntersectionObserver)

Contextual (no tracking needed):
- Device type, screen size, referrer source, time of day, connection speed, browser language
What's noise: Raw cursor position (layout-dependent), total click count without context, raw session duration (confounded by tab-away). Also: mouse DPI/OS acceleration curves differ across hardware — normalize velocity to percentiles, not absolute px/s.
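One way to implement that normalization, assuming a reference distribution of velocities collected from earlier sessions (an assumption; any representative offline sample works):

```python
from bisect import bisect_right

def percentile_rank(value, reference_sorted):
    """Map a raw reading (e.g. cursor velocity in px/s) to a 0-1 percentile
    rank against a pre-sorted reference distribution, so the feature is
    comparable across mice, DPIs, and OS acceleration curves."""
    if not reference_sorted:
        return 0.5  # no reference yet: return a neutral mid value
    return bisect_right(reference_sorted, value) / len(reference_sorted)
```

The same sorted reference array can ship inside the model bundle, so the client needs no server round-trip to normalize.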
Client-Side Architecture
MAIN THREAD                          WEB WORKER
===========                          ==========
Signal Collector     --transfer-->   Inference Engine
(passive listeners,                  (50KB MLP via WASM)
 rAF batching)                             |
       |             <--transfer--   Prediction
       v                             {intent: 0.73}
DOM Adapter
(rAF-batched writes,
 CSS order/opacity only)
Latency budget: Signal collection 0ms (passive) → feature extraction <3ms every 500ms → inference <1ms → DOM update <5ms (CSS-only). Total: ~500ms perception-to-adaptation, dominated by the collection window.
Cold-start: First 0-3s use contextual defaults (referrer × device × time of day). First behavioral signal available at 3s (scroll velocity, cursor trajectory). Confident adaptation at 10s+.
Key technique: Use CSS order on flex/grid containers to reorder sections without DOM manipulation. Use opacity, transform, visibility for visual changes — these don't trigger layout reflow.
Privacy
Client-side-only processing with no data exfiltration is lower risk than server-side tracking, but not automatically exempt from ePrivacy/GDPR. The ePrivacy Directive (Article 5(3)) covers "accessing information on terminal equipment" — which technically includes reading mouse positions via JavaScript.
Practical position: keep all processing in-memory (no localStorage/cookies for behavioral data), respect navigator.globalPrivacyControl, disclose in privacy policy under "automated decision-making."
4. What to Optimize: Highest Leverage Elements
Research on what actually moves the needle for landing page conversion:
Ranked by Expected Impact for OpenArcade
1. Card ordering (15-40% relative lift)
The serial position effect (CXL research) shows position 1 gets 10.5% click-through vs 7.3% for position 5 — a 44% relative difference just from position. The first 3-4 cards above the fold capture disproportionate attention. Simply reordering by actual engagement data from the recorder is the single highest-impact change.
2. Reducing visible cards (10-30% lift)
Landingi research found pages with fewer than 10 elements convert at 2x the rate of pages with 40+. Showing 6-8 top games with a "Show All" expansion could increase engagement with visible games.
3. Adding explicit CTA (10-28% lift)
Cards are clickable <a> tags but have no explicit "Play Now" button. VWO research shows CTA buttons outperform text-based CTAs by up to 28%. Unbounce documented a case where a three-word CTA change produced 104% conversion lift.
4. Mission bar position (5-15% lift)
The mission bar sits between the hero and the game grid. On smaller screens it may push popular games below the fold. LandingPageFlow research found moving primary CTAs above the fold produced 101% increase in clicks.
5. Hero copy and card descriptions (5-10% lift each)
NEW: Card Visual Treatment (30-200% relative lift)
This is the highest-potential optimization dimension, but it was missing from the original analysis because the current cards are text-only. The question: does adding a visual preview — a static screenshot or a short looping gameplay video — dramatically increase the rate at which visitors click through and actually play?
The research strongly suggests yes:
Static images vs. text-only: - Landing pages with relevant images convert 21% higher than text-only layouts (Zebracat 2025) - Artist photos replacing text descriptions produced a 95% lift in one VWO case study (VWO) - On Steam, the capsule image (the small card thumbnail) is the single biggest driver of click-through — most users decide from the image alone, never reading the description (Phiture/ASOStack)
Animated previews vs. static images: - Animated GIF recipients clicked through 203% more than those shown static images in a MarketingSherpa A/B test (MarketingSherpa) - Cinemagraphs (subtle looping animations) drove 110% higher engagement than still photos in Microsoft's Twitter ad experiments, with cost per engagement dropping 45% (MarketingProfs 2018) - Animated ads averaged 7% higher conversion rate overall, but 6-9 second animations achieved 138% higher conversions (AdEspresso) - itch.io supports animated GIF thumbnails on hover, and indie developers consistently report these are critical for standing out in browse feeds (itch.io docs)
The counterexample (important): - Apple's own A/B test on App Store product pages found that the page without a video preview outperformed the page with one (Apple Developer). The lesson: in fast-scroll browse contexts, a well-designed static thumbnail can outperform video because it's more scannable. This matters for OpenArcade because the grid shows 100 games — scroll speed is high.
The performance governor: - Every additional second of page load drops conversions by 4.42% (Portent) - Pages with images over 1MB average 9.8% conversion vs 11.4% without — a 14% relative decrease (involve.me) - 53% of mobile visitors leave if load exceeds 3 seconds (Instapage)
Format and file size guidance:
| Format | Size per card | Quality | Browser support |
|---|---|---|---|
| Animated GIF (320px) | 200-500KB | Mediocre (256 colors) | Universal |
| WebP animation (320px) | 80-150KB | Good | 97%+ |
| MP4 <video> (320px, 3s loop) | 30-80KB | Excellent | Universal |
| WebM <video> (320px, 3s loop) | 20-60KB | Excellent | 96%+ |
| Static WebP screenshot | 30-80KB | Excellent | 97%+ |
Recommendation: Use <video autoplay muted loop playsinline> with a WebP poster frame. This gives GIF-like behavior at 1/5th the bandwidth. The poster doubles as the static image arm. With lazy loading via IntersectionObserver, only above-fold cards load media initially, keeping initial page weight under 500KB.
Testing approach — two independent bandits:
The visual treatment question is independent of card ordering. Running them as a single combined test (5 orderings × 3 visual treatments = 15 arms) would require ~45 days to converge at 100 visitors/day. Instead, run two independent Thompson Sampling bandits:
- Ordering bandit (existing): which games appear first (classics, casual, action, random, default)
- Visual treatment bandit (new): how cards look (text-only, static image, autoplay video)
Each converges in ~10-15 days at 100 visitors/day. The interaction between ordering and visual treatment is unlikely to be significant at low traffic, and this approach gets answers 3x faster.
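The arithmetic behind the 3x claim, treating ~300 trials per arm as the convergence budget implied by the figures above:

```python
def days_to_converge(n_arms, visitors_per_day, trials_per_arm=300):
    """Back-of-envelope: each arm needs ~trials_per_arm samples, and
    visitors are split across arms, so total trials scale with n_arms."""
    return n_arms * trials_per_arm / visitors_per_day

combined = days_to_converge(15, 100)            # one 15-arm bandit: 45.0 days
independent = max(days_to_converge(5, 100),     # ordering bandit: 15.0 days
                  days_to_converge(3, 100))     # visual bandit: 9.0 days
```

Running the two bandits in parallel is gated by the slower one, so 15 days vs 45.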
The three visual treatment arms:
- text: Current cards. Title, description, controls. No image.
- image: Same text content, but with a static WebP screenshot at the top of each card showing a few seconds into gameplay.
- video: Same as image but the poster frame is replaced with a 3-5 second looping <video> that autoplays muted. Falls back to poster on mobile/slow connections.
5. Reward Function Design
Composite Reward
For a page where the goal is "user clicks into a game and actually plays":
reward = 0.2 * clicked
+ 0.4 * played_30s
+ 0.3 * min(play_duration / 300, 1.0)
+ 0.1 * returned_within_24h
The heavy weight on played_30s is deliberate: a click that leads to a bounce is worse than no click — it's a false positive that misleads the optimizer. The 30-second threshold filters accidental clicks and immediate bounces.
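The composite reward as the daily batch job might compute it (argument names are illustrative; Python booleans coerce to 0/1 in the arithmetic):

```python
def composite_reward(clicked: bool, played_30s: bool,
                     play_duration_s: float, returned_24h: bool) -> float:
    """Composite reward from above; play time saturates at 300s (5 min)."""
    return (0.2 * clicked
            + 0.4 * played_30s
            + 0.3 * min(play_duration_s / 300.0, 1.0)
            + 0.1 * returned_24h)
```

A click that bounces immediately scores 0.2 at most, while a full play session plus a return visit scores 1.0, which is exactly the asymmetry the optimizer needs to see.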
Two-Speed Feedback Loop
- Fast loop (seconds): Use the clicked signal only. Update bandit parameters immediately. ~100 signals/day at 100 visitors/day.
- Slow loop (daily batch): Compute the full composite reward by joining landing page events with recorder.js session data (same collector_id, same game, within 60s of click). Correct bandit parameters for the clickbait failure mode.
Attribution is straightforward: recorder.js already assigns a persistent collector_id via localStorage that links landing page visits → card clicks → game sessions → play duration → return visits.
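At this traffic level the join can be a linear scan; the field names below are assumptions based on the Phase 0 payloads and would need adjusting to the actual recorder.js schema:

```python
def attribute_session(click, sessions, window_s=60):
    """Match a card click to the recorder.js session it produced:
    same collector_id, same game, session starting within window_s
    after the click. Returns the session dict or None."""
    for s in sessions:
        if (s["collector_id"] == click["collector_id"]
                and s["game"] == click["game"]
                and 0 <= s["start_ts"] - click["timestamp"] <= window_s):
            return s
    return None
```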
6. Phased Implementation Plan
Phase 0: Instrumentation (Build First, 0 Traffic Needed)
Add a lightweight event tracker to index.html:
- Page load: {event: 'pageview', collector_id, viewport, referrer, timestamp, user_agent}
- Card impressions via IntersectionObserver: which cards were visible
- Card click: {event: 'card_click', collector_id, game, position, time_since_load}
- Scroll depth: max scroll percentage reached
Route: POST /api/events/landing on the ingest hub. Store as JSONL on the SSD.
Time to build: 1-2 days. Prerequisite for everything else.
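A minimal sketch of the ingest-side handler logic, independent of web framework (field names follow the payloads above; the log path is hypothetical):

```python
import json
import time

def append_landing_event(event: dict, log_path: str) -> dict:
    """Validate the minimum fields and append one JSONL line.
    The route handler for POST /api/events/landing would call this."""
    required = {"event", "collector_id"}
    missing = required - event.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    record = {**event, "server_ts": time.time()}  # server-side timestamp
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Appending JSONL keeps writes atomic enough at this volume; log rotation can come later.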
Phase 1: Analytics-Driven Static Ordering (100 visitors/day)
After 1-2 weeks of event data:
1. Python script reads JSONL log
2. Computes per-game quality = click_rate × avg(min(play_duration/300, 1))
3. Joins with recorder.js session data via collector_id
4. Outputs card_order.json
5. 5 lines of JS in index.html fetches it and reorders .games-grid children
6. Cron runs daily
Expected lift: 10-20%. Zero ML, just analytics.
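A sketch of steps 1-4; the event names and fields are assumptions about the Phase 0 schema, and the real script would join recorder.js sessions via collector_id rather than relying on a per-game session_end event:

```python
import json
from collections import defaultdict

def compute_card_order(events_path):
    """Rank games by quality = click_rate * avg(min(play_duration/300, 1)).
    Reads one JSON object per line from the Phase 0 event log."""
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    play_scores = defaultdict(list)
    with open(events_path) as f:
        for line in f:
            e = json.loads(line)
            game = e.get("game")
            if e["event"] == "card_impression":
                impressions[game] += 1
            elif e["event"] == "card_click":
                clicks[game] += 1
            elif e["event"] == "session_end":
                play_scores[game].append(min(e["play_duration"] / 300.0, 1.0))

    def quality(game):
        rate = clicks[game] / impressions[game] if impressions[game] else 0.0
        depth = (sum(play_scores[game]) / len(play_scores[game])
                 if play_scores[game] else 0.0)
        return rate * depth

    return sorted(impressions, key=quality, reverse=True)
```

The returned list is what the cron would write to card_order.json for the client JS to consume.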
Phase 2: Thompson Sampling Bandit (100+ visitors/day)
Pure client-side JavaScript, no server changes beyond a cron job:
- Define 5 card orderings as arms (by quality, by click rate, by play duration, by popularity, reverse/novelty)
- Bandit state in bandit_state.json: {arms: [{alpha: 1, beta: 1}, ...], orderings: [...]}
- Client JS samples from Beta(alpha_i, beta_i) for each arm, picks the highest, reorders cards
- Event payload includes which arm was shown
- Hourly Python cron reads events, updates alpha/beta, writes JSON
Thompson Sampling for Bernoulli bandits converges to near-optimal arm selection within ~200-300 trials per arm. With 5 arms at 100 visitors/day, meaningful convergence in 10-15 days. At 1,000/day, convergence in 1-2 days.
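The core of the homegrown option really is small. A Python sketch of both halves (the client-side JS mirrors choose_arm; the hourly cron runs record_outcome over the logged events):

```python
import random

def choose_arm(arms):
    """Thompson Sampling for Bernoulli rewards: draw one sample from each
    arm's Beta posterior and serve the arm with the highest draw.
    arms = [{"alpha": a, "beta": b}, ...] as in bandit_state.json."""
    draws = [random.betavariate(a["alpha"], a["beta"]) for a in arms]
    return max(range(len(arms)), key=draws.__getitem__)

def record_outcome(arms, arm, converted):
    """Conjugate Beta-Bernoulli update: success bumps alpha, failure beta."""
    if converted:
        arms[arm]["alpha"] += 1
    else:
        arms[arm]["beta"] += 1
```

Because sampling from the posterior is the exploration mechanism, no epsilon or decay schedule needs tuning.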
Expected lift over Phase 1: 5-15% additional.
Phase 3: Contextual Bandit (500+ visitors/day)
Replace static JSON with a lightweight Python Flask endpoint on the Jetson:
- Accepts context: {is_mobile, is_returning, hour_bucket, viewport_bucket}
- Returns card ordering via LinUCB or logistic Thompson Sampling
- Retrains hourly on batch of last 24h
Context-aware: mobile users might prefer simpler games (Snake, Flappy); returning visitors see games they haven't tried; evening visitors get different ordering than morning.
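A minimal sketch of the contextual step, simplified to one independent Thompson sampler per context bucket rather than full LinUCB (bucket keys are illustrative; this is a stepping stone, not the endpoint's final algorithm):

```python
import random
from collections import defaultdict

class BucketedThompson:
    """Simplest contextual bandit: an independent Beta posterior per
    (context bucket, arm). Works when buckets are few and coarse."""
    def __init__(self, n_arms):
        self.n_arms = n_arms
        # state[bucket] = [[alpha, beta], ...] one pair per arm
        self.state = defaultdict(lambda: [[1, 1] for _ in range(n_arms)])

    def _key(self, context):
        return (context["is_mobile"], context["hour_bucket"])

    def choose(self, context):
        arms = self.state[self._key(context)]
        draws = [random.betavariate(a, b) for a, b in arms]
        return max(range(self.n_arms), key=draws.__getitem__)

    def update(self, context, arm, reward):
        self.state[self._key(context)][arm][0 if reward else 1] += 1
```

The cost of bucketing is that data is split across buckets, which is why this only makes sense at 500+ visitors/day; LinUCB shares information across contexts instead.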
Expected lift over Phase 2: 5-15% additional.
Phase 4: In-Browser Behavioral Model (1,000+ visitors/day)
This is where the neural net enters:
1. Train a small MLP offline on accumulated behavioral data (mouse velocity, scroll patterns, hesitation → composite reward)
2. Export as a 50KB ONNX model
3. Load in a Web Worker via ONNX Runtime Web (WASM backend)
4. Capture behavioral signals passively for 3-10 seconds
5. Run inference (<1ms), get an intent score
6. Adapt the page: reorder cards, adjust visual emphasis, show/hide the mission bar
This phase is only justified when you have enough labeled behavioral data to train a model that outperforms the bandit heuristics. At 1,000 visitors/day, you accumulate ~30K labeled sessions/month — potentially enough for a simple MLP.
Phase 5: Full RL (10,000+ visitors/day)
Formulate as MDP per Amazon's offline DQN approach: state = user context + page config, action = layout variant, reward = composite signal. Train offline on logged interactions, deploy policy as inference endpoint.
Only justified at massive scale. Marginal lift over contextual bandits is typically 5-10%.
Summary
| Phase | Traffic | Complexity | Cumulative Lift | Time |
|---|---|---|---|---|
| 0: Instrumentation | 0 | Low | Baseline | 1-2 days |
| 1: Analytics ordering | 100/day | Low | 10-20% | 1 day |
| 2: Thompson Sampling | 100/day | Medium | 15-30% | 2-3 days |
| 3: Contextual bandit | 500/day | High | 20-40% | 1-2 weeks |
| 4: Browser behavioral model | 1,000/day | High | 25-45% | 2-4 weeks |
| 5: Full RL | 10,000/day | Very High | 30-50% | Months |
Recommendation
Start with Phase 0 immediately. Without landing page event tracking, nothing else is possible. The instrumentation is ~50 lines of JS and a new POST route on the ingest hub.
Phase 1 is the highest-ROI change. Reordering cards by actual engagement data will likely produce 10-20% lift with a day of work. This doesn't need ML — it needs analytics.
Phase 2 (Thompson Sampling) is the sweet spot for an HN launch. It's intellectually interesting ("our landing page uses multi-armed bandits to optimize in real time"), practically effective, and achievable in 2-3 days. It also generates a compelling narrative: "the same site that collects training data for our vision model also optimizes itself."
Skip straight to Phase 4 only if you're getting 1,000+ visitors/day consistently. Below that threshold, the bandit will outperform any neural net because it has better sample efficiency for this problem structure.
The in-browser neural net is the long-term play, but the bandit is the right tool for launch.
Open Questions
- What's the current traffic level? The entire phasing depends on daily visitor count. If the HN launch brings a spike, Phase 2 could converge in hours rather than weeks.
- Should the optimization be visible? An HN audience might appreciate transparency: "This page is optimizing itself — here's what arm you're seeing." This could be a feature, not a hidden trick.
- Card ordering vs. card content vs. card visual treatment: The bandit now tests ordering AND visual treatment as independent dimensions. The remaining question is whether to also test different descriptions/CTAs per game — a third bandit. At low traffic, two bandits is the practical limit.
- Interaction between landing page optimization and training data collection: If the optimizer concentrates traffic on 2-3 games, training data diversity drops. Should there be an exploration bonus that values training data diversity alongside click-throughs?
- Reward attribution latency: The composite reward requires joining landing page events with recorder.js sessions. Is the ingest hub already storing both in a way that supports this join, or does the schema need work?
Key Sources
- Miikkulainen et al. — CRO through Evolutionary Computation (arXiv:1703.00556)
- Qiu & Miikkulainen — Evolutionary CRO via MAB (AAAI 2019)
- Chapelle & Li — Empirical Evaluation of Thompson Sampling (Microsoft Research)
- Russo et al. — Tutorial on Thompson Sampling (Stanford)
- Anatomizing Deep Learning Inference in Web Browsers (ACM TOSEM 2024)
- Kuric — Mouse Interaction Data for ML (Smashing Magazine)
- Amazon — Offline Deep Q-Learning for Page Layout (RecSys 2022)
- Expedia — Contextual Bandits for Web Optimization
- CXL — Serial Position Effect in CRO
- Unbounce Smart Traffic Documentation
- Zebracat — Video Marketing Statistics 2025
- MarketingSherpa — Animated GIF vs Static Image A/B Test
- MarketingProfs — Cinemagraph Engagement Data (Microsoft)
- Phiture/ASOStack — Steam Page Optimization
- Portent — Site Speed and Conversion Rate
- Apple Developer — Product Page Optimization
- AdEspresso — Animation vs Still Images