
Digital Surface Labs


Scale-Up Plan: Screen-Self-Driving to Professional Play

Date: February 16, 2026
Goal: Generate enough data and train models today to play all 7 arcade games at a competitive level (90%+ exact match).

Current State (as of right now)

Data Generated

Generation   Sessions   Status
gen0         700        DONE
gen1         700        DONE
gen2         994        DONE
gen4         994        DONE
gen3         ~0         Just relaunched (previous run was stuck)
gen5         ~0         Just relaunched (previous run was preempted)
gen6         ~0         Just launched
gen7-13      0          Queued in backfill script
Total done   3,388      ~484/game, ~173K training windows/game

Training

  • allgames1: Running on L4 GPU, training all 7 models (including GameNet) on all 7 games using 3,388 sessions. Currently downloading data to VM (~20% done).
  • Previous runs: All on tetris only. Best: 20.7% exact match (sigmoid) / 36.6% exact match (softmax).

Infrastructure

  • vCPU quota: 32 global (CPUS_ALL_REGIONS) = max 4 simultaneous 8-vCPU VMs
  • Currently running: 4 VMs (gen3 + gen5 + gen6 + allgames1) = 32 vCPUs = quota fully used
  • Spot pricing: Gen VMs $0.08/hr (e2-standard-8), Train $0.70/hr (L4)

What We Need for Professional Play

From training analysis, the GameNet architecture with softmax output needs these training samples per game:

Game             Difficulty   Samples for 90%   Sessions Needed
Snake            Low          200K              560
Pong             Low          200K              560
Breakout         Medium       500K              1,400
Flappy Bird      Medium       500K              1,400
Tetris           High         1M                2,800
Space Invaders   High         1M                2,800
Asteroids        Very High    2M                5,600

Math: 1 session = 180s at 10fps = 1,800 frames → ~357 training windows at 2fps with 4-frame context.
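That window arithmetic as a quick sanity check (a sketch; all constants are from this plan):

```python
import math

FPS_RECORD = 10        # frames captured per second
FPS_TRAIN = 2          # frames sampled for training
SESSION_SECONDS = 180
CONTEXT = 4            # frames of context per training window

raw_frames = SESSION_SECONDS * FPS_RECORD      # 1,800 raw frames
sampled = SESSION_SECONDS * FPS_TRAIN          # 360 frames at 2 fps
windows_per_session = sampled - (CONTEXT - 1)  # 357 sliding windows

def sessions_needed(target_windows):
    return math.ceil(target_windows / windows_per_session)

print(windows_per_session)         # 357
print(sessions_needed(200_000))    # 561, rounded to 560 in the table
print(sessions_needed(2_000_000))  # 5,603, rounded to 5,600 in the table
```

The table's "Sessions Needed" column rounds these to the nearest 10-20 sessions, which is well within the noise of session yield anyway.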

The Problem

With the current approach (gen0-gen13, all-games VMs):

  • 14 VMs × ~700 sessions each = 9,800 total sessions = 1,400/game
  • That gives ~500K training windows/game
  • Good enough for snake, pong, breakout, and flappy bird (easy/medium games)
  • Not enough for tetris, space invaders, and asteroids (which need 1-2M windows)

With the 32 vCPU quota (4 VMs at a time):

  • 14 VMs in batches of 3 (plus 1 training VM) = ~5 batches × 10h = 50 hours
  • That's 2+ days, not today.

The Plan: Two Key Moves

Move 1: Request Quota Increase to 128 vCPUs

# Request via GCP console: IAM & Admin → Quotas → CPUS_ALL_REGIONS
# Current: 32, Request: 128
# Justification: ML training pipeline, short-lived spot VMs for data generation

With 128 vCPUs:

  • Use small VMs (e2-standard-4, 4 vCPUs, $0.04/hr) for generation
  • Run 30 gen VMs simultaneously (120 vCPUs) + 1 training VM (8 vCPUs) = 128
  • All generation completes in a single 10-hour batch
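The same packing math, parameterized over quota (a sketch; VM sizes and the 10-hour batch length are from this plan):

```python
import math

def batch_plan(quota_vcpus, gen_vm_vcpus, n_gen_vms,
               train_vcpus=8, hours_per_batch=10):
    """Gen VMs that fit beside one training VM, and total wall-clock hours."""
    parallel = (quota_vcpus - train_vcpus) // gen_vm_vcpus
    batches = math.ceil(n_gen_vms / parallel)
    return parallel, batches * hours_per_batch

# Today: 32 vCPUs, e2-standard-8 gen VMs, 14 all-games VMs
print(batch_plan(32, 8, 14))    # (3, 50) -> 3 at a time, 50 hours
# Requested: 128 vCPUs, e2-standard-4 gen VMs, all 28 VMs
print(batch_plan(128, 4, 28))   # (30, 10) -> one 10-hour batch
```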

Move 2: Game-Specific Generation for Hard Games

Instead of only all-games VMs (100 sessions/game each), add dedicated VMs for hard games:

VM Type               Count   Game             Sessions/VM      Total Sessions
All-games (gen0-13)   14      All 7            700 (100/game)   1,400/game
Tetris-only           4       tetris           700              2,800 extra
Space Invaders-only   4       space-invaders   700              2,800 extra
Asteroids-only        6       asteroids        700              4,200 extra

Final per-game totals:

Game             Shared Sessions   Dedicated Sessions   Total   Training Windows   Target Met?
Snake            1,400             0                    1,400   500K               90%+ (need 200K)
Pong             1,400             0                    1,400   500K               90%+ (need 200K)
Breakout         1,400             0                    1,400   500K               90% (need 500K)
Flappy Bird      1,400             0                    1,400   500K               90% (need 500K)
Tetris           1,400             2,800                4,200   1.5M               90%+ (need 1M)
Space Invaders   1,400             2,800                4,200   1.5M               90%+ (need 1M)
Asteroids        1,400             4,200                5,600   2.0M               90% (need 2M)

Total VMs: 14 (shared) + 14 (game-specific) = 28 gen VMs
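The per-game totals can be cross-checked mechanically (a sketch; 357 windows/session from the math above, game keys are shorthand):

```python
WINDOWS_PER_SESSION = 357
SHARED = 1_400  # sessions/game contributed by the 14 all-games VMs

targets = {  # target training windows per game, from the plan
    "snake": 200_000, "pong": 200_000,
    "breakout": 500_000, "flappy-bird": 500_000,
    "tetris": 1_000_000, "space-invaders": 1_000_000,
    "asteroids": 2_000_000,
}
dedicated = {"tetris": 2_800, "space-invaders": 2_800, "asteroids": 4_200}

for game, target in targets.items():
    sessions = SHARED + dedicated.get(game, 0)
    windows = sessions * WINDOWS_PER_SESSION
    print(f"{game}: {sessions} sessions, {windows/1e6:.2f}M windows "
          f"(target {target/1e6:.1f}M)")
```

Note that asteroids lands at 5,600 × 357 = 1,999,200 windows, fractionally under the nominal 2M target; that is within the table's rounding.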

Timeline for Today

Phase 1: Now (already running)

  • gen3, gen5, gen6 generating (all-games)
  • allgames1 training on L4 (7 models × 7 games × 3,388 sessions)
  • Backfill script queued for gen7-13
  • Action: Request 128 vCPU quota increase via GCP console

Phase 2: Once quota approved (target: within 2 hours)

  • Switch remaining gen VMs to small (e2-standard-4, 4 vCPUs)
  • Launch gen7-13 (7 VMs) + 14 game-specific VMs = 21 VMs simultaneously
  • 21 × 4 vCPUs = 84 vCPUs + 8 training = 92 vCPUs (under 128 quota)
  • All VMs run in parallel for ~10 hours

Phase 3: ~10 hours after Phase 2 (evening/night)

  • All generation complete: ~19,600 sessions total
  • Upload code + launch training: allgames2 on L4
  • Train all 7 models (including GameNet) on full dataset
  • Training time: target ~3-5 hours, but note that 7 models × 30 min × 7 games is ~24.5 hours if fully sequential (see risk 5)

Phase 4: Results (late tonight / early morning)

  • Download results, compare GameNet vs baselines
  • Expected: GameNet softmax hitting 85-95% exact match on easy/medium games
  • Retrain if needed with adjusted hyperparameters

Cost Breakdown

Item                                 Count          Rate       Hours      Cost
Gen VMs (small, shared)              10 remaining   $0.04/hr   10h each   $4.00
Gen VMs (small, game-specific)       14             $0.04/hr   10h each   $5.60
Gen VMs already running (medium)     3              $0.08/hr   10h each   $2.40
Gen VMs already done                 4              -          -          ~$3.20 (already spent)
Training L4 (allgames1, current)     1              $0.70/hr   5h         $3.50
Training L4 (allgames2, full data)   1              $0.70/hr   5h         $3.50
Total                                                                     ~$22.20
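The total reduces to a few line items (a sketch using the rates above):

```python
line_items = [
    # (description, count, dollars_per_hour, hours_each)
    ("gen VMs, small, shared",        10, 0.04, 10),
    ("gen VMs, small, game-specific", 14, 0.04, 10),
    ("gen VMs, medium, running",       3, 0.08, 10),
    ("training L4, allgames1",         1, 0.70,  5),
    ("training L4, allgames2",         1, 0.70,  5),
]
already_spent = 3.20  # the 4 finished e2-standard-8 gen VMs

total = already_spent + sum(n * rate * h for _, n, rate, h in line_items)
print(f"${total:.2f}")  # $22.20
```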

If quota increase is denied or delayed

Fallback 1: Stay at 32 vCPUs, use small VMs

  • After gen3/5/6 finish (~10h from now), switch to small VMs
  • Can run 6 small gen VMs (24 vCPUs) + 1 training VM (8 vCPUs) = 32
  • 21 remaining VMs / 6 per batch = 4 batches × 10h = 40 hours (done by Tuesday)

Fallback 2: Prioritize easy wins

  • Skip the asteroids-specific VMs (6 VMs saved)
  • Skip 2 of the space-invaders VMs
  • Only 6 game-specific VMs = finish much faster
  • Gets 6/7 games to 90%+, asteroids to ~80%

Commands to Execute

Request quota increase

# Go to: console.cloud.google.com → IAM & Admin → Quotas
# Search: CPUS_ALL_REGIONS
# Request increase: 32 → 128
# (There is no simple one-line gcloud command to file a quota increase;
#  submit it via the console. Current quota and usage can be checked from the CLI:)
gcloud compute project-info describe --project ai-therapist-3e55e | grep -A 2 CPUS_ALL_REGIONS

Launch game-specific VMs (after quota approved)

# Tetris-specific (4 VMs)
for i in $(seq 14 17); do
  python3 -c "
import scripts.cloud_generate as cg
cg.create_vm($i, hours=50.0, size='small', session_length=180, fps=10, workers=0, game='tetris')
"
done

# Space Invaders-specific (4 VMs)
for i in $(seq 18 21); do
  python3 -c "
import scripts.cloud_generate as cg
cg.create_vm($i, hours=50.0, size='small', session_length=180, fps=10, workers=0, game='space-invaders')
"
done

# Asteroids-specific (6 VMs)
for i in $(seq 22 27); do
  python3 -c "
import scripts.cloud_generate as cg
cg.create_vm($i, hours=50.0, size='small', session_length=180, fps=10, workers=0, game='asteroids')
"
done
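The three loops above can also be driven from a single spec table. A sketch that assumes the same scripts.cloud_generate.create_vm call used above; the launch call is left commented so the expansion can be checked first:

```python
SPECS = [
    # (game, VM ids)
    ("tetris",         range(14, 18)),  # 4 VMs
    ("space-invaders", range(18, 22)),  # 4 VMs
    ("asteroids",      range(22, 28)),  # 6 VMs
]

launches = [(i, game) for game, ids in SPECS for i in ids]
print(len(launches))  # 14 game-specific VMs

# import scripts.cloud_generate as cg
# for i, game in launches:
#     cg.create_vm(i, hours=50.0, size="small", session_length=180,
#                  fps=10, workers=0, game=game)
```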

Launch final training (after all generation completes)

python scripts/cloud_train.py \
  --models all --gpu l4 \
  --train-id allgames2 \
  --max-minutes 45 \
  --sessions 999 \
  --data-prefix "generated/gen0/sessions,generated/gen1/sessions,generated/gen2/sessions,generated/gen3/sessions,generated/gen4/sessions,generated/gen5/sessions,generated/gen6/sessions,generated/gen7/sessions,generated/gen8/sessions,generated/gen9/sessions,generated/gen10/sessions,generated/gen11/sessions,generated/gen12/sessions,generated/gen13/sessions,generated/gen14/sessions,generated/gen15/sessions,generated/gen16/sessions,generated/gen17/sessions,generated/gen18/sessions,generated/gen19/sessions,generated/gen20/sessions,generated/gen21/sessions,generated/gen22/sessions,generated/gen23/sessions,generated/gen24/sessions,generated/gen25/sessions,generated/gen26/sessions,generated/gen27/sessions" \
  --skip-upload
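The long --data-prefix argument can be generated rather than hand-typed (a sketch; gen0-gen27 as planned above):

```python
# Build the comma-separated prefix list for --data-prefix.
prefix = ",".join(f"generated/gen{i}/sessions" for i in range(28))
print(prefix[:47] + "...")    # generated/gen0/sessions,generated/gen1/sessions...
print(prefix.count(",") + 1)  # 28 path entries
```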

Key Risks

  1. Quota increase delay: GCP quota increases can take hours to days. Fallback plan above handles this.
  2. Spot preemption: VMs can be killed at any time. The backfill script handles re-launching. Game-specific VMs need similar monitoring.
  3. Disk space on training VM: 300GB boot disk. With ~19,600 sessions, raw data is ~20GB. Processing creates mmap files ~3x raw size. Should fit but monitor.
  4. GameNet untested: This is the first real training run with GameNet. If it underperforms, we still have 6 other models as baselines. Training all 7 models hedges this risk.
  5. Training time: 7 models × 7 games × 45 min = ~37 hours if sequential. On L4 with 30-min limit, ~24.5 hours. May need to split across 2 training VMs or increase GPU.
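Risk 5's arithmetic, and the effect of adding a second training VM (a sketch; the 30-minute per-job cap is from this plan):

```python
MODELS, GAMES = 7, 7
MINUTES_PER_JOB = 30  # per-model, per-game cap on the L4

def wall_clock_hours(n_train_vms):
    """Total wall clock if jobs are split evenly across training VMs."""
    return MODELS * GAMES * MINUTES_PER_JOB / 60 / n_train_vms

print(wall_clock_hours(1))  # 24.5 hours sequential on one L4
print(wall_clock_hours(2))  # 12.25 with a second L4
```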

Success Criteria

By end of today:

- [ ] All 28 gen VMs launched (or all possible given quota)
- [ ] allgames1 training complete, results downloaded
- [ ] GameNet results show improvement over baselines on tetris
- [ ] Easy games (snake, pong) showing >70% exact match with existing data

By tomorrow morning:

- [ ] All generation complete (~19,600 sessions)
- [ ] allgames2 training launched on full dataset
- [ ] Per-game exact match targets: snake/pong >90%, breakout/flappy >85%, tetris/SI >80%

By tomorrow evening:

- [ ] Full training complete, results analyzed
- [ ] Best model identified per game
- [ ] Decision on whether to scale further or tune architecture