
Digital Surface Labs


Scale-Up Plan: Screen-Self-Driving to Professional Play

Date: February 16, 2026
Goal: Generate enough data and train models today to play all 7 arcade games at a competitive level (90%+ exact match).

Current State (as of right now)

Data Generated

Generation   Sessions   Status
gen0         700        DONE
gen1         700        DONE
gen2         994        DONE
gen4         994        DONE
gen3         ~0         Just relaunched (previous run was stuck)
gen5         ~0         Just relaunched (previous run was preempted)
gen6         ~0         Just launched
gen7-13      0          Queued in backfill script
Total done   3,388      ~484/game, ~173K training windows/game

Training

  • allgames1: Running on L4 GPU, training all 7 models (including GameNet) on all 7 games using 3,388 sessions. Currently downloading data to VM (~20% done).
  • Previous runs: All on tetris only. Best: 20.7% exact match (sigmoid) / 36.6% exact match (softmax).

Infrastructure

  • vCPU quota: 32 global (CPUS_ALL_REGIONS) = max 4 simultaneous 8-vCPU VMs
  • Currently running: 4 VMs (gen3 + gen5 + gen6 + allgames1) = 32 vCPUs = quota fully used
  • Spot pricing: Gen VMs $0.08/hr (e2-standard-8), Train $0.70/hr (L4)

What We Need for Professional Play

From training analysis, the GameNet architecture with softmax output needs these training samples per game:

Game             Difficulty   Samples for 90%   Sessions Needed
Snake            Low          200K              560
Pong             Low          200K              560
Breakout         Medium       500K              1,400
Flappy Bird      Medium       500K              1,400
Tetris           High         1M                2,800
Space Invaders   High         1M                2,800
Asteroids        Very High    2M                5,600

Math: 1 session = 180s at 10fps = 1,800 frames → ~357 training windows at 2fps with 4-frame context.
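That window arithmetic as a quick sanity check (a sketch; all constants are from this plan):

```python
import math

FPS_RECORD = 10        # frames captured per second
FPS_TRAIN = 2          # frames sampled for training
SESSION_SECONDS = 180
CONTEXT = 4            # frames of context per training window

raw_frames = SESSION_SECONDS * FPS_RECORD      # 1,800 raw frames
sampled = SESSION_SECONDS * FPS_TRAIN          # 360 frames at 2 fps
windows_per_session = sampled - (CONTEXT - 1)  # 357 sliding windows

def sessions_needed(target_windows):
    return math.ceil(target_windows / windows_per_session)

print(windows_per_session)         # 357
print(sessions_needed(200_000))    # 561, rounded to 560 in the table
print(sessions_needed(2_000_000))  # 5,603, rounded to 5,600 in the table
```

The table's "Sessions Needed" column rounds these to the nearest 10-20 sessions, which is well within the noise of session yield anyway.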

The Problem

With the current approach (gen0-gen13, all-games VMs):

  • 14 VMs × ~700 sessions each = 9,800 total sessions = 1,400/game
  • That gives ~500K training windows/game
  • Good enough for snake, pong, breakout, and flappy bird (easy/medium games)
  • Not enough for tetris, space invaders, and asteroids (which need 1-2M windows)

With the 32 vCPU quota (4 VMs at a time):

  • 14 VMs in batches of 3 (plus 1 training VM) = ~5 batches × 10h = 50 hours
  • That's 2+ days, not today.

The Plan: Two Key Moves

Move 1: Request Quota Increase to 128 vCPUs

# Request via GCP console: IAM & Admin → Quotas → CPUS_ALL_REGIONS
# Current: 32, Request: 128
# Justification: ML training pipeline, short-lived spot VMs for data generation

With 128 vCPUs:

  • Use small VMs (e2-standard-4, 4 vCPUs, $0.04/hr) for generation
  • Run 30 gen VMs simultaneously (120 vCPUs) + 1 training VM (8 vCPUs) = 128
  • All generation completes in a single 10-hour batch
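The same packing math, parameterized over quota (a sketch; VM sizes and the 10-hour batch length are from this plan):

```python
import math

def batch_plan(quota_vcpus, gen_vm_vcpus, n_gen_vms,
               train_vcpus=8, hours_per_batch=10):
    """Gen VMs that fit beside one training VM, and total wall-clock hours."""
    parallel = (quota_vcpus - train_vcpus) // gen_vm_vcpus
    batches = math.ceil(n_gen_vms / parallel)
    return parallel, batches * hours_per_batch

# Today: 32 vCPUs, e2-standard-8 gen VMs, 14 all-games VMs
print(batch_plan(32, 8, 14))    # (3, 50) -> 3 at a time, 50 hours
# Requested: 128 vCPUs, e2-standard-4 gen VMs, all 28 VMs
print(batch_plan(128, 4, 28))   # (30, 10) -> one 10-hour batch
```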

Move 2: Game-Specific Generation for Hard Games

Instead of only all-games VMs (100 sessions/game each), add dedicated VMs for hard games:

VM Type               Count   Game             Sessions/VM      Total Sessions
All-games (gen0-13)   14      All 7            700 (100/game)   1,400/game
Tetris-only           4       tetris           700              2,800 extra
Space Invaders-only   4       space-invaders   700              2,800 extra
Asteroids-only        6       asteroids        700              4,200 extra

Final per-game totals:

Game             Shared Sessions   Dedicated Sessions   Total   Training Windows   Target Met?
Snake            1,400             0                    1,400   500K               90%+ (need 200K)
Pong             1,400             0                    1,400   500K               90%+ (need 200K)
Breakout         1,400             0                    1,400   500K               90% (need 500K)
Flappy Bird      1,400             0                    1,400   500K               90% (need 500K)
Tetris           1,400             2,800                4,200   1.5M               90%+ (need 1M)
Space Invaders   1,400             2,800                4,200   1.5M               90%+ (need 1M)
Asteroids        1,400             4,200                5,600   2.0M               90% (need 2M)

Total VMs: 14 (shared) + 14 (game-specific) = 28 gen VMs
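The per-game totals can be cross-checked mechanically (a sketch; 357 windows/session from the math above, game keys are shorthand):

```python
WINDOWS_PER_SESSION = 357
SHARED = 1_400  # sessions/game contributed by the 14 all-games VMs

targets = {  # target training windows per game, from the plan
    "snake": 200_000, "pong": 200_000,
    "breakout": 500_000, "flappy-bird": 500_000,
    "tetris": 1_000_000, "space-invaders": 1_000_000,
    "asteroids": 2_000_000,
}
dedicated = {"tetris": 2_800, "space-invaders": 2_800, "asteroids": 4_200}

for game, target in targets.items():
    sessions = SHARED + dedicated.get(game, 0)
    windows = sessions * WINDOWS_PER_SESSION
    print(f"{game}: {sessions} sessions, {windows/1e6:.2f}M windows "
          f"(target {target/1e6:.1f}M)")
```

Note that asteroids lands at 5,600 × 357 = 1,999,200 windows, fractionally under the nominal 2M target; that is within the table's rounding.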

Timeline for Today

Phase 1: Now (already running)

  • gen3, gen5, gen6 generating (all-games)
  • allgames1 training on L4 (7 models × 7 games × 3,388 sessions)
  • Backfill script queued for gen7-13
  • Action: Request 128 vCPU quota increase via GCP console

Phase 2: Once quota approved (target: within 2 hours)

  • Switch remaining gen VMs to small (e2-standard-4, 4 vCPUs)
  • Launch gen7-13 (7 VMs) + 14 game-specific VMs = 21 VMs simultaneously
  • 21 × 4 vCPUs = 84 vCPUs + 8 training = 92 vCPUs (under 128 quota)
  • All VMs run in parallel for ~10 hours

Phase 3: ~10 hours after Phase 2 (evening/night)

  • All generation complete: ~19,600 sessions total
  • Upload code + launch training: allgames2 on L4
  • Train all 7 models (including GameNet) on full dataset
  • Training time: target ~3-5 hours, but note that 7 models × 30 min × 7 games is ~24.5 hours if fully sequential (see risk 5)

Phase 4: Results (late tonight / early morning)

  • Download results, compare GameNet vs baselines
  • Expected: GameNet softmax hitting 85-95% exact match on easy/medium games
  • Retrain if needed with adjusted hyperparameters

Cost Breakdown

Item                                 Count          Rate       Hours      Cost
Gen VMs (small, shared)              10 remaining   $0.04/hr   10h each   $4.00
Gen VMs (small, game-specific)       14             $0.04/hr   10h each   $5.60
Gen VMs already running (medium)     3              $0.08/hr   10h each   $2.40
Gen VMs already done                 4              -          -          ~$3.20 (already spent)
Training L4 (allgames1, current)     1              $0.70/hr   5h         $3.50
Training L4 (allgames2, full data)   1              $0.70/hr   5h         $3.50
Total                                                                     ~$22.20
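The total reduces to a few line items (a sketch using the rates above):

```python
line_items = [
    # (description, count, dollars_per_hour, hours_each)
    ("gen VMs, small, shared",        10, 0.04, 10),
    ("gen VMs, small, game-specific", 14, 0.04, 10),
    ("gen VMs, medium, running",       3, 0.08, 10),
    ("training L4, allgames1",         1, 0.70,  5),
    ("training L4, allgames2",         1, 0.70,  5),
]
already_spent = 3.20  # the 4 finished e2-standard-8 gen VMs

total = already_spent + sum(n * rate * h for _, n, rate, h in line_items)
print(f"${total:.2f}")  # $22.20
```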

If quota increase is denied or delayed

Fallback 1: Stay at 32 vCPUs, use small VMs

  • After gen3/5/6 finish (~10h from now), switch to small VMs
  • Can run 6 small gen VMs (24 vCPUs) + 1 training VM (8 vCPUs) = 32
  • 21 remaining VMs / 6 per batch = 4 batches × 10h = 40 hours (done by Tuesday)

Fallback 2: Prioritize easy wins

  • Skip the asteroids-specific VMs (6 VMs saved)
  • Skip 2 of the space-invaders VMs
  • Only 6 game-specific VMs = finish much faster
  • Gets 6/7 games to 90%+, asteroids to ~80%

Commands to Execute

Request quota increase

# Go to: console.cloud.google.com → IAM & Admin → Quotas
# Search: CPUS_ALL_REGIONS
# Request increase: 32 → 128
# (There is no simple one-line gcloud command to file a quota increase;
#  submit it via the console. Current quota and usage can be checked from the CLI:)
gcloud compute project-info describe --project ai-therapist-3e55e | grep -A 2 CPUS_ALL_REGIONS

Launch game-specific VMs (after quota approved)

# Tetris-specific (4 VMs)
for i in $(seq 14 17); do
  python3 -c "
import scripts.cloud_generate as cg
cg.create_vm($i, hours=50.0, size='small', session_length=180, fps=10, workers=0, game='tetris')
"
done

# Space Invaders-specific (4 VMs)
for i in $(seq 18 21); do
  python3 -c "
import scripts.cloud_generate as cg
cg.create_vm($i, hours=50.0, size='small', session_length=180, fps=10, workers=0, game='space-invaders')
"
done

# Asteroids-specific (6 VMs)
for i in $(seq 22 27); do
  python3 -c "
import scripts.cloud_generate as cg
cg.create_vm($i, hours=50.0, size='small', session_length=180, fps=10, workers=0, game='asteroids')
"
done
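The three loops above can also be driven from a single spec table. A sketch that assumes the same scripts.cloud_generate.create_vm call used above; the launch call is left commented so the expansion can be checked first:

```python
SPECS = [
    # (game, VM ids)
    ("tetris",         range(14, 18)),  # 4 VMs
    ("space-invaders", range(18, 22)),  # 4 VMs
    ("asteroids",      range(22, 28)),  # 6 VMs
]

launches = [(i, game) for game, ids in SPECS for i in ids]
print(len(launches))  # 14 game-specific VMs

# import scripts.cloud_generate as cg
# for i, game in launches:
#     cg.create_vm(i, hours=50.0, size="small", session_length=180,
#                  fps=10, workers=0, game=game)
```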

Launch final training (after all generation completes)

python scripts/cloud_train.py \
  --models all --gpu l4 \
  --train-id allgames2 \
  --max-minutes 45 \
  --sessions 999 \
  --data-prefix "generated/gen0/sessions,generated/gen1/sessions,generated/gen2/sessions,generated/gen3/sessions,generated/gen4/sessions,generated/gen5/sessions,generated/gen6/sessions,generated/gen7/sessions,generated/gen8/sessions,generated/gen9/sessions,generated/gen10/sessions,generated/gen11/sessions,generated/gen12/sessions,generated/gen13/sessions,generated/gen14/sessions,generated/gen15/sessions,generated/gen16/sessions,generated/gen17/sessions,generated/gen18/sessions,generated/gen19/sessions,generated/gen20/sessions,generated/gen21/sessions,generated/gen22/sessions,generated/gen23/sessions,generated/gen24/sessions,generated/gen25/sessions,generated/gen26/sessions,generated/gen27/sessions" \
  --skip-upload
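The long --data-prefix argument can be generated rather than hand-typed (a sketch; gen0-gen27 as planned above):

```python
# Build the comma-separated prefix list for --data-prefix.
prefix = ",".join(f"generated/gen{i}/sessions" for i in range(28))
print(prefix[:47] + "...")    # generated/gen0/sessions,generated/gen1/sessions...
print(prefix.count(",") + 1)  # 28 path entries
```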

Key Risks

  1. Quota increase delay: GCP quota increases can take hours to days. Fallback plan above handles this.
  2. Spot preemption: VMs can be killed at any time. The backfill script handles re-launching. Game-specific VMs need similar monitoring.
  3. Disk space on training VM: 300GB boot disk. With ~19,600 sessions, raw data is ~20GB. Processing creates mmap files ~3x raw size. Should fit but monitor.
  4. GameNet untested: This is the first real training run with GameNet. If it underperforms, we still have 6 other models as baselines. Training all 7 models hedges this risk.
  5. Training time: 7 models × 7 games × 45 min = ~37 hours if sequential. On L4 with 30-min limit, ~24.5 hours. May need to split across 2 training VMs or increase GPU.
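Risk 5's arithmetic, and the effect of adding a second training VM (a sketch; the 30-minute per-job cap is from this plan):

```python
MODELS, GAMES = 7, 7
MINUTES_PER_JOB = 30  # per-model, per-game cap on the L4

def wall_clock_hours(n_train_vms):
    """Total wall clock if jobs are split evenly across training VMs."""
    return MODELS * GAMES * MINUTES_PER_JOB / 60 / n_train_vms

print(wall_clock_hours(1))  # 24.5 hours sequential on one L4
print(wall_clock_hours(2))  # 12.25 with a second L4
```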

Success Criteria

By end of today:

- [ ] All 28 gen VMs launched (or all possible given quota)
- [ ] allgames1 training complete, results downloaded
- [ ] GameNet results show improvement over baselines on tetris
- [ ] Easy games (snake, pong) showing >70% exact match with existing data

By tomorrow morning:

- [ ] All generation complete (~19,600 sessions)
- [ ] allgames2 training launched on full dataset
- [ ] Per-game exact match targets: snake/pong >90%, breakout/flappy >85%, tetris/SI >80%

By tomorrow evening:

- [ ] Full training complete, results analyzed
- [ ] Best model identified per game
- [ ] Decision on whether to scale further or tune architecture