Scale-Up Plan: Screen-Self-Driving to Professional Play
Date: February 16, 2026
Goal: Generate enough data and train models to play all 7 arcade games at a competitive level (90%+ exact match) by the end of today.
Current State (as of right now)
Data Generated
| Generation | Sessions | Status |
|---|---|---|
| gen0 | 700 | DONE |
| gen1 | 700 | DONE |
| gen2 | 994 | DONE |
| gen4 | 994 | DONE |
| gen3 | ~0 | Just relaunched (previous was stuck) |
| gen5 | ~0 | Just relaunched (previous was preempted) |
| gen6 | ~0 | Just launched |
| gen7-13 | 0 | Queued in backfill script |
| Total done | 3,388 | ~484/game, ~173K training windows/game |
Training
- allgames1: Running on L4 GPU, training all 7 models (including GameNet) on all 7 games using 3,388 sessions. Currently downloading data to VM (~20% done).
- Previous runs: All on tetris only. Best: 20.7% exact match (sigmoid) / 36.6% exact match (softmax).
Infrastructure
- vCPU quota: 32 global (CPUS_ALL_REGIONS) = max 4 simultaneous VMs
- Currently running: 4 VMs (gen3 + gen5 + gen6 + allgames1) = 32 vCPUs = full
- Spot pricing: Gen VMs $0.08/hr (e2-standard-8), Train $0.70/hr (L4)
What We Need for Professional Play
From training analysis, the GameNet architecture with softmax output needs these training samples per game:
| Game | Difficulty | Samples for 90% | Sessions Needed |
|---|---|---|---|
| Snake | Low | 200K | 560 |
| Pong | Low | 200K | 560 |
| Breakout | Medium | 500K | 1,400 |
| Flappy Bird | Medium | 500K | 1,400 |
| Tetris | High | 1M | 2,800 |
| Space Invaders | High | 1M | 2,800 |
| Asteroids | Very High | 2M | 5,600 |
Math: 1 session = 180s at 10fps = 1,800 frames. Downsampled to 2fps that is 360 frames, and with a 4-frame context each session yields 360 - 4 + 1 = 357 training windows.
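A quick sketch of this arithmetic applied to the per-game targets above (parameters assumed to match the generation settings: 180s sessions, windows at 2fps, 4-frame context):

```python
# Windowing arithmetic: sessions needed per game for the sample targets above.
SESSION_SECONDS, WINDOW_FPS, CONTEXT = 180, 2, 4
WINDOWS_PER_SESSION = SESSION_SECONDS * WINDOW_FPS - CONTEXT + 1  # 357

TARGETS = {
    "snake": 200_000, "pong": 200_000,
    "breakout": 500_000, "flappy-bird": 500_000,
    "tetris": 1_000_000, "space-invaders": 1_000_000,
    "asteroids": 2_000_000,
}
for game, samples in TARGETS.items():
    sessions = samples / WINDOWS_PER_SESSION
    # The plan rounds these to friendly numbers: 560 / 1,400 / 2,800 / 5,600.
    print(f"{game:>15}: ~{sessions:,.0f} sessions")
```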
The Problem
With current approach (gen0-gen13, all-games VMs):
- 14 VMs × ~700 sessions each = 9,800 total sessions = 1,400/game
- That gives ~500K training windows/game
- Good enough for snake, pong, breakout, flappy (easy/medium games)
- Not enough for tetris, space invaders, asteroids (need 1-2M windows)
With 32 vCPU quota (4 VMs at a time):
- 14 VMs in batches of 3 (plus 1 training) = ~5 batches × 10h = 50 hours
- That's 2+ days, not today.
The Plan: Two Key Moves
Move 1: Request Quota Increase to 128 vCPUs
```bash
# Request via GCP console: IAM & Admin → Quotas → CPUS_ALL_REGIONS
# Current: 32, Request: 128
# Justification: ML training pipeline, short-lived spot VMs for data generation
```
With 128 vCPUs:
- Use small VMs (e2-standard-4, 4 vCPUs, $0.04/hr) for generation
- Run 30 gen VMs simultaneously (120 vCPUs) + 1 training VM (8 vCPUs) = 128
- All generation completes in a single ~10-hour batch
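A one-line sanity check on the packing claim, covering both the current quota (e2-standard-8 gen VMs) and the planned one (e2-standard-4):

```python
# vCPU packing: how many gen VMs fit alongside one 8-vCPU training VM.
TRAIN_VCPUS = 8
for quota, gen_vcpus in [(32, 8), (128, 4)]:  # current vs. planned machine shapes
    gen_vms = (quota - TRAIN_VCPUS) // gen_vcpus
    used = gen_vms * gen_vcpus + TRAIN_VCPUS
    print(f"quota {quota}: {gen_vms} gen VMs ({used} vCPUs used)")
# quota 32: 3 gen VMs (32 vCPUs); quota 128: 30 gen VMs (128 vCPUs)
```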
Move 2: Game-Specific Generation for Hard Games
Instead of only all-games VMs (100 sessions/game each), add dedicated VMs for hard games:
| VM Type | Count | Game | Sessions/VM | Total Sessions |
|---|---|---|---|---|
| All-games (gen0-13) | 14 | All 7 | 700 (100/game) | 1,400/game |
| Tetris-only | 4 | tetris | 700 | 2,800 extra |
| Space Invaders-only | 4 | space-invaders | 700 | 2,800 extra |
| Asteroids-only | 6 | asteroids | 700 | 4,200 extra |
Final per-game totals:
| Game | Shared Sessions | Dedicated Sessions | Total | Training Windows | Target Met? |
|---|---|---|---|---|---|
| Snake | 1,400 | 0 | 1,400 | 500K | 90%+ (need 200K) |
| Pong | 1,400 | 0 | 1,400 | 500K | 90%+ (need 200K) |
| Breakout | 1,400 | 0 | 1,400 | 500K | 90% (need 500K) |
| Flappy Bird | 1,400 | 0 | 1,400 | 500K | 90% (need 500K) |
| Tetris | 1,400 | 2,800 | 4,200 | 1.5M | 90%+ (need 1M) |
| Space Invaders | 1,400 | 2,800 | 4,200 | 1.5M | 90%+ (need 1M) |
| Asteroids | 1,400 | 4,200 | 5,600 | 2.0M | 90% (need 2M) |
Total VMs: 14 (shared) + 14 (game-specific) = 28 gen VMs
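As a cross-check, the per-game totals table can be re-derived from the VM counts (a sketch; 357 windows/session comes from the earlier math):

```python
# Cross-check: per-game sessions and training windows implied by the VM plan.
WINDOWS_PER_SESSION = 357
SHARED = 14 * 100  # 14 all-games VMs, 100 sessions per game each
DEDICATED = {"tetris": 4 * 700, "space-invaders": 4 * 700, "asteroids": 6 * 700}

for game in ("snake", "pong", "breakout", "flappy-bird",
             "tetris", "space-invaders", "asteroids"):
    total = SHARED + DEDICATED.get(game, 0)
    windows = total * WINDOWS_PER_SESSION
    print(f"{game:>15}: {total:>5,} sessions -> {windows / 1e6:.2f}M windows")
```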
Timeline for Today
Phase 1: Now (already running)
- gen3, gen5, gen6 generating (all-games)
- allgames1 training on L4 (7 models × 7 games × 3,388 sessions)
- Backfill script queued for gen7-13
- Action: Request 128 vCPU quota increase via GCP console
Phase 2: Once quota approved (target: within 2 hours)
- Switch remaining gen VMs to small (e2-standard-4, 4 vCPUs)
- Launch gen7-13 (7 VMs) + 14 game-specific VMs = 21 VMs simultaneously
- 21 × 4 vCPUs = 84 vCPUs + 8 training = 92 vCPUs (under 128 quota)
- All VMs run in parallel for ~10 hours
Phase 3: ~10 hours after Phase 2 (evening/night)
- All generation complete: ~19,600 sessions total
- Upload code + launch training: allgames2 on L4
- Train all 7 models (including GameNet) on full dataset
- Training time: ~3-5 hours if the 7 models for each game train in parallel; run fully sequentially it is ~24.5 hours (7 models × 7 games × 30 min; see Key Risks and the sketch below)
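Since the training-time estimate is the shakiest number in the timeline, here is the same arithmetic under a few parallelism assumptions (a sketch; the 30-min cap per model-game pair comes from Key Risks):

```python
# Training wall-clock vs. degree of parallelism (extra VMs or per-game batching).
PAIRS = 7 * 7     # 7 models x 7 games
CAP_HOURS = 0.5   # 30-min cap per model-game pair on the L4
for parallel in (1, 2, 7):
    print(f"{parallel}x parallel: ~{PAIRS * CAP_HOURS / parallel:.1f}h")
# 1x: 24.5h, 2x: 12.2h, 7x: 3.5h -- the ~3-5h estimate assumes per-game parallelism
```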
Phase 4: Results (late tonight / early morning)
- Download results, compare GameNet vs baselines
- Expected: GameNet softmax hitting 85-95% exact match on easy/medium games
- Retrain if needed with adjusted hyperparameters
Cost Breakdown
| Item | Count | Rate | Hours | Cost |
|---|---|---|---|---|
| Gen VMs (small, shared) | 7 remaining (gen7-13) | $0.04/hr | 10h each | $2.80 |
| Gen VMs (small, game-specific) | 14 | $0.04/hr | 10h each | $5.60 |
| Gen VMs already running (medium) | 3 | $0.08/hr | 10h each | $2.40 |
| Gen VMs already done | 4 | — | — | ~$3.20 (already spent) |
| Training L4 (allgames1, current) | 1 | $0.70/hr | 5h | $3.50 |
| Training L4 (allgames2, full data) | 1 | $0.70/hr | 5h | $3.50 |
| Total | | | | ~$21.00 |
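The total is reproducible from the line items (spot rates from the Infrastructure section; the 7 remaining shared VMs are gen7-13, since gen0/1/2/4 are done and gen3/5/6 run as medium):

```python
# Cost recomputation: (label, count, $/hr, hours) per line item.
LINE_ITEMS = [
    ("gen small, shared (gen7-13)",   7, 0.04, 10),
    ("gen small, game-specific",     14, 0.04, 10),
    ("gen medium, running (3/5/6)",   3, 0.08, 10),
    ("gen medium, already done",      4, 0.08, 10),
    ("training L4 (allgames1 + 2)",   2, 0.70,  5),
]
total = sum(count * rate * hours for _, count, rate, hours in LINE_ITEMS)
print(f"total: ${total:.2f}")  # ~$21.00
```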
If quota increase is denied or delayed
Fallback 1: Stay at 32 vCPUs, use small VMs
- After gen3/5/6 finish (~10h from now), switch to small VMs
- Can run 6 small gen VMs (24 vCPUs) + 1 training VM (8 vCPUs) = 32
- 21 remaining VMs / 6 per batch = 4 batches × 10h = 40 hours (done by Tuesday)
Fallback 2: Prioritize easy wins
- Skip the asteroids-specific VMs (6 VMs saved)
- Skip 2 of the space-invaders VMs
- Only 6 game-specific VMs = finish much faster
- Gets 6/7 games to 90%+, asteroids at ~80%
Commands to Execute
Request quota increase
```bash
# Go to: console.cloud.google.com → IAM & Admin → Quotas
# Search: CPUS_ALL_REGIONS
# Request increase: 32 → 128
# Note: there is no stable gcloud one-liner for raising this quota; submit the
# request through the console (or the Cloud Quotas API) and wait for approval.
```
Launch game-specific VMs (after quota approved)
```bash
# Tetris-specific (4 VMs)
for i in $(seq 14 17); do
  python3 -c "
import scripts.cloud_generate as cg
cg.create_vm($i, hours=50.0, size='small', session_length=180, fps=10, workers=0, game='tetris')
"
done

# Space Invaders-specific (4 VMs)
for i in $(seq 18 21); do
  python3 -c "
import scripts.cloud_generate as cg
cg.create_vm($i, hours=50.0, size='small', session_length=180, fps=10, workers=0, game='space-invaders')
"
done

# Asteroids-specific (6 VMs)
for i in $(seq 22 27); do
  python3 -c "
import scripts.cloud_generate as cg
cg.create_vm($i, hours=50.0, size='small', session_length=180, fps=10, workers=0, game='asteroids')
"
done
```
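Before walking away, it is worth confirming the VMs actually came up. A hedged sketch, assuming instances are named gen<N> (the actual naming is whatever scripts.cloud_generate assigns):

```python
# Sanity check: count running gen VMs and list any missing indices.
import subprocess

out = subprocess.run(
    ["gcloud", "compute", "instances", "list",
     "--filter=name~^gen AND status=RUNNING", "--format=value(name)"],
    capture_output=True, text=True, check=True,
).stdout.split()
expected = {f"gen{i}" for i in range(28)}  # gen0-27, assumed naming
missing = sorted(expected - set(out), key=lambda n: int(n[3:]))
print(f"{len(set(out) & expected)}/28 gen VMs running; missing: {missing}")
```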
Launch final training (after all generation completes)
```bash
python scripts/cloud_train.py \
    --models all --gpu l4 \
    --train-id allgames2 \
    --max-minutes 45 \
    --sessions 999 \
    --data-prefix "generated/gen0/sessions,generated/gen1/sessions,generated/gen2/sessions,generated/gen3/sessions,generated/gen4/sessions,generated/gen5/sessions,generated/gen6/sessions,generated/gen7/sessions,generated/gen8/sessions,generated/gen9/sessions,generated/gen10/sessions,generated/gen11/sessions,generated/gen12/sessions,generated/gen13/sessions,generated/gen14/sessions,generated/gen15/sessions,generated/gen16/sessions,generated/gen17/sessions,generated/gen18/sessions,generated/gen19/sessions,generated/gen20/sessions,generated/gen21/sessions,generated/gen22/sessions,generated/gen23/sessions,generated/gen24/sessions,generated/gen25/sessions,generated/gen26/sessions,generated/gen27/sessions" \
    --skip-upload
```
Key Risks
- Quota increase delay: GCP quota increases can take hours to days. Fallback plan above handles this.
- Spot preemption: VMs can be killed at any time. The backfill script handles re-launching for gen0-13; the game-specific VMs need similar monitoring (see the sketch after this list).
- Disk space on training VM: 300GB boot disk. With ~19,600 sessions, raw data is ~20GB and processing creates mmap files at ~3x raw size, so roughly 80GB total. Should fit, but monitor.
- GameNet untested: This is the first real training run with GameNet. If it underperforms, we still have 6 other models as baselines. Training all 7 models hedges this risk.
- Training time: 7 models × 7 games × 45 min = ~37 hours if sequential; with the 30-min cap on the L4, ~24.5 hours. May need to split across 2 training VMs or move to a larger GPU.
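For the preemption risk above, a minimal monitor for the game-specific VMs might look like the following. This is a sketch, not the existing backfill script: the gen<N> naming, the index-to-game map, and the create_vm signature are all assumptions carried over from earlier in this plan, and a real monitor would also need to skip VMs that already finished their 700 sessions rather than blindly relaunching them.

```python
# Hedged sketch: relaunch preempted game-specific VMs (indices 14-27).
import subprocess
import time

import scripts.cloud_generate as cg  # assumed importable from the repo root

GAME_FOR_INDEX = {**{i: "tetris" for i in range(14, 18)},
                  **{i: "space-invaders" for i in range(18, 22)},
                  **{i: "asteroids" for i in range(22, 28)}}

def running_vms() -> set:
    """Return names of currently running gen VMs (assumed gen<N> naming)."""
    out = subprocess.run(
        ["gcloud", "compute", "instances", "list",
         "--filter=name~^gen AND status=RUNNING", "--format=value(name)"],
        capture_output=True, text=True, check=True).stdout
    return set(out.split())

while True:
    alive = running_vms()
    for i, game in GAME_FOR_INDEX.items():
        if f"gen{i}" not in alive:
            # NOTE: in practice, check completion state before relaunching.
            cg.create_vm(i, hours=50.0, size="small", session_length=180,
                         fps=10, workers=0, game=game)
    time.sleep(300)  # poll every 5 minutes
```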
Success Criteria
By end of today:
- [ ] All 28 gen VMs launched (or all possible given quota)
- [ ] allgames1 training complete, results downloaded
- [ ] GameNet results show improvement over baselines on tetris
- [ ] Easy games (snake, pong) showing >70% exact match with existing data

By tomorrow morning:
- [ ] All generation complete (~19,600 sessions)
- [ ] allgames2 training launched on full dataset
- [ ] Per-game exact match targets: snake/pong >90%, breakout/flappy >85%, tetris/SI >80%

By tomorrow evening:
- [ ] Full training complete, results analyzed
- [ ] Best model identified per game
- [ ] Decision on whether to scale further or tune architecture