Digital Surface Labs

"Task Scheduling with WSJF: Prioritizing Personal AI Work"

Task Scheduling with WSJF: Prioritizing Personal AI Work

1. The Question

"Has somebody thought about this sort of problem before and there's a clear concept that encapsulates this?"

Yes. The answer is WSJF -- Weighted Shortest Job First, from lean product development and the Scaled Agile Framework (SAFe). It was formalized by Don Reinertsen in The Principles of Product Development Flow (2009) and later adopted wholesale by SAFe for backlog prioritization. The core idea is simple: always do the job with the highest economic return per unit of time consumed.

But your version is more nuanced than standard WSJF. Standard WSJF assumes the work is "free" once you decide to do it -- the only cost is duration. Your problem has a second cost axis: human attention. A task that runs autonomously on the Jetson for 60 minutes is fundamentally cheaper than a task that requires 60 minutes of your time, even if the GPU cost is identical. Your formula must deduct human time cost from the value, not just divide by it.

This document lays out the established concepts, derives the priority formula, designs the data model, and maps it to existing Dollar Hound code so the whole system can be built in 200-300 lines of scheduling logic on top of OpenClaw.


2. Established Concepts That Map to This Problem

Five well-studied frameworks each contribute a piece of the answer.

WSJF (Weighted Shortest Job First)

The primary framework. Originated in lean manufacturing, formalized by Reinertsen, adopted by SAFe.

Canonical formula:

WSJF = Cost of Delay / Job Duration

Where Cost of Delay is decomposed into three components:

| Component | What It Measures | Example |
|---|---|---|
| User-Business Value | How much is this worth? | $150 average claim recovery |
| Time Criticality | Does value decay if we wait? | Tax deadline April 15 |
| Risk Reduction / Opportunity Enablement | Does this unlock other work? | Building the email pipeline enables all outreach |

Your modification: Replace "Cost of Delay" with "net value after human time cost." Standard WSJF treats job duration as the denominator -- longer jobs are penalized. Your version goes further: human time is not just a duration penalty, it is an economic deduction from value. An hour of your time has a dollar cost, and that cost comes straight off the top.

Key insight: Always do the highest value-to-duration ratio first, not the highest absolute value. A $20 task taking 5 minutes beats a $400 task taking 2 hours. This is counterintuitive -- the $400 task feels more important -- but the math is clear. If you do the $20/5min task first and then the $400/2hr task, you capture $420 in 2h05m. If you reverse the order, you still capture $420 but the $20 task was delayed 2 hours for no reason. Scale that across dozens of tasks and the ordering effect compounds.
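One way to make the compounding concrete is to treat each task's cost of delay as its value multiplied by how long it waits. A small sketch with the numbers from the paragraph above (the names and the delay-cost framing are mine, for illustration):

```javascript
// Illustrative sketch of the ordering argument: the $20/5min task vs the
// $400/2hr task. "Delay cost" here is value x minutes spent waiting.
const tasks = [
  { id: 'small', value: 20, minutes: 5 },
  { id: 'large', value: 400, minutes: 120 },
];

// WSJF: sort by value-to-duration ratio, descending.
const byWsjf = [...tasks].sort((a, b) => b.value / b.minutes - a.value / a.minutes);

// Each task's value waits for the total duration of everything
// scheduled up to and including itself.
function totalDelayCost(order) {
  let elapsed = 0;
  let cost = 0;
  for (const t of order) {
    elapsed += t.minutes;
    cost += t.value * elapsed;
  }
  return cost;
}

console.log(byWsjf.map(t => t.id));                 // [ 'small', 'large' ]
console.log(totalDelayCost(byWsjf));                // 100 + 50000 = 50100
console.log(totalDelayCost([tasks[1], tasks[0]]));  // 48000 + 2500 = 50500
```

Both orders capture the same $420, but the WSJF order accumulates $400 less value-weighted waiting; across dozens of tasks the gap keeps growing.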

Multi-Dimensional Knapsack Problem

For capacity planning. The classic knapsack asks: given a bag with weight limit W and items with weights and values, which items maximize total value? The multi-dimensional variant adds constraints:

  • X GPU-hours per day
  • Y human-attention-minutes per day
  • Z network-bandwidth-hours per day

Which combination of tasks maximizes total value without exceeding any constraint?

This is NP-hard in general, but with dozens of tasks (not thousands), a greedy approximation -- sort by WSJF score, pack in order, skip any task that would violate a constraint -- works well. For the classic single-constraint knapsack, ratio-greedy (taking the better of the greedy pack and the single most valuable item) is guaranteed to capture at least 50% of the optimal value. In practice it lands much closer to optimal because the tasks are heterogeneous enough that greedy packing wastes very little capacity.
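The greedy pack can be sketched in a few lines. The task shape and the two-budget example below are assumed for illustration, not taken from any library:

```javascript
// Greedy multi-constraint packing: sort by score, take items in order,
// skip anything that would exceed a budget. Illustrative only.
function greedyPack(tasks, budget) {
  const picked = [];
  const used = { gpu: 0, human: 0 };
  for (const t of [...tasks].sort((a, b) => b.score - a.score)) {
    if (used.gpu + t.gpu > budget.gpu) continue;      // constraint violated: skip
    if (used.human + t.human > budget.human) continue;
    used.gpu += t.gpu;
    used.human += t.human;
    picked.push(t.id);
  }
  return picked;
}

const picked = greedyPack(
  [
    { id: 'enrich',  score: 3.2, gpu: 5,   human: 0 },
    { id: 'taxprep', score: 1.6, gpu: 400, human: 120 }, // too much human time
    { id: 'review',  score: 0.8, gpu: 20,  human: 10 },
  ],
  { gpu: 500, human: 60 }
);
console.log(picked); // [ 'enrich', 'review' ]
```

Note that skipping `taxprep` does not block `review`: the walk continues past a violating task, which is exactly why greedy packing wastes so little capacity here.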

Rate-Monotonic Scheduling

From real-time operating systems (Liu & Layland, 1973). The rule: tasks with shorter periods get higher baseline priority than tasks with longer periods.

Applied here: a daily email triage (period = 1 day) gets higher baseline priority than a quarterly benefits check (period = 90 days). This is already implicit in WSJF -- shorter-period tasks tend to have higher time criticality -- but it is worth calling out as a scheduling principle because it guarantees that high-frequency monitoring tasks never get starved by long-running batch jobs.

Earliest Deadline First (EDF)

Also from real-time systems. The rule: always run the task whose absolute deadline is soonest. If you can meet all deadlines, EDF will find the schedule. If you cannot, EDF will miss the fewest.

Applied here: tax deadline April 15, claims portal filing window closes, insurance enrollment period ends. These tasks must bubble up regardless of their WSJF score as the deadline approaches. The time_decay_factor in the formula below implements this -- urgency amplifies value as the deadline nears.

Heterogeneous Computing Task Scheduling

From grid and cloud computing research (HEFT algorithm, Topcuoglu et al., 2002). The problem: scheduling tasks across machines with different capabilities (GPU vs CPU), different costs, and different speeds.

Used in Kubernetes, Apache Mesos, and YARN. But those systems handle thousands of tasks on hundreds of machines. Your problem is dramatically simpler: dozens of task types, one primary machine (Orin Nano), two scarce resources (GPU time and human attention). A full heterogeneous scheduler is overkill. The concepts transfer -- resource-aware task placement, capability matching, spillover to secondary hosts -- but a 200-line implementation suffices where Kubernetes needs 2 million lines.

The existing state-queue.js assignHostsToQueue() function already implements a simplified version of this: it walks preferred hosts, checks disk budget, and assigns the first host with capacity. The generalized scheduler extends this pattern from disk-only to GPU + human attention.


3. The Priority Formula

Modified WSJF with Net Value

priority_score = net_value / total_resource_cost

where:
  net_value = (expected_value * probability_of_success * time_decay_factor)
              - (human_minutes * hourly_rate / 60)

  total_resource_cost = gpu_minutes + cpu_minutes + (human_minutes * attention_weight)

  time_decay_factor = 1.0 + (urgency_boost / days_until_deadline)  [for deadline tasks]
                    = 1.0  [for non-deadline tasks]

Parameter definitions:

| Parameter | Meaning | Typical Value |
|---|---|---|
| expected_value | Dollar value if the task succeeds | $20 - $500 |
| probability_of_success | Likelihood of realizing that value | 0.05 - 0.9 |
| time_decay_factor | Multiplier that increases as deadlines approach | 1.0 - 6.0 |
| human_minutes | Minutes of your attention required | 0 - 120 |
| hourly_rate | Opportunity cost of your time | $200/hr |
| gpu_minutes | GPU compute time | 2 - 420 |
| cpu_minutes | CPU compute time (usually not the bottleneck) | 0 - 60 |
| attention_weight | How much harder human minutes are than GPU minutes | 3 (i.e., 1 human minute = 3 GPU minutes in the denominator) |
| urgency_boost | How aggressively deadlines amplify priority | 10 (tunable) |
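The formula translates directly into code. A sketch under the definitions above (the function and parameter names are mine; the defaults follow the parameter table):

```javascript
// Modified WSJF with net value, as defined above. Deadline-free tasks
// get time_decay_factor = 1.0; deadline tasks get 1 + boost/days.
function priorityScore(t, { hourlyRate = 200, attentionWeight = 3, urgencyBoost = 10 } = {}) {
  const decay = t.daysUntilDeadline
    ? 1.0 + urgencyBoost / t.daysUntilDeadline
    : 1.0;
  const netValue =
    t.expectedValue * t.probability * decay -
    (t.humanMinutes * hourlyRate) / 60;
  const cost =
    t.gpuMinutes + (t.cpuMinutes || 0) + t.humanMinutes * attentionWeight;
  return netValue / cost;
}

// Manual CA sweep (Task A in the worked examples): net-negative, do not run.
console.log(priorityScore({ expectedValue: 400, probability: 0.3, humanMinutes: 50, gpuMinutes: 60 }) < 0); // true

// Autonomous monitoring (Task B): $16 net value over 5 GPU minutes.
console.log(priorityScore({ expectedValue: 20, probability: 0.8, humanMinutes: 0, gpuMinutes: 5 })); // 3.2
```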

Worked Examples

Task A: California unclaimed property sweep

A full manual sweep of CA records -- downloading bulk data, matching against target names, enriching results, reviewing matches.

expected_value       = $400
probability          = 0.3
time_decay_factor    = 1.0 (no deadline)
human_minutes        = 50
hourly_rate          = $200/hr
gpu_minutes          = 60
cpu_minutes          = 0

net_value = ($400 * 0.3 * 1.0) - (50 * $200 / 60)
          = $120 - $166.67
          = -$46.67

priority_score = NEGATIVE -- do not run

This task is net-negative at $200/hr human time. The math is telling you something important: either the probability needs to be higher (better matching), or human time must be reduced through automation. If you can cut human_minutes from 50 to 10 (by automating the review step), the equation flips:

net_value = $120 - (10 * $200 / 60) = $120 - $33.33 = $86.67
total_resource_cost = 60 + 0 + (10 * 3) = 90
priority_score = $86.67 / 90 = 0.96

Automation turns a net-negative task into a worthwhile one. The formula quantifies exactly how much automation you need to justify the work.

Task B: Auto-monitoring for new claims (fully autonomous)

A background scan that checks portals for new claims matching known targets. Zero human involvement.

expected_value       = $20
probability          = 0.8
time_decay_factor    = 1.0
human_minutes        = 0
hourly_rate          = $200/hr
gpu_minutes          = 5
cpu_minutes          = 0

net_value = ($20 * 0.8 * 1.0) - (0 * $200 / 60) = $16 - $0 = $16
total_resource_cost  = 5 + 0 + (0 * 3) = 5
priority_score       = $16 / 5 = 3.20

Small value, but excellent ratio. Fully autonomous tasks with even modest expected value dominate the priority queue because they cost zero human time. The system will naturally fill idle GPU hours with these.

Task C: Tax prep with deadline approaching

Tax filing is high-value but human-intensive. Watch how the deadline changes the math.

At 30 days out:

expected_value       = $300
probability          = 0.9
urgency_boost        = 10
days_until_deadline  = 30
time_decay_factor    = 1.0 + (10 / 30) = 1.33
human_minutes        = 120
gpu_minutes          = 420  (7 hours of document processing, form analysis)

net_value = ($300 * 0.9 * 1.33) - (120 * $200 / 60)
          = $359.10 - $400.00
          = -$40.90

priority_score = NEGATIVE at 30 days

At 30 days, the formula says: not yet. Other tasks have better ROI. But watch what happens as the deadline closes in.

At 7 days out:

time_decay_factor = 1.0 + (10 / 7) = 2.43
net_value = ($300 * 0.9 * 2.43) - $400 = $656.10 - $400 = $256.10
total_resource_cost = 420 + (120 * 3) = 780
priority_score = $256.10 / 780 = 0.33

Now it is positive. The task enters the queue.

At 2 days out:

time_decay_factor = 1.0 + (10 / 2) = 6.0
net_value = ($300 * 0.9 * 6.0) - $400 = $1,620 - $400 = $1,220
total_resource_cost = 780
priority_score = $1,220 / 780 = 1.56

At 2 days, this is the highest-priority task in the system. Deadline urgency causes tasks to bubble up naturally -- no special-case logic needed, no manual escalation. The formula handles it.
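The three checkpoints can be recomputed in a few lines. Constants come from the example above; this uses the exact decay factor, so the 30-day score lands at -0.05 rather than the rounded intermediate figures shown:

```javascript
// Tax-prep score as the deadline approaches: negative at 30 days,
// positive at 7, dominant at 2. Constants from the worked example.
const baseValue = 300 * 0.9;           // expected_value * probability
const humanCost = (120 * 200) / 60;    // $400 of human time
const resourceCost = 420 + 120 * 3;    // gpu + human * attention_weight = 780
const urgencyBoost = 10;

const scores = [30, 7, 2].map((days) => {
  const decay = 1 + urgencyBoost / days;
  const net = baseValue * decay - humanCost;
  return +(net / resourceCost).toFixed(2);
});
console.log(scores); // [ -0.05, 0.33, 1.56 ]
```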

Why a Single Numeric Score Is Sufficient

At this scale -- dozens of active tasks, not thousands -- a single priority score followed by greedy resource-gated selection is both simple and near-optimal. The alternatives and why they are unnecessary:

| Approach | When You Need It | Why Not Here |
|---|---|---|
| Pareto frontier / multi-objective optimization | Hundreds of tasks with genuinely incomparable value dimensions | All your tasks share a common value unit (dollars) and a common constraint set (GPU + human time) |
| Constraint programming (OR-Tools, CPLEX) | Complex interdependencies, dozens of constraints | You have 2-3 binding constraints and simple linear dependencies |
| Reinforcement learning scheduler | Non-stationary reward distributions, massive action spaces | Your reward distribution updates weekly, not per-second |
| Priority queues with aging | Tasks that must not starve indefinitely | The time_decay_factor already handles this for deadline tasks; non-deadline tasks have stable priority |

A sorted list and a for-loop is the correct implementation for this problem size.


4. The "Go Deeper" Strategy

The Utilization Problem

A single household generates approximately 2 hours per day of genuine GPU work on a device that runs 24/7. That is 8% utilization. Not 80-90%. The Orin Nano is massively compute-rich and task-poor.

This is the inverse of a cloud computing problem. In the cloud, you have too many tasks and not enough machines. Here, you have too much machine and not enough tasks. The solution is not to find more tasks -- it is to spend more compute on each task.

The Solution: Depth Scaling

Every task gets a depth parameter that controls how much inference to spend. Higher depth means more queries, more cross-referencing, more verification, more comprehensive output.

| Depth | Claims Example | GPU Time | Quality |
|---|---|---|---|
| 1 (quick) | 3 web queries, best email match, send notification | 5 min | Good enough for initial pass |
| 5 (standard) | 15 queries, cross-ref 3 databases, SMTP-verify email, score confidence | 25 min | Solid for outreach |
| 10 (deep) | 30+ queries, full web research, pre-fill claim forms, research state process in detail, build person dossier, draft custom guidance letter, identify related claims in other states | 60 min | Comprehensive -- ready to file |

The insight is that depth 1 and depth 10 are the same task with the same runner code. The only difference is how many inference passes the system makes, how many sources it cross-references, and how polished the output is.

Scheduling Rule

During peak periods (deadline tasks, new batch to process), run at depth=1. Get through the queue fast, capture the low-hanging fruit. During idle time, re-run completed tasks at higher depth. The scheduler automatically fills idle GPU time with deeper passes on work that has already been done at a shallower level.

if no_eligible_tasks and gpu_idle:
    pick task with lowest (depth_current / depth_max) ratio
    re-run at depth_current + 1
    label triggered_by = 'idle_deepening'
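A minimal sketch of that rule (the task shape and the `pickDeepeningCandidate` name are mine, for illustration):

```javascript
// Idle deepening: among completed tasks not yet at max depth, pick the
// one with the lowest depth_current / depth_max ratio and deepen it.
function pickDeepeningCandidate(completedTasks) {
  const eligible = completedTasks.filter((t) => t.depthCurrent < t.depthMax);
  if (eligible.length === 0) return null; // everything fully deepened
  return eligible.reduce((best, t) =>
    t.depthCurrent / t.depthMax < best.depthCurrent / best.depthMax ? t : best
  );
}

const next = pickDeepeningCandidate([
  { id: 'claims.enrich.batch',      depthCurrent: 5, depthMax: 5 },  // maxed out
  { id: 'claims.monitoring.weekly', depthCurrent: 2, depthMax: 10 }, // ratio 0.2
  { id: 'email.triage.daily',       depthCurrent: 3, depthMax: 5 },  // ratio 0.6
]);
console.log(next.id); // 'claims.monitoring.weekly'
```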

Impact on Utilization

| Mode | GPU hrs/day | Utilization | What It Does |
|---|---|---|---|
| Quick passes only | 2 | 8% | Process queue at depth=1, then idle |
| Standard depth | 5 | 21% | Re-process at depth=5 during downtime |
| Deep passes + speculative scanning | 10 | 42% | Full depth=10 re-processing, plus scanning adjacent states |
| Deep + speculative + knowledge building | 14 | 58% | All of the above, plus building enriched profiles and pre-computing templates |

Speculative Work During Idle

When the priority queue is empty and all tasks have been deepened to their max, the scheduler can run speculative work:

  • Scan adjacent states for existing claim holders (if you found someone in CA, check TX, FL, NY)
  • Pre-compute response templates for all pending conversations (so outreach emails are ready to send instantly when a cron fires)
  • Build enriched person profiles for every record in the database (LinkedIn, public records, social presence)
  • Monitor state portals for newly posted claims (the sweep runners already do this at depth=1; deeper passes check more thoroughly)
  • Cross-reference the full source catalog for additional opportunities per person (if someone has an unclaimed refund in CA, do they also have unclaimed wages from DOL, or a class action settlement?)

This transforms the Jetson from an on-demand processor into a continuously working research assistant. Utilization climbs not because you invented new tasks, but because existing tasks become richer.


5. Task Catalog Design

Schema (SQLite)

CREATE TABLE task_catalog (
  id TEXT PRIMARY KEY,              -- 'claims.ca.fetch', 'tax.prep.2026', 'email.daily'
  category TEXT NOT NULL,           -- 'claims', 'tax', 'health', 'email', 'finance', 'forms', 'phone'
  label TEXT NOT NULL,              -- Human-readable name
  description TEXT,

  -- Value model
  expected_value_dollars REAL,
  probability_of_success REAL DEFAULT 1.0,
  value_notes TEXT,

  -- Resource requirements
  gpu_minutes_estimate REAL DEFAULT 0,
  cpu_minutes_estimate REAL DEFAULT 0,
  human_minutes_estimate REAL DEFAULT 0,

  -- Scheduling
  frequency TEXT DEFAULT 'once',    -- 'once', 'daily', 'weekly', 'monthly', 'quarterly', 'annual', 'event'
  cron_expression TEXT,
  deadline_date TEXT,
  urgency_boost REAL DEFAULT 0,

  -- Depth
  depth_current INTEGER DEFAULT 1,
  depth_max INTEGER DEFAULT 10,

  -- Automation
  automation_status TEXT DEFAULT 'manual_only',  -- 'runnable', 'repair_needed', 'manual_only', 'blocked'
  queue_enabled INTEGER DEFAULT 0,
  runner TEXT,                       -- Script/module path
  runner_args TEXT,                  -- JSON

  -- Dependencies
  depends_on TEXT,                   -- JSON array of task IDs

  -- User override
  user_priority_override REAL,
  user_paused INTEGER DEFAULT 0,

  created_at TEXT DEFAULT (datetime('now')),
  updated_at TEXT DEFAULT (datetime('now'))
);

CREATE TABLE task_runs (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  task_id TEXT NOT NULL REFERENCES task_catalog(id),
  status TEXT DEFAULT 'pending',     -- 'pending', 'running', 'completed', 'failed', 'skipped'
  depth INTEGER DEFAULT 1,
  started_at TEXT,
  completed_at TEXT,
  actual_gpu_minutes REAL,
  actual_human_minutes REAL,
  actual_value_realized REAL,
  computed_priority REAL,
  triggered_by TEXT,                 -- 'schedule', 'deadline', 'user', 'dependency', 'idle_deepening'
  error_message TEXT,
  result_json TEXT
);

CREATE TABLE task_schedule (
  task_id TEXT NOT NULL PRIMARY KEY REFERENCES task_catalog(id),
  next_run_at TEXT NOT NULL,
  last_run_at TEXT,
  last_status TEXT,
  consecutive_failures INTEGER DEFAULT 0
);

How This Generalizes Existing Code

The Dollar Hound codebase already has two scheduling primitives that map directly to this schema.

state-queue.js SOURCE_RUNTIME already defines per-source metadata:

// Existing pattern in state-queue.js SOURCE_RUNTIME
ca: {
  automation_status: 'runnable',      // -> task_catalog.automation_status
  queue_enabled: true,                // -> task_catalog.queue_enabled
  runner: 'remote_fetcher',           // -> task_catalog.runner
  source_priority: 40,               // -> replaced by computed WSJF score
  estimated_staging_gb: 900,          // -> subsumed by resource model (gpu_minutes, etc.)
  preferred_hosts: ['spark', ...],    // -> resource model host selection
}

The automation_status enum (runnable, repair_needed, manual_only, blocked) is preserved verbatim. The source_priority static number is replaced by the dynamic WSJF computation -- a task's priority changes based on deadline proximity, value estimates, and resource availability rather than being hardcoded.

claims-queue.js buildClaimsStateQueue() groups sources by state, sorts by population, and assigns hosts based on disk budget. In the generalized model:

  • Population-based sorting is replaced by WSJF scoring (population was always a proxy for expected value anyway -- bigger state = more potential claims)
  • queue_order becomes computed_priority
  • selected_host computed by selectHostForState() becomes the resource gate function checking GPU + human budgets instead of disk-only
  • The decorateStatePlan() enrichment step maps to computing priority and attaching runtime metadata at scheduling time

The generalization is straightforward because the existing code already separates catalog (what tasks exist) from scheduling (what order to run them) from execution (the runner). The task_catalog table is the catalog, the WSJF formula is the scheduler, and the existing runner scripts are the executors.


6. Resource Model

Budget Table

| Resource | Unit | Daily Budget (Orin Nano) | Notes |
|---|---|---|---|
| GPU compute | minutes | 1,440 (24h) | Target 80-90% = 1,150-1,300 usable |
| CPU compute | minutes | 1,440 | Usually not the bottleneck |
| RAM | GB | 8 | Constrains model size; shared between inference and data processing |
| Disk staging | GB | ~500 usable | Existing pattern from state-queue estimated_staging_gb |
| Network | MB/hour | Varies | Rate-limited by upstream APIs and portal throttling |
| Human attention | minutes/day | 60-120 | The truly scarce resource |

GPU and CPU minutes are renewable every day. Human attention is the hard constraint -- you might have 60 minutes on a busy day, 120 on a quiet one, and 0 when you are traveling. The scheduler must gracefully degrade when human attention hits zero, continuing to run fully autonomous tasks.

The Resource Gate Function

function canRun(task, currentUsage, budget) {
  const targetUtil = 0.9; // leave 10% headroom

  if (currentUsage.gpu + task.estimate.gpu > budget.gpu * targetUtil)
    return false;
  if (currentUsage.cpu + task.estimate.cpu > budget.cpu * targetUtil)
    return false;
  if (currentUsage.human + task.estimate.human > budget.human * targetUtil)
    return false;

  return true;
}

This is the same pattern as assignHostsToQueue() in state-queue.js, which checks disk budget before assigning a host:

// Existing: state-queue.js line 341
if (free - (job.estimated_staging_gb || 0) >= minFreeGb) {
  assignedHost = host;
  budgets.set(host, free - (job.estimated_staging_gb || 0));
  break;
}

The generalized version replaces the single disk constraint with three constraints (GPU, CPU, human) and replaces per-host budgets with a single-machine budget. The logic is identical: check headroom, deduct from budget, allow or skip.

Critical behavior: If the top-priority task cannot run because human attention is exhausted, the scheduler skips it and tries the next task. This naturally fills GPU time with zero-human tasks when the operator is unavailable. A fully autonomous monitoring task with priority_score=3.2 will run ahead of a human-requiring task with priority_score=5.0 if the human budget is spent.


7. Queue Architecture

Single Priority Queue with Resource Gates

Not multiple queues. Not a priority queue per category. Not separate queues for autonomous vs. human-required tasks.

One queue, sorted by computed_priority DESC. The scheduler dequeues from the top, checks resource gates before launching. If a task cannot run (resource constraint or dependency not met), it is skipped -- not removed -- and the scheduler tries the next one.

Scheduling Cycle (Every 5 Minutes)

schedulerTick():

  1. RESTOCK — Compute next_run_at for all recurring tasks.
     Any task where next_run_at < now() becomes eligible.

  2. SCORE — Compute priority_score for each eligible task:
     - Fetch current estimates from task_catalog
     - Apply time_decay_factor for deadline tasks
     - Compute net_value and total_resource_cost
     - priority_score = net_value / total_resource_cost

  3. OVERRIDE — Apply user controls:
     - If user_priority_override is set, replace computed score
     - If user_paused = 1, remove from queue entirely

  4. SORT — Order by priority_score DESC

  5. DISPATCH — Walk the sorted list:
     For each task:
       - Check dependencies (depends_on tasks must be completed)
       - Check resource gates (canRun())
       - If both pass: launch the runner, record in task_runs
       - If either fails: skip, try next task
       - Stop dispatching when resource budget is 90% consumed

  6. DEEPEN — If nothing was dispatched and GPU is idle:
     - Find the completed task with lowest (depth_current / depth_max)
     - Re-run at depth_current + 1
     - Record triggered_by = 'idle_deepening'
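Steps 4-5 reduce to a sort and a gated loop. A sketch with assumed shapes (runner launch and `task_runs` persistence are stubbed out as comments):

```javascript
// DISPATCH: walk the WSJF-sorted list, launching what fits and skipping
// (not removing) what does not. 10% headroom, per the resource gate.
function dispatch(eligible, budget, completedIds) {
  const launched = [];
  const skipped = [];
  const used = { gpu: 0, human: 0 };
  const sorted = [...eligible].sort((a, b) => b.priorityScore - a.priorityScore);
  for (const task of sorted) {
    const depsMet = (task.dependsOn || []).every((id) => completedIds.has(id));
    const fitsGpu = used.gpu + task.gpuMinutes <= budget.gpu * 0.9;
    const fitsHuman = used.human + task.humanMinutes <= budget.human * 0.9;
    if (depsMet && fitsGpu && fitsHuman) {
      used.gpu += task.gpuMinutes;
      used.human += task.humanMinutes;
      launched.push(task.id); // real system: start runner, insert task_runs row
    } else {
      skipped.push(task.id);  // retried at the next tick
    }
  }
  return { launched, skipped };
}

const { launched, skipped } = dispatch(
  [
    { id: 'tax.prep', priorityScore: 1.56, gpuMinutes: 420, humanMinutes: 120 },
    { id: 'enrich',   priorityScore: 3.2,  gpuMinutes: 5,   humanMinutes: 0 },
  ],
  { gpu: 1440, human: 100 },
  new Set()
);
console.log(launched, skipped); // [ 'enrich' ] [ 'tax.prep' ]
```

The human-heavy task is skipped, not dropped; the autonomous one launches even though it ranked behind nothing else it conflicts with.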

Queue Walkthrough (ASCII)

Here is a concrete example of one scheduling cycle with five eligible tasks:

QUEUE STATE AT TICK (sorted by priority_score DESC):
┌────┬─────────────────────────────┬──────────┬─────────┬────────┬────────┐
│ #  │ Task                        │ Priority │ GPU min │ Human  │ Status │
├────┼─────────────────────────────┼──────────┼─────────┼────────┼────────┤
│ 1  │ claims.enrich.batch         │ 3.20     │ 5       │ 0      │        │
│ 2  │ email.triage.daily          │ 2.80     │ 18      │ 0      │        │
│ 3  │ tax.prep.2026 (2 days out)  │ 1.56     │ 420     │ 120    │        │
│ 4  │ claims.outreach.send        │ 0.45     │ 2       │ 5      │        │
│ 5  │ health.billing.review       │ 0.33     │ 6       │ 30     │        │
└────┴─────────────────────────────┴──────────┴─────────┴────────┴────────┘

RESOURCE BUDGETS:
  GPU: 1,300 min remaining    Human: 90 min remaining

DISPATCH WALK:

  #1 claims.enrich.batch
     GPU: 5 <= 1300? YES      Human: 0 <= 90? YES
     -> LAUNCH
     GPU remaining: 1295      Human remaining: 90

  #2 email.triage.daily
     GPU: 18 <= 1295? YES     Human: 0 <= 90? YES
     -> LAUNCH
     GPU remaining: 1277      Human remaining: 90

  #3 tax.prep.2026
     GPU: 420 <= 1277? YES    Human: 120 <= 90? NO
     -> SKIP (human budget exceeded)

  #4 claims.outreach.send
     GPU: 2 <= 1277? YES      Human: 5 <= 90? YES
     -> LAUNCH
     GPU remaining: 1275      Human remaining: 85

  #5 health.billing.review
     GPU: 6 <= 1275? YES      Human: 30 <= 85? YES
     -> LAUNCH
     GPU remaining: 1269      Human remaining: 55

RESULT:
  Launched: #1, #2, #4, #5
  Skipped:  #3 (human budget -- will retry next tick when budget resets
            or when Joe marks 120 min available)
  GPU idle: 1269 min -> idle_deepening will fill this

The tax prep task -- the highest urgency item -- was skipped because it requires 120 minutes of human attention and only 90 remain in the budget. The system does not wait for it. It runs everything else it can, filling the GPU with autonomous work. When Joe explicitly allocates time for tax prep (or the daily budget resets), it will be the first thing dispatched.

No Preemption Needed (V1)

Tasks at this scale run for minutes to hours, not days. There is no need to interrupt a running task to start a higher-priority one. If a deadline task becomes critical while a batch job is running, it will be dispatched at the next tick (5 minutes). This is fast enough for a personal system.

If preemption ever becomes necessary (unlikely), the pattern is: save a checkpoint in result_json, mark the run as status='preempted', and re-queue it. But do not build this until you need it.


8. How OpenClaw Maps to This

The OpenClaw integration research (~/dev/sync/research/openclaw-macroclaw-integration.md) already designed a task manager skill with autonomy levels (full, draft-and-ask, ask-first, human-only) and a cron-driven execution loop. The WSJF formula adds the economic ranking layer on top of that architecture.

Mapping Table

| OpenClaw Component | Role in This System |
|---|---|
| Cron jobs | Replace the setInterval(schedulerTick, 300000) pattern. The morning review (7am), afternoon follow-up (2pm), and weekly deep review (Sunday 9am) crons from the existing research doc trigger schedulerTick() and dispatch tasks. |
| Skills | The task executor. skills/task-scheduler/SKILL.md calls Dollar Hound's /api/scheduler/queue to get the next task, runs it, and reports completion. Each task category can have its own skill: skills/claims-outreach/, skills/tax-prep/, skills/email-triage/. |
| Memory (MEMORY.md + daily logs) | Tracks which tasks tend to succeed or fail. After a week of runs, the system can recalibrate probability_of_success based on actual outcomes. "Claims outreach to people over 70 has a 25% response rate, not 15%" -- this updates the priority formula automatically. |
| Approval workflows | The "stop and ask" pattern for tasks that need human input. A claims outreach draft is draft-and-ask: the agent prepares the email, sends it to Joe via WhatsApp, waits for YES/NO/modify. human_minutes for the approval step is counted as 2-5 minutes, not the full task duration. |
| Multi-channel (WhatsApp, iMessage) | Notify Joe when a high-priority task needs attention. "Tax prep is 2 days from deadline and needs 120 minutes of your time. When do you want to start?" The notification itself is a zero-GPU, zero-human-time task that the scheduler fires automatically. |

The Autonomy Level Bridge

The OpenClaw research defined four autonomy levels. Here is how they interact with the WSJF formula:

| Autonomy Level | human_minutes Impact | Example |
|---|---|---|
| full | 0 (no human time) | Monitoring scans, auto-enrichment, portal checks |
| draft-and-ask | 2-5 min (review + approve) | Outreach emails, follow-up messages |
| ask-first | 10-30 min (discussion + decision) | Health coordination, financial moves |
| human-only | All task time is human time | Physical tasks, in-person meetings |

Tasks at full autonomy dominate the queue because their human_minutes = 0 makes net_value higher and total_resource_cost lower. This is the correct incentive: the system should prioritize work it can do without bothering you.


9. Migration Path from Existing Code

The existing Dollar Hound codebase already has scheduling primitives. The migration is not a rewrite -- it is a generalization.

Mapping Table

| Current Code | What It Does | Generalized Equivalent |
|---|---|---|
| state-queue.js SOURCE_RUNTIME with source_priority, automation_status, queue_enabled, estimated_staging_gb, preferred_hosts | Per-source runtime metadata and static priority | task_catalog table with WSJF-computed dynamic priority |
| claims-queue.js buildClaimsStateQueue() with queue_order, selected_host | Build sorted queue of states, assign hosts | schedulerTick() steps 1-5: score, sort, dispatch with resource gates |
| run-claims-queue.js worker pool with cursor++ + Promise.all | Execute queue items concurrently with N workers | OpenClaw skill executor with concurrency controlled by resource gates |
| state-queue.js assignHostsToQueue() with disk budget checking | Assign tasks to hosts based on available disk | Resource gate function generalized from disk-only to GPU + human + disk |
| systemd/dollar-hound.service with Restart=on-failure, RestartSec=5 | Process-level restart on crash | task_schedule.consecutive_failures with exponential backoff: next_retry = now + min(5 * 2^failures, 3600) seconds |
| SOURCE_RUNTIME automation_status enum: runnable, repair_needed, manual_only, blocked_portal | Track whether a source can be automated | task_catalog.automation_status -- same enum, same semantics, broader scope |
| claims-queue.js population-based sort (proxy for value) | Higher population = more potential claims = higher priority | WSJF score replaces population proxy with explicit expected_value * probability |
| run-claims-queue.js --workers 8, --batch-size 100 | Concurrency and throughput tuning | Resource budget (GPU minutes, human minutes) replaces fixed worker count |
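The backoff rule from the systemd row is one line of arithmetic (the helper name is mine):

```javascript
// Exponential backoff with a one-hour cap, per
// next_retry = now + min(5 * 2^failures, 3600) seconds.
function retryDelaySeconds(consecutiveFailures) {
  return Math.min(5 * 2 ** consecutiveFailures, 3600);
}

console.log([0, 1, 4, 10].map(retryDelaySeconds)); // [ 5, 10, 80, 3600 ]
```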

What Stays the Same

The runner scripts (run-source-remote.js, the sweep runners, the enrichment pipeline) do not change. They are task executors. The scheduler changes which tasks to run and in what order, but the execution logic is unchanged. This is the same separation that run-claims-queue.js already enforces: it builds the queue, then calls runState(entry) for each item. The generalized scheduler builds a different queue (WSJF-sorted instead of population-sorted) but calls the same runners.


10. The Seed Catalog

Pre-populate task_catalog with known tasks and their economic profiles. These numbers are estimates -- the calibration loop (section 11) will refine them.

| Task ID | Category | Label | EV ($) | Prob | GPU min | Human min | Frequency | Depth Max |
|---|---|---|---|---|---|---|---|---|
| claims.monitoring.weekly | claims | Weekly portal monitoring | 200/find | 0.10/person | 42 | 0 | weekly | 10 |
| claims.enrich.batch | claims | Batch record enrichment | 20/match | 0.30 | 5/record | 0 | daily | 5 |
| claims.outreach.send | claims | Send outreach emails | 150 avg | 0.15 | 2 | 5 (review) | daily | 3 |
| claims.conversation.respond | claims | Respond to inbound replies | 150 avg | 0.60 | 5 | 15 | event | 5 |
| tax.prep.annual | tax | Annual tax preparation | 300 | 0.90 | 420 (burst) | 120 | annual | 10 |
| email.triage.daily | email | Daily email triage scan | 5/find | 0.10 | 18 | 0 | daily | 5 |
| email.settlement.scan | email | Class action settlement scan | 50 avg | 0.05 | 3 | 0 | daily | 3 |
| health.billing.review | health | Medical billing review | 300 avg | 0.40 | 6 | 30 | event | 10 |
| forms.benefits.check | forms | Benefits eligibility check | 500 avg | 0.20 | 2 | 20 | quarterly | 5 |
| phone.dispute.call | phone | Dispute phone call (via MacroClaw) | 200 avg | 0.50 | 18 | 45 | event | 3 |
| subscription.audit | finance | Subscription audit & cancel | 30/mo savings | 0.50 | 3 | 5 | monthly | 3 |

Priority Scores at a Glance

Computing priority_score for each (using $200/hr rate, attention_weight=3):

| Task | net_value | resource_cost | priority_score | Notes |
|---|---|---|---|---|
| claims.enrich.batch | $6.00 | 5 | 1.20 | Best ratio -- fully autonomous, cheap |
| email.triage.daily | $0.50 | 18 | 0.03 | Low EV per run, but zero human cost |
| email.settlement.scan | $2.50 | 3 | 0.83 | Good ratio, fully autonomous |
| claims.monitoring.weekly | $20.00 | 42 | 0.48 | Solid autonomous scanning |
| subscription.audit | $15.00 - $16.67 = -$1.67 | 18 | neg | Barely negative -- reduce human time to 2 min and it flips |
| claims.outreach.send | $22.50 - $16.67 = $5.83 | 17 | 0.34 | Positive only because human time is short (5 min review) |
| claims.conversation.respond | $90.00 - $50.00 = $40.00 | 50 | 0.80 | High prob (they already responded) makes this worthwhile |
| health.billing.review | $120.00 - $100.00 = $20.00 | 96 | 0.21 | High EV but heavy human time |
| forms.benefits.check | $100.00 - $66.67 = $33.33 | 62 | 0.54 | Quarterly; good value if probability holds |
| phone.dispute.call | $100.00 - $150.00 = -$50.00 | 153 | neg | Phone calls are expensive in human time -- only worth it for high-value disputes |
| tax.prep.annual | varies by deadline | 780 | varies | Negative at 30 days, dominant at 2 days |

The pattern is clear: fully autonomous tasks with even modest expected value rank highest. Tasks requiring human time must have high expected value and high probability to justify the attention cost. This is exactly the right incentive structure -- it pushes you to automate more of each task's pipeline so human_minutes drops and the task becomes economically viable.
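The arithmetic behind the table can be sketched in a few lines. This is a sketch, not existing Dollar Hound code: it assumes net_value = expected_value × probability − human time cost, resource_cost = gpu_minutes + attention_weight × human_minutes, and the $200/hr rate and attention_weight = 3 used above; the function and field names are illustrative.

```javascript
// Illustrative scoring sketch -- assumed formula, not the shipped scheduler.
const HOURLY_RATE = 200;      // $/hr opportunity cost of human attention
const ATTENTION_WEIGHT = 3;   // human minutes count 3x in resource cost

function priorityScore({ expectedValue, probability, gpuMinutes, humanMinutes }) {
  const humanCost = (humanMinutes / 60) * HOURLY_RATE;
  const netValue = expectedValue * probability - humanCost;
  const resourceCost = gpuMinutes + ATTENTION_WEIGHT * humanMinutes;
  return netValue / resourceCost;
}

// claims.conversation.respond: ($150 * 0.60 - $50) / (5 + 3 * 15) = 0.80
const score = priorityScore({
  expectedValue: 150, probability: 0.60, gpuMinutes: 5, humanMinutes: 15,
});
```

Fully autonomous tasks (humanMinutes = 0) skip the subtraction entirely, which is why they dominate the top of the table.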


11. Calibration

After running for one week, compare estimates against actuals on three dimensions -- GPU minutes, human minutes, and value realized -- and track the failure rate alongside them.

What to Measure

| Metric | Source | Calibration Action |
|---|---|---|
| GPU minutes (est vs. actual) | task_runs.actual_gpu_minutes vs. task_catalog.gpu_minutes_estimate | Exponential moving average: new_est = 0.7 * old_est + 0.3 * actual |
| Human minutes (est vs. actual) | task_runs.actual_human_minutes vs. task_catalog.human_minutes_estimate | Same EMA formula |
| Value realized | task_runs.actual_value_realized vs. task_catalog.expected_value_dollars * probability_of_success | Update probability: if 10 runs yielded value 3 times, probability = 0.30 |
| Failure rate | task_schedule.consecutive_failures | If a task fails 3+ times consecutively, set automation_status = 'repair_needed' |
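The two update rules in the table are small enough to sketch directly. This assumes the EMA weights shown above (0.7 old, 0.3 new) and a plain success-frequency probability update; the function names are illustrative, not existing Dollar Hound code.

```javascript
// new_est = 0.7 * old_est + 0.3 * actual (alpha = 0.3 per the table above)
function emaUpdate(oldEstimate, actual, alpha = 0.3) {
  return (1 - alpha) * oldEstimate + alpha * actual;
}

// Probability as observed success frequency: 3 value-yielding runs
// out of 10 total runs -> 0.30.
function updatedProbability(successfulRuns, totalRuns) {
  return totalRuns === 0 ? 0 : successfulRuns / totalRuns;
}
```

Running these after each completed task_runs row keeps estimates drifting toward reality without any manual tuning.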

Calibration Query

SELECT
  tc.id,
  tc.label,
  tc.expected_value_dollars,
  tc.probability_of_success AS est_prob,
  COUNT(tr.id) AS total_runs,
  SUM(CASE WHEN tr.actual_value_realized > 0 THEN 1 ELSE 0 END) AS successful_runs,
  ROUND(
    CAST(SUM(CASE WHEN tr.actual_value_realized > 0 THEN 1 ELSE 0 END) AS REAL)
    / NULLIF(COUNT(tr.id), 0),
    2
  ) AS actual_prob,
  ROUND(AVG(tr.actual_gpu_minutes), 1) AS avg_gpu_min,
  tc.gpu_minutes_estimate AS est_gpu_min,
  ROUND(AVG(tr.actual_human_minutes), 1) AS avg_human_min,
  tc.human_minutes_estimate AS est_human_min
FROM task_catalog tc
JOIN task_runs tr ON tr.task_id = tc.id
WHERE tr.status = 'completed'
  AND tr.completed_at > datetime('now', '-7 days')
GROUP BY tc.id
ORDER BY total_runs DESC;

What Good Calibration Looks Like

After a month, estimates should converge:

  • GPU minutes: within 20% of actual (tasks are deterministic enough)
  • Human minutes: within 50% of actual (human behavior is variable, but trends emerge)
  • Probability: within 10 percentage points of actual success rate

If a task's actual probability is consistently much lower than estimated, the formula will naturally deprioritize it -- net_value drops, and the task sinks in the queue. No manual intervention needed. The system self-corrects.


12. What NOT to Build

The temptation with a scheduling system is to over-engineer it. Here is what to avoid and why.

Not BullMQ / Redis

BullMQ is a production job queue backed by Redis. It handles retries, rate limiting, prioritization, delayed jobs, and job lifecycle events. It is excellent software for web-scale applications processing millions of jobs.

You have dozens of jobs. SQLite + a 5-minute setInterval is sufficient. Adding Redis means another daemon to run, another service to monitor, another failure mode. The Orin Nano has 8GB of RAM -- Redis would consume memory that is better spent on inference.

Not Temporal / Airflow

Temporal and Apache Airflow are workflow orchestration engines for distributed systems. They manage DAGs of tasks across clusters of workers with durable execution guarantees. They are built for data pipelines processing terabytes across dozens of machines.

You have one machine, one pipeline, and tasks that can be described in a flat catalog. The depends_on JSON array in task_catalog handles simple dependencies (run enrichment before outreach). If you ever need a full DAG engine, you have bigger problems than scheduling.
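The depends_on check is one predicate, not an orchestrator. A minimal sketch, assuming depends_on is stored as a JSON text column of task ids and that the scheduler tracks which tasks have already completed; the function name is hypothetical.

```javascript
// A task is runnable once every id in its depends_on array has completed.
// `task.depends_on` is assumed to be a JSON string column (or null).
function isRunnable(task, completedIds) {
  const deps = JSON.parse(task.depends_on || '[]');
  return deps.every(id => completedIds.has(id));
}

// e.g. outreach waits on enrichment:
isRunnable(
  { depends_on: '["claims.enrich.batch"]' },
  new Set(['claims.enrich.batch'])
);
```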

Not a Custom Framework

This is 200-300 lines of scheduling code on top of an existing SQLite database and OpenClaw's cron system. It is not a framework. It does not need a plugin system, a configuration DSL, or an abstract base class for task types. The task runners already exist (run-source-remote.js, the sweep scripts, the enrichment pipeline). The scheduler just decides which ones to call and in what order.

Not a Pareto Optimizer

Multi-objective optimization (Pareto frontier, NSGA-II) is needed when objectives are genuinely incomparable -- you cannot convert them to a single unit. In your system, everything converts to dollars: GPU time has a cost (electricity + depreciation), human time has an opportunity cost, and task outcomes have a dollar value. A single numeric score captures the tradeoff completely. Greedy selection from a sorted list is O(n) and optimal enough.
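Greedy selection here amounts to a sort and one pass. A sketch under assumed field names (priorityScore, gpuMinutes) and a single GPU-minute budget; the real resource gate may check more than one budget.

```javascript
// Pick tasks greedily: highest priority first, skip anything that
// scores negative or does not fit the remaining GPU budget.
function pickTasks(tasks, gpuBudgetMinutes) {
  const runnable = tasks
    .filter(t => t.priorityScore > 0)
    .sort((a, b) => b.priorityScore - a.priorityScore);
  const picked = [];
  let remaining = gpuBudgetMinutes;
  for (const t of runnable) {
    if (t.gpuMinutes <= remaining) {   // the resource gate
      picked.push(t);
      remaining -= t.gpuMinutes;
    }
  }
  return picked;
}
```

Greedy can theoretically leave a sliver of budget unused compared to a true knapsack solve, but at a dozen tasks the difference is noise.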


Summary

The problem of "what should my personal AI computer work on next" has been well-studied under different names across multiple fields. WSJF provides the economic prioritization. Rate-monotonic and EDF scheduling handle recurring tasks and deadlines. The knapsack problem frames capacity planning. Heterogeneous computing addresses multi-resource constraints.

The synthesis is a single formula that produces a priority score, a single queue sorted by that score, and a resource gate that ensures nothing launches without budget. Depth scaling fills idle time. Calibration keeps estimates honest. OpenClaw provides the execution runtime. The existing Dollar Hound code provides the runners.

Build the task_catalog table, implement schedulerTick(), wire it to OpenClaw cron, and start measuring. The formula will tell you what to automate next (reduce human_minutes on negative-priority tasks), what to invest in (raise probability_of_success on high-EV tasks), and what to stop doing (tasks that stay negative after automation).