Digital Surface Labs

"Task Scheduling with WSJF: Prioritizing Personal AI Work"

Task Scheduling with WSJF: Prioritizing Personal AI Work

1. The Question

"Has somebody thought about this sort of problem before and there's a clear concept that encapsulates this?"

Yes. The answer is WSJF -- Weighted Shortest Job First, from lean product development and the Scaled Agile Framework (SAFe). It was formalized by Don Reinertsen in The Principles of Product Development Flow (2009) and later adopted wholesale by SAFe for backlog prioritization. The core idea is simple: always do the job with the highest economic return per unit of time consumed.

But your version is more nuanced than standard WSJF. Standard WSJF assumes the work is "free" once you decide to do it -- the only cost is duration. Your problem has a second cost axis: human attention. A task that runs autonomously on the Jetson for 60 minutes is fundamentally cheaper than a task that requires 60 minutes of your time, even if the GPU cost is identical. Your formula must deduct human time cost from the value, not just divide by it.

This document lays out the established concepts, derives the priority formula, designs the data model, and maps it to existing Dollar Hound code so the whole system can be built in 200-300 lines of scheduling logic on top of OpenClaw.


2. Established Concepts That Map to This Problem

Five well-studied frameworks each contribute a piece of the answer.

WSJF (Weighted Shortest Job First)

The primary framework. Originated in lean manufacturing, formalized by Reinertsen, adopted by SAFe.

Canonical formula:

WSJF = Cost of Delay / Job Duration

Where Cost of Delay is decomposed into three components:

| Component | What It Measures | Example |
|---|---|---|
| User-Business Value | How much is this worth? | $150 average claim recovery |
| Time Criticality | Does value decay if we wait? | Tax deadline April 15 |
| Risk Reduction / Opportunity Enablement | Does this unlock other work? | Building the email pipeline enables all outreach |

Your modification: Replace "Cost of Delay" with "net value after human time cost." Standard WSJF treats job duration as the denominator -- longer jobs are penalized. Your version goes further: human time is not just a duration penalty, it is an economic deduction from value. An hour of your time has a dollar cost, and that cost comes straight off the top.

Key insight: Always do the highest value-to-duration ratio first, not the highest absolute value. A $20 task taking 5 minutes beats a $400 task taking 2 hours. This is counterintuitive -- the $400 task feels more important -- but the math is clear. If you do the $20/5min task first and then the $400/2hr task, you capture $420 in 2h05m. If you reverse the order, you still capture $420 but the $20 task was delayed 2 hours for no reason. Scale that across dozens of tasks and the ordering effect compounds.
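One way to make the compounding concrete is to treat each task's cost of delay as its value multiplied by how long it waits. A small sketch with the numbers from the paragraph above (the names and the delay-cost framing are mine, for illustration):

```javascript
// Illustrative sketch of the ordering argument: the $20/5min task vs the
// $400/2hr task. "Delay cost" here is value x minutes spent waiting.
const tasks = [
  { id: 'small', value: 20, minutes: 5 },
  { id: 'large', value: 400, minutes: 120 },
];

// WSJF: sort by value-to-duration ratio, descending.
const byWsjf = [...tasks].sort((a, b) => b.value / b.minutes - a.value / a.minutes);

// Each task's value waits for the total duration of everything
// scheduled up to and including itself.
function totalDelayCost(order) {
  let elapsed = 0;
  let cost = 0;
  for (const t of order) {
    elapsed += t.minutes;
    cost += t.value * elapsed;
  }
  return cost;
}

console.log(byWsjf.map(t => t.id));                 // [ 'small', 'large' ]
console.log(totalDelayCost(byWsjf));                // 100 + 50000 = 50100
console.log(totalDelayCost([tasks[1], tasks[0]]));  // 48000 + 2500 = 50500
```

Both orders capture the same $420, but the WSJF order accumulates $400 less value-weighted waiting; across dozens of tasks the gap keeps growing.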

Multi-Dimensional Knapsack Problem

For capacity planning. The classic knapsack asks: given a bag with weight limit W and items with weights and values, which items maximize total value? The multi-dimensional variant adds constraints:

  • X GPU-hours per day
  • Y human-attention-minutes per day
  • Z network-bandwidth-hours per day

Which combination of tasks maximizes total value without exceeding any constraint?

This is NP-hard in general, but with dozens of tasks (not thousands), a greedy approximation -- sort by WSJF score, pack in order, skip any task that would violate a constraint -- works well. For the classic single-constraint knapsack, ratio-greedy (taking the better of the greedy pack and the single most valuable item) is guaranteed to capture at least 50% of the optimal value. In practice it lands much closer to optimal because the tasks are heterogeneous enough that greedy packing wastes very little capacity.
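The greedy pack can be sketched in a few lines. The task shape and the two-budget example below are assumed for illustration, not taken from any library:

```javascript
// Greedy multi-constraint packing: sort by score, take items in order,
// skip anything that would exceed a budget. Illustrative only.
function greedyPack(tasks, budget) {
  const picked = [];
  const used = { gpu: 0, human: 0 };
  for (const t of [...tasks].sort((a, b) => b.score - a.score)) {
    if (used.gpu + t.gpu > budget.gpu) continue;      // constraint violated: skip
    if (used.human + t.human > budget.human) continue;
    used.gpu += t.gpu;
    used.human += t.human;
    picked.push(t.id);
  }
  return picked;
}

const picked = greedyPack(
  [
    { id: 'enrich',  score: 3.2, gpu: 5,   human: 0 },
    { id: 'taxprep', score: 1.6, gpu: 400, human: 120 }, // too much human time
    { id: 'review',  score: 0.8, gpu: 20,  human: 10 },
  ],
  { gpu: 500, human: 60 }
);
console.log(picked); // [ 'enrich', 'review' ]
```

Note that skipping `taxprep` does not block `review`: the walk continues past a violating task, which is exactly why greedy packing wastes so little capacity here.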

Rate-Monotonic Scheduling

From real-time operating systems (Liu & Layland, 1973). The rule: tasks with shorter periods get higher baseline priority than tasks with longer periods.

Applied here: a daily email triage (period = 1 day) gets higher baseline priority than a quarterly benefits check (period = 90 days). This is already implicit in WSJF -- shorter-period tasks tend to have higher time criticality -- but it is worth calling out as a scheduling principle because it guarantees that high-frequency monitoring tasks never get starved by long-running batch jobs.

Earliest Deadline First (EDF)

Also from real-time systems. The rule: always run the task whose absolute deadline is soonest. If you can meet all deadlines, EDF will find the schedule. If you cannot, EDF will miss the fewest.

Applied here: tax deadline April 15, claims portal filing window closes, insurance enrollment period ends. These tasks must bubble up regardless of their WSJF score as the deadline approaches. The time_decay_factor in the formula below implements this -- urgency amplifies value as the deadline nears.

Heterogeneous Computing Task Scheduling

From grid and cloud computing research (HEFT algorithm, Topcuoglu et al., 2002). The problem: scheduling tasks across machines with different capabilities (GPU vs CPU), different costs, and different speeds.

Used in Kubernetes, Apache Mesos, and YARN. But those systems handle thousands of tasks on hundreds of machines. Your problem is dramatically simpler: dozens of task types, one primary machine (Orin Nano), two scarce resources (GPU time and human attention). A full heterogeneous scheduler is overkill. The concepts transfer -- resource-aware task placement, capability matching, spillover to secondary hosts -- but a 200-line implementation suffices where Kubernetes needs 2 million lines.

The existing state-queue.js assignHostsToQueue() function already implements a simplified version of this: it walks preferred hosts, checks disk budget, and assigns the first host with capacity. The generalized scheduler extends this pattern from disk-only to GPU + human attention.


3. The Priority Formula

Modified WSJF with Net Value

priority_score = net_value / total_resource_cost

where:
  net_value = (expected_value * probability_of_success * time_decay_factor)
              - (human_minutes * hourly_rate / 60)

  total_resource_cost = gpu_minutes + cpu_minutes + (human_minutes * attention_weight)

  time_decay_factor = 1.0 + (urgency_boost / days_until_deadline)  [for deadline tasks]
                    = 1.0  [for non-deadline tasks]

Parameter definitions:

| Parameter | Meaning | Typical Value |
|---|---|---|
| expected_value | Dollar value if the task succeeds | $20 - $500 |
| probability_of_success | Likelihood of realizing that value | 0.05 - 0.9 |
| time_decay_factor | Multiplier that increases as deadlines approach | 1.0 - 6.0 |
| human_minutes | Minutes of your attention required | 0 - 120 |
| hourly_rate | Opportunity cost of your time | $200/hr |
| gpu_minutes | GPU compute time | 2 - 420 |
| cpu_minutes | CPU compute time (usually not the bottleneck) | 0 - 60 |
| attention_weight | How much harder human minutes are than GPU minutes | 3 (i.e., 1 human minute = 3 GPU minutes in the denominator) |
| urgency_boost | How aggressively deadlines amplify priority | 10 (tunable) |
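The formula translates directly into code. A sketch under the definitions above (the function and parameter names are mine; the defaults follow the parameter table):

```javascript
// Modified WSJF with net value, as defined above. Deadline-free tasks
// get time_decay_factor = 1.0; deadline tasks get 1 + boost/days.
function priorityScore(t, { hourlyRate = 200, attentionWeight = 3, urgencyBoost = 10 } = {}) {
  const decay = t.daysUntilDeadline
    ? 1.0 + urgencyBoost / t.daysUntilDeadline
    : 1.0;
  const netValue =
    t.expectedValue * t.probability * decay -
    (t.humanMinutes * hourlyRate) / 60;
  const cost =
    t.gpuMinutes + (t.cpuMinutes || 0) + t.humanMinutes * attentionWeight;
  return netValue / cost;
}

// Manual CA sweep (Task A in the worked examples): net-negative, do not run.
console.log(priorityScore({ expectedValue: 400, probability: 0.3, humanMinutes: 50, gpuMinutes: 60 }) < 0); // true

// Autonomous monitoring (Task B): $16 net value over 5 GPU minutes.
console.log(priorityScore({ expectedValue: 20, probability: 0.8, humanMinutes: 0, gpuMinutes: 5 })); // 3.2
```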

Worked Examples

Task A: California unclaimed property sweep

A full manual sweep of CA records -- downloading bulk data, matching against target names, enriching results, reviewing matches.

expected_value       = $400
probability          = 0.3
time_decay_factor    = 1.0 (no deadline)
human_minutes        = 50
hourly_rate          = $200/hr
gpu_minutes          = 60
cpu_minutes          = 0

net_value = ($400 * 0.3 * 1.0) - (50 * $200 / 60)
          = $120 - $166.67
          = -$46.67

priority_score = NEGATIVE -- do not run

This task is net-negative at $200/hr human time. The math is telling you something important: either the probability needs to be higher (better matching), or human time must be reduced through automation. If you can cut human_minutes from 50 to 10 (by automating the review step), the equation flips:

net_value = $120 - (10 * $200 / 60) = $120 - $33.33 = $86.67
total_resource_cost = 60 + 0 + (10 * 3) = 90
priority_score = $86.67 / 90 = 0.96

Automation turns a net-negative task into a worthwhile one. The formula quantifies exactly how much automation you need to justify the work.

Task B: Auto-monitoring for new claims (fully autonomous)

A background scan that checks portals for new claims matching known targets. Zero human involvement.

expected_value       = $20
probability          = 0.8
time_decay_factor    = 1.0
human_minutes        = 0
hourly_rate          = $200/hr
gpu_minutes          = 5
cpu_minutes          = 0

net_value = ($20 * 0.8 * 1.0) - (0 * $200 / 60) = $16 - $0 = $16
total_resource_cost  = 5 + 0 + (0 * 3) = 5
priority_score       = $16 / 5 = 3.20

Small value, but excellent ratio. Fully autonomous tasks with even modest expected value dominate the priority queue because they cost zero human time. The system will naturally fill idle GPU hours with these.

Task C: Tax prep with deadline approaching

Tax filing is high-value but human-intensive. Watch how the deadline changes the math.

At 30 days out:

expected_value       = $300
probability          = 0.9
urgency_boost        = 10
days_until_deadline  = 30
time_decay_factor    = 1.0 + (10 / 30) = 1.33
human_minutes        = 120
gpu_minutes          = 420  (7 hours of document processing, form analysis)

net_value = ($300 * 0.9 * 1.33) - (120 * $200 / 60)
          = $359.10 - $400.00
          = -$40.90

priority_score = NEGATIVE at 30 days

At 30 days, the formula says: not yet. Other tasks have better ROI. But watch what happens as the deadline closes in.

At 7 days out:

time_decay_factor = 1.0 + (10 / 7) = 2.43
net_value = ($300 * 0.9 * 2.43) - $400 = $656.10 - $400 = $256.10
total_resource_cost = 420 + (120 * 3) = 780
priority_score = $256.10 / 780 = 0.33

Now it is positive. The task enters the queue.

At 2 days out:

time_decay_factor = 1.0 + (10 / 2) = 6.0
net_value = ($300 * 0.9 * 6.0) - $400 = $1,620 - $400 = $1,220
total_resource_cost = 780
priority_score = $1,220 / 780 = 1.56

At 2 days, this is the highest-priority task in the system. Deadline urgency causes tasks to bubble up naturally -- no special-case logic needed, no manual escalation. The formula handles it.
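The three checkpoints can be recomputed in a few lines. Constants come from the example above; this uses the exact decay factor, so the 30-day score lands at -0.05 rather than the rounded intermediate figures shown:

```javascript
// Tax-prep score as the deadline approaches: negative at 30 days,
// positive at 7, dominant at 2. Constants from the worked example.
const baseValue = 300 * 0.9;           // expected_value * probability
const humanCost = (120 * 200) / 60;    // $400 of human time
const resourceCost = 420 + 120 * 3;    // gpu + human * attention_weight = 780
const urgencyBoost = 10;

const scores = [30, 7, 2].map((days) => {
  const decay = 1 + urgencyBoost / days;
  const net = baseValue * decay - humanCost;
  return +(net / resourceCost).toFixed(2);
});
console.log(scores); // [ -0.05, 0.33, 1.56 ]
```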

Why a Single Numeric Score Is Sufficient

At this scale -- dozens of active tasks, not thousands -- a single priority score followed by greedy resource-gated selection is both simple and near-optimal. The alternatives and why they are unnecessary:

| Approach | When You Need It | Why Not Here |
|---|---|---|
| Pareto frontier / multi-objective optimization | Hundreds of tasks with genuinely incomparable value dimensions | All your tasks share a common value unit (dollars) and a common constraint set (GPU + human time) |
| Constraint programming (OR-Tools, CPLEX) | Complex interdependencies, dozens of constraints | You have 2-3 binding constraints and simple linear dependencies |
| Reinforcement learning scheduler | Non-stationary reward distributions, massive action spaces | Your reward distribution updates weekly, not per-second |
| Priority queues with aging | Tasks that must not starve indefinitely | The time_decay_factor already handles this for deadline tasks; non-deadline tasks have stable priority |

A sorted list and a for-loop is the correct implementation for this problem size.


4. The "Go Deeper" Strategy

The Utilization Problem

A single household generates approximately 2 hours per day of genuine GPU work on a device that runs 24/7. That is 8% utilization. Not 80-90%. The Orin Nano is massively compute-rich and task-poor.

This is the inverse of a cloud computing problem. In the cloud, you have too many tasks and not enough machines. Here, you have too much machine and not enough tasks. The solution is not to find more tasks -- it is to spend more compute on each task.

The Solution: Depth Scaling

Every task gets a depth parameter that controls how much inference to spend. Higher depth means more queries, more cross-referencing, more verification, more comprehensive output.

| Depth | Claims Example | GPU Time | Quality |
|---|---|---|---|
| 1 (quick) | 3 web queries, best email match, send notification | 5 min | Good enough for initial pass |
| 5 (standard) | 15 queries, cross-ref 3 databases, SMTP-verify email, score confidence | 25 min | Solid for outreach |
| 10 (deep) | 30+ queries, full web research, pre-fill claim forms, research state process in detail, build person dossier, draft custom guidance letter, identify related claims in other states | 60 min | Comprehensive -- ready to file |

The insight is that depth 1 and depth 10 are the same task with the same runner code. The only difference is how many inference passes the system makes, how many sources it cross-references, and how polished the output is.

Scheduling Rule

During peak periods (deadline tasks, new batch to process), run at depth=1. Get through the queue fast, capture the low-hanging fruit. During idle time, re-run completed tasks at higher depth. The scheduler automatically fills idle GPU time with deeper passes on work that has already been done at a shallower level.

if no_eligible_tasks and gpu_idle:
    pick task with lowest (depth_current / depth_max) ratio
    re-run at depth_current + 1
    label triggered_by = 'idle_deepening'
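A minimal sketch of that rule (the task shape and the `pickDeepeningCandidate` name are mine, for illustration):

```javascript
// Idle deepening: among completed tasks not yet at max depth, pick the
// one with the lowest depth_current / depth_max ratio and deepen it.
function pickDeepeningCandidate(completedTasks) {
  const eligible = completedTasks.filter((t) => t.depthCurrent < t.depthMax);
  if (eligible.length === 0) return null; // everything fully deepened
  return eligible.reduce((best, t) =>
    t.depthCurrent / t.depthMax < best.depthCurrent / best.depthMax ? t : best
  );
}

const next = pickDeepeningCandidate([
  { id: 'claims.enrich.batch',      depthCurrent: 5, depthMax: 5 },  // maxed out
  { id: 'claims.monitoring.weekly', depthCurrent: 2, depthMax: 10 }, // ratio 0.2
  { id: 'email.triage.daily',       depthCurrent: 3, depthMax: 5 },  // ratio 0.6
]);
console.log(next.id); // 'claims.monitoring.weekly'
```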

Impact on Utilization

| Mode | GPU hrs/day | Utilization | What It Does |
|---|---|---|---|
| Quick passes only | 2 | 8% | Process queue at depth=1, then idle |
| Standard depth | 5 | 21% | Re-process at depth=5 during downtime |
| Deep passes + speculative scanning | 10 | 42% | Full depth=10 re-processing, plus scanning adjacent states |
| Deep + speculative + knowledge building | 14 | 58% | All of the above, plus building enriched profiles and pre-computing templates |

Speculative Work During Idle

When the priority queue is empty and all tasks have been deepened to their max, the scheduler can run speculative work:

  • Scan adjacent states for existing claim holders (if you found someone in CA, check TX, FL, NY)
  • Pre-compute response templates for all pending conversations (so outreach emails are ready to send instantly when a cron fires)
  • Build enriched person profiles for every record in the database (LinkedIn, public records, social presence)
  • Monitor state portals for newly posted claims (the sweep runners already do this at depth=1; deeper passes check more thoroughly)
  • Cross-reference the full source catalog for additional opportunities per person (if someone has an unclaimed refund in CA, do they also have unclaimed wages from DOL, or a class action settlement?)

This transforms the Jetson from an on-demand processor into a continuously working research assistant. Utilization climbs not because you invented new tasks, but because existing tasks become richer.


5. Task Catalog Design

Schema (SQLite)

CREATE TABLE task_catalog (
  id TEXT PRIMARY KEY,              -- 'claims.ca.fetch', 'tax.prep.2026', 'email.daily'
  category TEXT NOT NULL,           -- 'claims', 'tax', 'health', 'email', 'finance', 'forms', 'phone'
  label TEXT NOT NULL,              -- Human-readable name
  description TEXT,

  -- Value model
  expected_value_dollars REAL,
  probability_of_success REAL DEFAULT 1.0,
  value_notes TEXT,

  -- Resource requirements
  gpu_minutes_estimate REAL DEFAULT 0,
  cpu_minutes_estimate REAL DEFAULT 0,
  human_minutes_estimate REAL DEFAULT 0,

  -- Scheduling
  frequency TEXT DEFAULT 'once',    -- 'once', 'daily', 'weekly', 'monthly', 'quarterly', 'annual', 'event'
  cron_expression TEXT,
  deadline_date TEXT,
  urgency_boost REAL DEFAULT 0,

  -- Depth
  depth_current INTEGER DEFAULT 1,
  depth_max INTEGER DEFAULT 10,

  -- Automation
  automation_status TEXT DEFAULT 'manual_only',  -- 'runnable', 'repair_needed', 'manual_only', 'blocked'
  queue_enabled INTEGER DEFAULT 0,
  runner TEXT,                       -- Script/module path
  runner_args TEXT,                  -- JSON

  -- Dependencies
  depends_on TEXT,                   -- JSON array of task IDs

  -- User override
  user_priority_override REAL,
  user_paused INTEGER DEFAULT 0,

  created_at TEXT DEFAULT (datetime('now')),
  updated_at TEXT DEFAULT (datetime('now'))
);

CREATE TABLE task_runs (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  task_id TEXT NOT NULL REFERENCES task_catalog(id),
  status TEXT DEFAULT 'pending',     -- 'pending', 'running', 'completed', 'failed', 'skipped'
  depth INTEGER DEFAULT 1,
  started_at TEXT,
  completed_at TEXT,
  actual_gpu_minutes REAL,
  actual_human_minutes REAL,
  actual_value_realized REAL,
  computed_priority REAL,
  triggered_by TEXT,                 -- 'schedule', 'deadline', 'user', 'dependency', 'idle_deepening'
  error_message TEXT,
  result_json TEXT
);

CREATE TABLE task_schedule (
  task_id TEXT NOT NULL PRIMARY KEY REFERENCES task_catalog(id),
  next_run_at TEXT NOT NULL,
  last_run_at TEXT,
  last_status TEXT,
  consecutive_failures INTEGER DEFAULT 0
);

How This Generalizes Existing Code

The Dollar Hound codebase already has two scheduling primitives that map directly to this schema.

state-queue.js SOURCE_RUNTIME already defines per-source metadata:

// Existing pattern in state-queue.js SOURCE_RUNTIME
ca: {
  automation_status: 'runnable',      // -> task_catalog.automation_status
  queue_enabled: true,                // -> task_catalog.queue_enabled
  runner: 'remote_fetcher',           // -> task_catalog.runner
  source_priority: 40,               // -> replaced by computed WSJF score
  estimated_staging_gb: 900,          // -> subsumed by resource model (gpu_minutes, etc.)
  preferred_hosts: ['spark', ...],    // -> resource model host selection
}

The automation_status enum (runnable, repair_needed, manual_only, blocked) is preserved verbatim. The source_priority static number is replaced by the dynamic WSJF computation -- a task's priority changes based on deadline proximity, value estimates, and resource availability rather than being hardcoded.

claims-queue.js buildClaimsStateQueue() groups sources by state, sorts by population, and assigns hosts based on disk budget. In the generalized model:

  • Population-based sorting is replaced by WSJF scoring (population was always a proxy for expected value anyway -- bigger state = more potential claims)
  • queue_order becomes computed_priority
  • selected_host computed by selectHostForState() becomes the resource gate function checking GPU + human budgets instead of disk-only
  • The decorateStatePlan() enrichment step maps to computing priority and attaching runtime metadata at scheduling time

The generalization is straightforward because the existing code already separates catalog (what tasks exist) from scheduling (what order to run them) from execution (the runner). The task_catalog table is the catalog, the WSJF formula is the scheduler, and the existing runner scripts are the executors.


6. Resource Model

Budget Table

| Resource | Unit | Daily Budget (Orin Nano) | Notes |
|---|---|---|---|
| GPU compute | minutes | 1,440 (24h) | Target 80-90% = 1,150-1,300 usable |
| CPU compute | minutes | 1,440 | Usually not the bottleneck |
| RAM | GB | 8 | Constrains model size; shared between inference and data processing |
| Disk staging | GB | ~500 usable | Existing pattern from state-queue estimated_staging_gb |
| Network | MB/hour | Varies | Rate-limited by upstream APIs and portal throttling |
| Human attention | minutes/day | 60-120 | The truly scarce resource |

GPU and CPU minutes are renewable every day. Human attention is the hard constraint -- you might have 60 minutes on a busy day, 120 on a quiet one, and 0 when you are traveling. The scheduler must gracefully degrade when human attention hits zero, continuing to run fully autonomous tasks.

The Resource Gate Function

function canRun(task, currentUsage, budget) {
  const targetUtil = 0.9; // leave 10% headroom

  if (currentUsage.gpu + task.estimate.gpu > budget.gpu * targetUtil)
    return false;
  if (currentUsage.cpu + task.estimate.cpu > budget.cpu * targetUtil)
    return false;
  if (currentUsage.human + task.estimate.human > budget.human * targetUtil)
    return false;

  return true;
}

This is the same pattern as assignHostsToQueue() in state-queue.js, which checks disk budget before assigning a host:

// Existing: state-queue.js line 341
if (free - (job.estimated_staging_gb || 0) >= minFreeGb) {
  assignedHost = host;
  budgets.set(host, free - (job.estimated_staging_gb || 0));
  break;
}

The generalized version replaces the single disk constraint with three constraints (GPU, CPU, human) and replaces per-host budgets with a single-machine budget. The logic is identical: check headroom, deduct from budget, allow or skip.

Critical behavior: If the top-priority task cannot run because human attention is exhausted, the scheduler skips it and tries the next task. This naturally fills GPU time with zero-human tasks when the operator is unavailable. A fully autonomous monitoring task with priority_score=3.2 will run ahead of a human-requiring task with priority_score=5.0 if the human budget is spent.


7. Queue Architecture

Single Priority Queue with Resource Gates

Not multiple queues. Not a priority queue per category. Not separate queues for autonomous vs. human-required tasks.

One queue, sorted by computed_priority DESC. The scheduler dequeues from the top, checks resource gates before launching. If a task cannot run (resource constraint or dependency not met), it is skipped -- not removed -- and the scheduler tries the next one.

Scheduling Cycle (Every 5 Minutes)

schedulerTick():

  1. RESTOCK — Compute next_run_at for all recurring tasks.
     Any task where next_run_at < now() becomes eligible.

  2. SCORE — Compute priority_score for each eligible task:
     - Fetch current estimates from task_catalog
     - Apply time_decay_factor for deadline tasks
     - Compute net_value and total_resource_cost
     - priority_score = net_value / total_resource_cost

  3. OVERRIDE — Apply user controls:
     - If user_priority_override is set, replace computed score
     - If user_paused = 1, remove from queue entirely

  4. SORT — Order by priority_score DESC

  5. DISPATCH — Walk the sorted list:
     For each task:
       - Check dependencies (depends_on tasks must be completed)
       - Check resource gates (canRun())
       - If both pass: launch the runner, record in task_runs
       - If either fails: skip, try next task
       - Stop dispatching when resource budget is 90% consumed

  6. DEEPEN — If nothing was dispatched and GPU is idle:
     - Find the completed task with lowest (depth_current / depth_max)
     - Re-run at depth_current + 1
     - Record triggered_by = 'idle_deepening'
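Steps 4-5 reduce to a sort and a gated loop. A sketch with assumed shapes (runner launch and `task_runs` persistence are stubbed out as comments):

```javascript
// DISPATCH: walk the WSJF-sorted list, launching what fits and skipping
// (not removing) what does not. 10% headroom, per the resource gate.
function dispatch(eligible, budget, completedIds) {
  const launched = [];
  const skipped = [];
  const used = { gpu: 0, human: 0 };
  const sorted = [...eligible].sort((a, b) => b.priorityScore - a.priorityScore);
  for (const task of sorted) {
    const depsMet = (task.dependsOn || []).every((id) => completedIds.has(id));
    const fitsGpu = used.gpu + task.gpuMinutes <= budget.gpu * 0.9;
    const fitsHuman = used.human + task.humanMinutes <= budget.human * 0.9;
    if (depsMet && fitsGpu && fitsHuman) {
      used.gpu += task.gpuMinutes;
      used.human += task.humanMinutes;
      launched.push(task.id); // real system: start runner, insert task_runs row
    } else {
      skipped.push(task.id);  // retried at the next tick
    }
  }
  return { launched, skipped };
}

const { launched, skipped } = dispatch(
  [
    { id: 'tax.prep', priorityScore: 1.56, gpuMinutes: 420, humanMinutes: 120 },
    { id: 'enrich',   priorityScore: 3.2,  gpuMinutes: 5,   humanMinutes: 0 },
  ],
  { gpu: 1440, human: 100 },
  new Set()
);
console.log(launched, skipped); // [ 'enrich' ] [ 'tax.prep' ]
```

The human-heavy task is skipped, not dropped; the autonomous one launches even though it ranked behind nothing else it conflicts with.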

Queue Walkthrough (ASCII)

Here is a concrete example of one scheduling cycle with five eligible tasks:

QUEUE STATE AT TICK (sorted by priority_score DESC):
┌────┬─────────────────────────────┬──────────┬─────────┬────────┬────────┐
│ #  │ Task                        │ Priority │ GPU min │ Human  │ Status │
├────┼─────────────────────────────┼──────────┼─────────┼────────┼────────┤
│ 1  │ claims.enrich.batch         │ 3.20     │ 5       │ 0      │        │
│ 2  │ email.triage.daily          │ 2.80     │ 18      │ 0      │        │
│ 3  │ tax.prep.2026 (2 days out)  │ 1.56     │ 420     │ 120    │        │
│ 4  │ claims.outreach.send        │ 0.45     │ 2       │ 5      │        │
│ 5  │ health.billing.review       │ 0.33     │ 6       │ 30     │        │
└────┴─────────────────────────────┴──────────┴─────────┴────────┴────────┘

RESOURCE BUDGETS:
  GPU: 1,300 min remaining    Human: 90 min remaining

DISPATCH WALK:

  #1 claims.enrich.batch
     GPU: 5 <= 1300? YES      Human: 0 <= 90? YES
     -> LAUNCH
     GPU remaining: 1295      Human remaining: 90

  #2 email.triage.daily
     GPU: 18 <= 1295? YES     Human: 0 <= 90? YES
     -> LAUNCH
     GPU remaining: 1277      Human remaining: 90

  #3 tax.prep.2026
     GPU: 420 <= 1277? YES    Human: 120 <= 90? NO
     -> SKIP (human budget exceeded)

  #4 claims.outreach.send
     GPU: 2 <= 1277? YES      Human: 5 <= 90? YES
     -> LAUNCH
     GPU remaining: 1275      Human remaining: 85

  #5 health.billing.review
     GPU: 6 <= 1275? YES      Human: 30 <= 85? YES
     -> LAUNCH
     GPU remaining: 1269      Human remaining: 55

RESULT:
  Launched: #1, #2, #4, #5
  Skipped:  #3 (human budget -- will retry next tick when budget resets
            or when Joe marks 120 min available)
  GPU idle: 1269 min -> idle_deepening will fill this

The tax prep task -- the highest urgency item -- was skipped because it requires 120 minutes of human attention and only 90 remain in the budget. The system does not wait for it. It runs everything else it can, filling the GPU with autonomous work. When Joe explicitly allocates time for tax prep (or the daily budget resets), it will be the first thing dispatched.

No Preemption Needed (V1)

Tasks at this scale run for minutes to hours, not days. There is no need to interrupt a running task to start a higher-priority one. If a deadline task becomes critical while a batch job is running, it will be dispatched at the next tick (5 minutes). This is fast enough for a personal system.

If preemption ever becomes necessary (unlikely), the pattern is: save a checkpoint in result_json, mark the run as status='preempted', and re-queue it. But do not build this until you need it.


8. How OpenClaw Maps to This

The OpenClaw integration research (~/dev/sync/research/openclaw-macroclaw-integration.md) already designed a task manager skill with autonomy levels (full, draft-and-ask, ask-first, human-only) and a cron-driven execution loop. The WSJF formula adds the economic ranking layer on top of that architecture.

Mapping Table

| OpenClaw Component | Role in This System |
|---|---|
| Cron jobs | Replace the setInterval(schedulerTick, 300000) pattern. The morning review (7am), afternoon follow-up (2pm), and weekly deep review (Sunday 9am) crons from the existing research doc trigger schedulerTick() and dispatch tasks. |
| Skills | The task executor. skills/task-scheduler/SKILL.md calls Dollar Hound's /api/scheduler/queue to get the next task, runs it, and reports completion. Each task category can have its own skill: skills/claims-outreach/, skills/tax-prep/, skills/email-triage/. |
| Memory (MEMORY.md + daily logs) | Tracks which tasks tend to succeed or fail. After a week of runs, the system can recalibrate probability_of_success based on actual outcomes. "Claims outreach to people over 70 has a 25% response rate, not 15%" -- this updates the priority formula automatically. |
| Approval workflows | The "stop and ask" pattern for tasks that need human input. A claims outreach draft is draft-and-ask: the agent prepares the email, sends it to Joe via WhatsApp, waits for YES/NO/modify. human_minutes for the approval step is counted as 2-5 minutes, not the full task duration. |
| Multi-channel (WhatsApp, iMessage) | Notify Joe when a high-priority task needs attention. "Tax prep is 2 days from deadline and needs 120 minutes of your time. When do you want to start?" The notification itself is a zero-GPU, zero-human-time task that the scheduler fires automatically. |

The Autonomy Level Bridge

The OpenClaw research defined four autonomy levels. Here is how they interact with the WSJF formula:

| Autonomy Level | human_minutes Impact | Example |
|---|---|---|
| full | 0 (no human time) | Monitoring scans, auto-enrichment, portal checks |
| draft-and-ask | 2-5 min (review + approve) | Outreach emails, follow-up messages |
| ask-first | 10-30 min (discussion + decision) | Health coordination, financial moves |
| human-only | All task time is human time | Physical tasks, in-person meetings |

Tasks at full autonomy dominate the queue because their human_minutes = 0 makes net_value higher and total_resource_cost lower. This is the correct incentive: the system should prioritize work it can do without bothering you.


9. Migration Path from Existing Code

The existing Dollar Hound codebase already has scheduling primitives. The migration is not a rewrite -- it is a generalization.

Mapping Table

| Current Code | What It Does | Generalized Equivalent |
|---|---|---|
| state-queue.js SOURCE_RUNTIME with source_priority, automation_status, queue_enabled, estimated_staging_gb, preferred_hosts | Per-source runtime metadata and static priority | task_catalog table with WSJF-computed dynamic priority |
| claims-queue.js buildClaimsStateQueue() with queue_order, selected_host | Build sorted queue of states, assign hosts | schedulerTick() steps 1-5: score, sort, dispatch with resource gates |
| run-claims-queue.js worker pool with cursor++ + Promise.all | Execute queue items concurrently with N workers | OpenClaw skill executor with concurrency controlled by resource gates |
| state-queue.js assignHostsToQueue() with disk budget checking | Assign tasks to hosts based on available disk | Resource gate function generalized from disk-only to GPU + human + disk |
| systemd/dollar-hound.service with Restart=on-failure, RestartSec=5 | Process-level restart on crash | task_schedule.consecutive_failures with exponential backoff: next_retry = now + min(5 * 2^failures, 3600) seconds |
| SOURCE_RUNTIME automation_status enum: runnable, repair_needed, manual_only, blocked_portal | Track whether a source can be automated | task_catalog.automation_status -- same enum, same semantics, broader scope |
| claims-queue.js population-based sort (proxy for value) | Higher population = more potential claims = higher priority | WSJF score replaces population proxy with explicit expected_value * probability |
| run-claims-queue.js --workers 8, --batch-size 100 | Concurrency and throughput tuning | Resource budget (GPU minutes, human minutes) replaces fixed worker count |
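The backoff rule from the systemd row is one line of arithmetic (the helper name is mine):

```javascript
// Exponential backoff with a one-hour cap, per
// next_retry = now + min(5 * 2^failures, 3600) seconds.
function retryDelaySeconds(consecutiveFailures) {
  return Math.min(5 * 2 ** consecutiveFailures, 3600);
}

console.log([0, 1, 4, 10].map(retryDelaySeconds)); // [ 5, 10, 80, 3600 ]
```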

What Stays the Same

The runner scripts (run-source-remote.js, the sweep runners, the enrichment pipeline) do not change. They are task executors. The scheduler changes which tasks to run and in what order, but the execution logic is unchanged. This is the same separation that run-claims-queue.js already enforces: it builds the queue, then calls runState(entry) for each item. The generalized scheduler builds a different queue (WSJF-sorted instead of population-sorted) but calls the same runners.


10. The Seed Catalog

Pre-populate task_catalog with known tasks and their economic profiles. These numbers are estimates -- the calibration loop (section 11) will refine them.

| Task ID | Category | Label | EV ($) | Prob | GPU min | Human min | Frequency | Depth Max |
|---|---|---|---|---|---|---|---|---|
| claims.monitoring.weekly | claims | Weekly portal monitoring | 200/find | 0.10/person | 42 | 0 | weekly | 10 |
| claims.enrich.batch | claims | Batch record enrichment | 20/match | 0.30 | 5/record | 0 | daily | 5 |
| claims.outreach.send | claims | Send outreach emails | 150 avg | 0.15 | 2 | 5 (review) | daily | 3 |
| claims.conversation.respond | claims | Respond to inbound replies | 150 avg | 0.60 | 5 | 15 | event | 5 |
| tax.prep.annual | tax | Annual tax preparation | 300 | 0.90 | 420 (burst) | 120 | annual | 10 |
| email.triage.daily | email | Daily email triage scan | 5/find | 0.10 | 18 | 0 | daily | 5 |
| email.settlement.scan | email | Class action settlement scan | 50 avg | 0.05 | 3 | 0 | daily | 3 |
| health.billing.review | health | Medical billing review | 300 avg | 0.40 | 6 | 30 | event | 10 |
| forms.benefits.check | forms | Benefits eligibility check | 500 avg | 0.20 | 2 | 20 | quarterly | 5 |
| phone.dispute.call | phone | Dispute phone call (via MacroClaw) | 200 avg | 0.50 | 18 | 45 | event | 3 |
| subscription.audit | finance | Subscription audit & cancel | 30/mo savings | 0.50 | 3 | 5 | monthly | 3 |

Priority Scores at a Glance

Computing priority_score for each (using $200/hr rate, attention_weight=3):

| Task | net_value | resource_cost | priority_score | Notes |
|---|---|---|---|---|
| claims.enrich.batch | $6.00 | 5 | 1.20 | Best ratio -- fully autonomous, cheap |
| email.triage.daily | $0.50 | 18 | 0.03 | Low EV per run, but zero human cost |
| email.settlement.scan | $2.50 | 3 | 0.83 | Good ratio, fully autonomous |
| claims.monitoring.weekly | $20.00 | 42 | 0.48 | Solid autonomous scanning |
| subscription.audit | $15.00 - $16.67 = -$1.67 | 18 | neg | Barely negative -- reduce human time to 2 min and it flips |
| claims.outreach.send | $22.50 - $16.67 = $5.83 | 17 | 0.34 | Positive only because human time is short (5 min review) |
| claims.conversation.respond | $90.00 - $50.00 = $40.00 | 50 | 0.80 | High prob (they already responded) makes this worthwhile |
| health.billing.review | $120.00 - $100.00 = $20.00 | 96 | 0.21 | High EV but heavy human time |
| forms.benefits.check | $100.00 - $66.67 = $33.33 | 62 | 0.54 | Quarterly; good value if probability holds |
| phone.dispute.call | $100.00 - $150.00 = -$50.00 | 153 | neg | Phone calls are expensive in human time -- only worth it for high-value disputes |
| tax.prep.annual | varies by deadline | 780 | varies | Negative at 30 days, dominant at 2 days |

The pattern is clear: fully autonomous tasks with even modest expected value rank highest. Tasks requiring human time must have high expected value and high probability to justify the attention cost. This is exactly the right incentive structure -- it pushes you to automate more of each task's pipeline so human_minutes drops and the task becomes economically viable.
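The arithmetic behind the table can be sketched in a few lines. This is a sketch, not existing Dollar Hound code: it assumes net_value = expected_value × probability − human time cost, resource_cost = gpu_minutes + attention_weight × human_minutes, and the $200/hr rate and attention_weight = 3 used above; the function and field names are illustrative.

```javascript
// Illustrative scoring sketch -- assumed formula, not the shipped scheduler.
const HOURLY_RATE = 200;      // $/hr opportunity cost of human attention
const ATTENTION_WEIGHT = 3;   // human minutes count 3x in resource cost

function priorityScore({ expectedValue, probability, gpuMinutes, humanMinutes }) {
  const humanCost = (humanMinutes / 60) * HOURLY_RATE;
  const netValue = expectedValue * probability - humanCost;
  const resourceCost = gpuMinutes + ATTENTION_WEIGHT * humanMinutes;
  return netValue / resourceCost;
}

// claims.conversation.respond: ($150 * 0.60 - $50) / (5 + 3 * 15) = 0.80
const score = priorityScore({
  expectedValue: 150, probability: 0.60, gpuMinutes: 5, humanMinutes: 15,
});
```

Fully autonomous tasks (humanMinutes = 0) skip the subtraction entirely, which is why they dominate the top of the table.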


11. Calibration

After running for one week, compare estimates against actuals on three dimensions -- GPU minutes, human minutes, and value realized -- and track the failure rate alongside them.

What to Measure

| Metric | Source | Calibration Action |
|---|---|---|
| GPU minutes (est vs. actual) | task_runs.actual_gpu_minutes vs. task_catalog.gpu_minutes_estimate | Exponential moving average: new_est = 0.7 * old_est + 0.3 * actual |
| Human minutes (est vs. actual) | task_runs.actual_human_minutes vs. task_catalog.human_minutes_estimate | Same EMA formula |
| Value realized | task_runs.actual_value_realized vs. task_catalog.expected_value_dollars * probability_of_success | Update probability: if 10 runs yielded value 3 times, probability = 0.30 |
| Failure rate | task_schedule.consecutive_failures | If a task fails 3+ times consecutively, set automation_status = 'repair_needed' |
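The two update rules in the table are small enough to sketch directly. This assumes the EMA weights shown above (0.7 old, 0.3 new) and a plain success-frequency probability update; the function names are illustrative, not existing Dollar Hound code.

```javascript
// new_est = 0.7 * old_est + 0.3 * actual (alpha = 0.3 per the table above)
function emaUpdate(oldEstimate, actual, alpha = 0.3) {
  return (1 - alpha) * oldEstimate + alpha * actual;
}

// Probability as observed success frequency: 3 value-yielding runs
// out of 10 total runs -> 0.30.
function updatedProbability(successfulRuns, totalRuns) {
  return totalRuns === 0 ? 0 : successfulRuns / totalRuns;
}
```

Running these after each completed task_runs row keeps estimates drifting toward reality without any manual tuning.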

Calibration Query

SELECT
  tc.id,
  tc.label,
  tc.expected_value_dollars,
  tc.probability_of_success AS est_prob,
  COUNT(tr.id) AS total_runs,
  SUM(CASE WHEN tr.actual_value_realized > 0 THEN 1 ELSE 0 END) AS successful_runs,
  ROUND(
    CAST(SUM(CASE WHEN tr.actual_value_realized > 0 THEN 1 ELSE 0 END) AS REAL)
    / NULLIF(COUNT(tr.id), 0),
    2
  ) AS actual_prob,
  ROUND(AVG(tr.actual_gpu_minutes), 1) AS avg_gpu_min,
  tc.gpu_minutes_estimate AS est_gpu_min,
  ROUND(AVG(tr.actual_human_minutes), 1) AS avg_human_min,
  tc.human_minutes_estimate AS est_human_min
FROM task_catalog tc
JOIN task_runs tr ON tr.task_id = tc.id
WHERE tr.status = 'completed'
  AND tr.completed_at > datetime('now', '-7 days')
GROUP BY tc.id
ORDER BY total_runs DESC;

What Good Calibration Looks Like

After a month, estimates should converge:

  • GPU minutes: within 20% of actual (tasks are deterministic enough)
  • Human minutes: within 50% of actual (human behavior is variable, but trends emerge)
  • Probability: within 10 percentage points of actual success rate

If a task's actual probability is consistently much lower than estimated, the formula will naturally deprioritize it -- net_value drops, and the task sinks in the queue. No manual intervention needed. The system self-corrects.


12. What NOT to Build

The temptation with a scheduling system is to over-engineer it. Here is what to avoid and why.

Not BullMQ / Redis

BullMQ is a production job queue backed by Redis. It handles retries, rate limiting, prioritization, delayed jobs, and job lifecycle events. It is excellent software for web-scale applications processing millions of jobs.

You have dozens of jobs. SQLite + a 5-minute setInterval is sufficient. Adding Redis means another daemon to run, another service to monitor, another failure mode. The Orin Nano has 8GB of RAM -- Redis would consume memory that is better spent on inference.

Not Temporal / Airflow

Temporal and Apache Airflow are workflow orchestration engines for distributed systems. They manage DAGs of tasks across clusters of workers with durable execution guarantees. They are built for data pipelines processing terabytes across dozens of machines.

You have one machine, one pipeline, and tasks that can be described in a flat catalog. The depends_on JSON array in task_catalog handles simple dependencies (run enrichment before outreach). If you ever need a full DAG engine, you have bigger problems than scheduling.
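The depends_on check is one predicate, not an orchestrator. A minimal sketch, assuming depends_on is stored as a JSON text column of task ids and that the scheduler tracks which tasks have already completed; the function name is hypothetical.

```javascript
// A task is runnable once every id in its depends_on array has completed.
// `task.depends_on` is assumed to be a JSON string column (or null).
function isRunnable(task, completedIds) {
  const deps = JSON.parse(task.depends_on || '[]');
  return deps.every(id => completedIds.has(id));
}

// e.g. outreach waits on enrichment:
isRunnable(
  { depends_on: '["claims.enrich.batch"]' },
  new Set(['claims.enrich.batch'])
);
```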

Not a Custom Framework

This is 200-300 lines of scheduling code on top of an existing SQLite database and OpenClaw's cron system. It is not a framework. It does not need a plugin system, a configuration DSL, or an abstract base class for task types. The task runners already exist (run-source-remote.js, the sweep scripts, the enrichment pipeline). The scheduler just decides which ones to call and in what order.

Not a Pareto Optimizer

Multi-objective optimization (Pareto frontier, NSGA-II) is needed when objectives are genuinely incomparable -- you cannot convert them to a single unit. In your system, everything converts to dollars: GPU time has a cost (electricity + depreciation), human time has an opportunity cost, and task outcomes have a dollar value. A single numeric score captures the tradeoff completely. Greedy selection from a sorted list is O(n) and optimal enough.
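Greedy selection here amounts to a sort and one pass. A sketch under assumed field names (priorityScore, gpuMinutes) and a single GPU-minute budget; the real resource gate may check more than one budget.

```javascript
// Pick tasks greedily: highest priority first, skip anything that
// scores negative or does not fit the remaining GPU budget.
function pickTasks(tasks, gpuBudgetMinutes) {
  const runnable = tasks
    .filter(t => t.priorityScore > 0)
    .sort((a, b) => b.priorityScore - a.priorityScore);
  const picked = [];
  let remaining = gpuBudgetMinutes;
  for (const t of runnable) {
    if (t.gpuMinutes <= remaining) {   // the resource gate
      picked.push(t);
      remaining -= t.gpuMinutes;
    }
  }
  return picked;
}
```

Greedy can theoretically leave a sliver of budget unused compared to a true knapsack solve, but at a dozen tasks the difference is noise.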


Summary

The problem of "what should my personal AI computer work on next" has been well-studied under different names across multiple fields. WSJF provides the economic prioritization. Rate-monotonic and EDF scheduling handle recurring tasks and deadlines. The knapsack problem frames capacity planning. Heterogeneous computing addresses multi-resource constraints.

The synthesis is a single formula that produces a priority score, a single queue sorted by that score, and a resource gate that ensures nothing launches without budget. Depth scaling fills idle time. Calibration keeps estimates honest. OpenClaw provides the execution runtime. The existing Dollar Hound code provides the runners.

Build the task_catalog table, implement schedulerTick(), wire it to OpenClaw cron, and start measuring. The formula will tell you what to automate next (reduce human_minutes on negative-priority tasks), what to invest in (raise probability_of_success on high-EV tasks), and what to stop doing (tasks that stay negative after automation).