Research

Digital Surface Labs

Feedback-Driven News Personalization

From upvotes to fine-tuned per-person news models

The Goal

Build a system where readers can upvote/downvote articles (with optional comments), and use that feedback to generate better news for them over time. Two phases:

  1. Prompt augmentation (works immediately, any number of users): Store feedback as text, inject it into the generation prompt each week
  2. Model fine-tuning (requires scale): Train per-user or per-segment LoRA adapters when enough feedback accumulates

This document covers system design, scale thresholds, costs, and a practical implementation path.


Phase 1: Prompt Augmentation (0–100 signals per user)

How It Works

Every time a user upvotes or downvotes an article, store it in a simple feedback file:

{"action": "up", "section": "Tech", "headline": "OpenAI Releases GPT-5", "reason": null, "ts": "2026-02-17"}
{"action": "down", "section": "Tech", "headline": "AI Models for Developer Tools", "reason": "too generic, not actual news", "ts": "2026-02-17"}
{"action": "up", "section": "Local News", "headline": "Palo Alto Council Approves ADU Reform", "reason": "love hyperlocal policy stuff", "ts": "2026-02-17"}

Before generating the next edition, synthesize this into a preference profile injected into the system prompt:

## Reader Preferences (from feedback)

LIKES: Hyperlocal policy news, specific company earnings/deals, breaking announcements
DISLIKES: Generic topic overviews, listicles without news hooks, repetitive coverage

Recent upvoted headlines:
- "Palo Alto Council Approves ADU Reform" (reason: "love hyperlocal policy stuff")
- "OpenAI Releases GPT-5"

Recent downvoted headlines:
- "AI Models for Developer Tools" (reason: "too generic, not actual news")

Use this to calibrate tone, depth, and story selection. Prioritize the kinds
of stories this reader engages with. Avoid patterns they've downvoted.
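The profile above can be assembled mechanically from the feedback file. A minimal sketch, assuming the JSONL schema shown earlier; the synthesized LIKES/DISLIKES lines would come from a separate summarization pass, which is omitted here:

```python
import json

def build_preference_block(feedback_path, max_examples=10):
    """Render the '## Reader Preferences' prompt section from a
    feedback.jsonl file. Lists recent up/down headlines with reasons;
    the LIKES/DISLIKES summary lines are left to a separate pass."""
    records = []
    with open(feedback_path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                records.append(json.loads(line))
    # Keep only the most recent N to bound prompt size.
    records = records[-max_examples:]
    ups = [r for r in records if r["action"] == "up"]
    downs = [r for r in records if r["action"] == "down"]

    def fmt(r):
        reason = f' (reason: "{r["reason"]}")' if r.get("reason") else ""
        return f'- "{r["headline"]}"{reason}'

    lines = ["## Reader Preferences (from feedback)", ""]
    if ups:
        lines.append("Recent upvoted headlines:")
        lines += [fmt(r) for r in ups]
        lines.append("")
    if downs:
        lines.append("Recent downvoted headlines:")
        lines += [fmt(r) for r in downs]
    return "\n".join(lines)
```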

Why This Works

Research confirms in-context preference learning is effective for content recommendation:

  • RecPrompt (2023) showed iteratively optimized prompts with user reading history significantly outperform collaborative filtering baselines in cold-start scenarios
  • LLM-Rec (NAACL 2024) demonstrated prompting strategies augmenting user/item info outperform direct recommendations, especially with sparse data
  • Cold-start research consistently finds LLMs competitive as near-cold-start recommenders: with only a few interactions, a well-prompted LLM matches or beats trained models

Capacity

With a 128K context window (GPT-4o; Claude Sonnet offers 200K):

  • Reserve ~6K tokens for the system prompt + output
  • At ~100 tokens per liked/disliked article (headline + summary + reason), ~1,200 preference examples fit
  • Practically, you want 10–50 recent examples plus a synthesized preference summary (~200–500 tokens)
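The capacity arithmetic is worth encoding so it stays consistent if the budget changes. A trivial helper under the same assumptions (~6K tokens reserved, ~100 tokens per example):

```python
def examples_that_fit(context_window=128_000, reserved=6_000,
                      tokens_per_example=100):
    """How many preference examples fit after reserving room for the
    system prompt and output. Defaults match the estimates above."""
    return (context_window - reserved) // tokens_per_example
```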

When to Use This

| User interactions | Approach | Why |
|---|---|---|
| 0–5 | No personalization | Not enough signal |
| 5–20 | Inject raw feedback into prompt | Every signal matters; let the LLM interpret |
| 20–100 | Synthesized preference profile + recent examples | Structured profile reduces noise; recent examples add specificity |
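The table maps directly to a routing function; the strategy names are illustrative:

```python
def personalization_strategy(n_signals: int) -> str:
    """Map a user's interaction count to the approach in the table."""
    if n_signals < 5:
        return "none"                              # not enough signal
    if n_signals < 20:
        return "raw_feedback_in_prompt"            # inject every signal
    if n_signals < 100:
        return "synthesized_profile_plus_examples"  # structured profile
    return "fine_tune_candidate"                   # Phase 2 territory
```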

Cost

Essentially free: the feedback adds ~200–500 tokens to the generation prompt. At Claude Sonnet pricing ($3/M input tokens), that's $0.0006–$0.0015 per generation. Negligible.


Phase 2: Fine-Tuning (100+ signals per user)

When Fine-Tuning Beats Prompting

Research on PEFT (LoRA) vs. RAG/prompt injection shows a clear crossover:

  • Below ~50 interactions per user: Prompt augmentation wins. Not enough data to train effectively.
  • 50–100 interactions: Hybrid approaches start competing. A lightweight adapter plus prompt context can outperform either alone.
  • 100+ interactions: Per-user LoRA adapters become meaningfully better than prompting. The model internalizes preferences rather than re-reading them.

For a weekly newspaper with ~10 sections per edition, a user who reads every week and provides feedback on ~5 articles per edition would hit 100 interactions in ~20 weeks (5 months). Power users giving more feedback would get there faster.

What to Fine-Tune On

Two approaches, depending on signal type:

Supervised Fine-Tuning (SFT) — for upvoted articles:

{"messages": [{"role": "system", "content": "Generate a weekly news article..."}, {"role": "user", "content": "Write this week's top tech news"}, {"role": "assistant", "content": "[the upvoted article text]"}]}

Direct Preference Optimization (DPO) — for up/down pairs:

{"prompt": "Write this week's AI news roundup", "chosen": "[upvoted article]", "rejected": "[downvoted article]"}

DPO is more powerful because it explicitly teaches the model "this, not that." OpenAI now supports DPO fine-tuning natively. For open-source models, Hugging Face TRL supports DPO training.

Minimum for DPO: ~200–500 high-quality preference pairs. Below 100 pairs, overfitting risk is high and signal is noisy.
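Converting raw feedback into DPO records is mostly bookkeeping. A sketch that pairs upvoted and downvoted articles within the same section; note that `full_text` is a hypothetical field, since the stored excerpt alone is too short to train on:

```python
import json
from itertools import product

def export_dpo_pairs(records, prompt_template="Write this week's {section} news"):
    """Build DPO (prompt, chosen, rejected) records by pairing every
    upvoted article with every downvoted one in the same section.
    `full_text` is an assumed field holding the full article body."""
    by_section = {}
    for r in records:
        by_section.setdefault(r["section"], {"up": [], "down": []})[r["action"]].append(r)
    pairs = []
    for section, groups in by_section.items():
        for chosen, rejected in product(groups["up"], groups["down"]):
            pairs.append({
                "prompt": prompt_template.format(section=section),
                "chosen": chosen["full_text"],
                "rejected": rejected["full_text"],
            })
    return pairs

def write_jsonl(pairs, path):
    """Write pairs in the JSONL format shown above."""
    with open(path, "w", encoding="utf-8") as f:
        for p in pairs:
            f.write(json.dumps(p) + "\n")
```

Sections with only upvotes (or only downvotes) yield no pairs, which naturally enforces the "need both signals" requirement of DPO.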

Cost Estimates

| Method | Model | Dataset Size | Training Cost | Notes |
|---|---|---|---|---|
| OpenAI SFT | GPT-4o-mini | 100 examples (~100K tokens, 3 epochs) | ~$0.90 | Cheapest managed option |
| OpenAI DPO | GPT-4o-mini | 200 pairs (~200K tokens, 3 epochs) | ~$1.80 | Best for preference learning |
| OpenAI SFT | GPT-4o | 100 examples | ~$7.50 | Premium quality |
| LoRA (cloud GPU) | Llama 3 8B | 500 examples | $2–10 | RTX 4090 rental, 3–4 hours |
| LoRA (cloud GPU) | Mistral 7B | 500 examples | $2–10 | Same as above |
| Together AI managed | Llama 3 8B | 500 examples | $5–20 | No GPU management needed |

Per-user fine-tuning at these prices is feasible. 1,000 users x $2/tune = $2,000 per training cycle. Run monthly or quarterly.

Serving Fine-Tuned Models

LoRA adapter storage: Each adapter is ~2–20 MB for a 7B model. 1,000 user adapters = 2–20 GB total. Manageable.

Serving options:

  • Together AI Serverless Multi-LoRA: Train adapters offline, serve at base-model prices. They handle adapter swapping. Best managed option.
  • Self-hosted (Jetson or cloud): The Jetson Orin Nano can serve a base 7B model and swap LoRA adapters for inference (EdgeLoRA demonstrated this). It cannot train, only serve. Train on cloud, deploy adapters to the Jetson.
  • vLLM Multi-LoRA: If self-hosting on a proper GPU server, vLLM supports efficient multi-adapter serving.
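Whichever serving stack you pick, the routing logic is the same: prefer a personal adapter if one exists, fall back to the segment adapter, else serve the base model. A sketch with illustrative directory layout and file names:

```python
import os

def resolve_adapter(user_id, segment_id, adapter_dir="adapters"):
    """Pick the LoRA adapter to load for a request. Paths are
    illustrative; returns None to signal 'base model, prompt-only
    personalization'."""
    personal = os.path.join(adapter_dir, "users", f"{user_id}.safetensors")
    if os.path.exists(personal):
        return personal
    segment = os.path.join(adapter_dir, "segments", f"{segment_id}.safetensors")
    if os.path.exists(segment):
        return segment
    return None  # fall back to the base model + preference prompt
```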


Phase 3: Per-Person Custom Models (500+ signals, at scale)

The Vision

At sufficient scale (thousands of users, hundreds of signals each), you could:

  1. Cluster users into segments based on feedback patterns (e.g., "deep-dive policy readers" vs. "headline scanners" vs. "contrarian perspective seekers")
  2. Train segment-level adapters rather than per-user (more data per adapter = better quality)
  3. Route users to the nearest segment adapter plus their personal preference prompt
  4. As individual users accumulate enough data, graduate them to a personal adapter
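Step 3's routing can start as nearest-centroid assignment over a per-topic engagement vector (e.g. upvote rates by section). A dependency-free sketch using cosine similarity; the feature layout and segment names are illustrative:

```python
import math

def assign_segment(user_vec, centroids):
    """Route a user to the nearest segment centroid by cosine
    similarity. user_vec: per-topic upvote rates; centroids maps
    segment name -> vector of the same length."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0
    return max(centroids, key=lambda name: cos(user_vec, centroids[name]))
```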

When This Makes Sense

| Scale | Approach | Training Frequency |
|---|---|---|
| 1–50 users | Prompt augmentation only | N/A (real-time) |
| 50–500 users | Prompt augmentation + 3–5 segment adapters | Monthly |
| 500–5,000 users | Segment adapters (10–20 segments) + personal adapters for power users | Bi-weekly |
| 5,000+ users | Per-user adapters for active users, segment fallback for others | Weekly |

The Economics

At 5,000 users with monthly adapter retraining:

  • ~500 active users get personal adapters: 500 x $2 = $1,000/month
  • 20 segment adapters for the rest: 20 x $5 = $100/month
  • Total: ~$1,100/month in training compute
  • Serving: base model + adapter swapping adds minimal marginal cost

This is unit-economical as long as subscription revenue per user comfortably exceeds the ~$0.22/month in training cost ($1,100 / 5,000 users).


Implementation Plan for Modern Newspaper

Week 1: Add Feedback UI + Storage

Add upvote/downvote chevrons to each article section in the newspaper template. Clicking opens an optional comment field.

Storage: Append feedback to a JSONL file per subscriber:

subscribers/{hash}/feedback.jsonl

Each line:

{"action": "up|down", "section_id": "tech-daily", "headline": "...", "excerpt": "first 200 chars...", "reason": "optional comment", "edition": 3, "ts": "2026-02-17T10:15:00Z"}

API endpoint: POST /api/feedback with {subscriber_hash, section_id, action, reason?}
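A sketch of the handler body behind that endpoint, appending one line to the subscriber's file. The server would also attach headline, excerpt, and edition number from the current edition's data; that lookup is omitted here:

```python
import json
import os
from datetime import datetime, timezone

def record_feedback(subscriber_hash, section_id, action, reason=None,
                    base_dir="subscribers"):
    """Append one feedback event to subscribers/{hash}/feedback.jsonl,
    matching the schema above. Intended as the body of
    POST /api/feedback."""
    if action not in ("up", "down"):
        raise ValueError(f"invalid action: {action!r}")
    path = os.path.join(base_dir, subscriber_hash, "feedback.jsonl")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    record = {
        "action": action,
        "section_id": section_id,
        "reason": reason,
        "ts": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Append-only JSONL keeps writes atomic enough for this volume and makes the later export steps (preference prompts, SFT/DPO data) a simple file read.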

Week 2: Inject Feedback into Generation

Modify task_runner.py to:

  1. Before generating content for a subscriber section, load their feedback.jsonl
  2. Filter to feedback relevant to that section
  3. Synthesize a preference summary
  4. Prepend it to the task body sent to Claude

This is the prompt augmentation approach — zero training cost, immediate effect.
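The four steps can be sketched as one function; the real task format in task_runner.py may differ:

```python
import json

def augment_task_body(task_body, feedback_path, section_id, max_items=20):
    """Load feedback, keep only this section's entries, render a short
    preference block, and prepend it to the task body. A sketch; the
    block format mirrors the profile shown in Phase 1."""
    try:
        with open(feedback_path, encoding="utf-8") as f:
            records = [json.loads(l) for l in f if l.strip()]
    except FileNotFoundError:
        return task_body  # no feedback yet: generate unpersonalized
    relevant = [r for r in records if r.get("section_id") == section_id]
    relevant = relevant[-max_items:]  # most recent only
    if not relevant:
        return task_body
    lines = ["## Reader Preferences (from feedback)"]
    for r in relevant:
        mark = "+" if r["action"] == "up" else "-"
        reason = f' ({r["reason"]})' if r.get("reason") else ""
        lines.append(f'{mark} "{r["headline"]}"{reason}')
    return "\n".join(lines) + "\n\n" + task_body
```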

Month 2–3: Build Feedback Dashboard

Simple admin view showing:

  • Total feedback collected per user
  • Top liked/disliked patterns
  • Users approaching the fine-tuning threshold (100+ signals)

Month 4+: Evaluate Fine-Tuning

When any user crosses 100+ feedback signals:

  1. Export their feedback as SFT/DPO training data
  2. Run a LoRA fine-tune on GPT-4o-mini ($1–2) or Llama 3 8B ($2–10)
  3. A/B test: generate one edition with the fine-tuned model, one with prompt augmentation
  4. Compare engagement (click-through, time on page, further upvotes)
  5. If the fine-tuned edition wins, deploy that model for the user
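The engagement comparison can start as a simple majority vote across metrics, which sidesteps having to normalize units (click-through rate vs. seconds on page). Metric names here are illustrative; higher is assumed better for each:

```python
def ab_winner(fine_tuned, prompt_aug):
    """Majority vote across engagement metrics. Both dicts share the
    same keys (e.g. ctr, time_on_page, upvote_rate); higher wins."""
    wins = sum(fine_tuned[k] > prompt_aug[k] for k in fine_tuned)
    return "fine_tuned" if wins > len(fine_tuned) / 2 else "prompt_augmented"
```

A real rollout decision would also want a minimum sample size per arm before trusting the vote.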


Key Takeaways

  1. Start with prompt augmentation. It's free and works immediately. Even 5 upvotes/downvotes provide useful signal when injected into the generation prompt.

  2. Don't fine-tune until you have 100+ signals per user. Below that threshold, prompting is actually better because fine-tuning overfits on sparse data.

  3. DPO > SFT for preference learning. If you have both upvoted and downvoted articles from the same user, DPO is the right training approach. OpenAI supports it natively; open-source models support it via TRL.

  4. Per-user fine-tuning is economically feasible at $2–10 per user per training cycle. The bottleneck is data, not cost.

  5. The Jetson can serve adapters but not train them. Train on cloud GPUs or managed services, deploy adapters to the Jetson for inference.

  6. User segments are more practical than pure per-user models until you have hundreds of signals per user. Cluster first, personalize within clusters.


Sources