2026 · Product direction & decisions; AI agent (Claude) as builder

Lin — AI Job-Search Agent

A use case everyone understands, built end-to-end with an AI agent — then rebuilt right.

ai-agents · product-management · open-source · automation
~$0.30/day: routine pipeline cost (budget models)
−90%: context overhead per run (21k → 2k tokens)
60: automated tests (incl. real-browser)
stack: Hermes agent platform · Node.js · Multi-model LLMs · Playwright · Cloudflare Pages

Why this build

I wanted deep, hands-on experience engineering with AI agents — so I picked a use case that’s real, widely felt, and instantly recognizable: the modern job search. It exercises everything that matters in agent products (multi-step pipelines, model economics, deterministic guardrails, a human firmly in the loop), and the result is genuinely useful to people in an active search — which is why it’s open source.

The product

A job search is an operations problem run at emotional expense. Lin automates the operations: it scans portals twice daily, scores every role with a structured 0–5 evaluation plus a configurable geo-eligibility gate, verifies a posting is still live, builds two competing tailored résumés — one engine optimizes narrative for humans, the other injects keywords for ATS robots — picks the winner by automated comparison, drafts the application answers, and tracks every application with inbox-driven status updates and an engine win-rate scoreboard.
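The scoring-plus-gate step described above can be sketched roughly as follows. This is a minimal illustration, not Lin's actual code: the `Evaluation` fields, the `allowed_regions` setting, the region labels, and the 3.5 score threshold are all assumptions made for the example. The source states only that there is a structured 0–5 score, a configurable geo-eligibility gate, and a liveness check.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    """Hypothetical result of evaluating one posting."""
    title: str
    score: float   # structured evaluation on the 0-5 scale
    region: str    # where the role may be worked from (assumed field)
    is_live: bool  # result of the "posting is still live" check


def passes_gates(ev: Evaluation, allowed_regions: set[str], min_score: float = 3.5) -> bool:
    """Return True only if the posting clears every deterministic gate."""
    if not 0.0 <= ev.score <= 5.0:
        raise ValueError(f"score out of range: {ev.score}")
    # The geo gate is a hard filter: a high score cannot override it.
    if ev.region not in allowed_regions:
        return False
    if not ev.is_live:
        return False
    return ev.score >= min_score


# Made-up postings, one per outcome.
postings = [
    Evaluation("Senior PM, Platform", 4.5, "EU", True),
    Evaluation("Product Lead", 4.8, "US-only", True),  # fails the geo gate
    Evaluation("PM, Growth", 4.1, "EU", False),         # posting expired
    Evaluation("Associate PM", 2.0, "EU", True),        # score too low
]
shortlist = [p.title for p in postings if passes_gates(p, {"EU", "Remote"})]
print(shortlist)
```

Keeping the gates in plain code rather than in prompt text is one way to make them deterministic: the model produces the score, but it cannot talk its way past the eligibility filter.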

One product rule never bent: the user always submits. Lin prepares everything; it never applies for anyone.

🎭 Click through the live demo (fictional data) · 💻 Source on GitHub

V1: speed, then the receipt

V1 shipped the way scrappy products ship: the next most valuable thing every few days, over roughly two weeks, on top of two résumé engines that represented months of prior customization work. It worked. In live validation, 300+ real postings were evaluated and ~70 were carried end-to-end into complete application packages, with the résumé engines A/B-tested on every package produced. The rig even included a daily cost-report job and a controlled model A/B test, in which the same roles were run through the expensive default and two cheaper models to find out what résumé quality actually required. But every iteration left a layer behind: workflow logic drifted into scheduler prompts (three hand-synchronized copies), the agent’s instruction set grew into a single 750-line monolith loaded into every run, and the most expensive model spent its budget browsing web pages and was then rate-limited out of the work it was actually for. The system functioned daily, but it was accumulated, not designed.
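As a rough illustration of how per-package A/B results could roll up into an engine win-rate scoreboard, here is a minimal sketch. The engine labels and the outcome log are invented for the example, and the function name `win_rates` is hypothetical; the real comparison logic and data are not described in this write-up.

```python
from collections import Counter


def win_rates(winners: list[str]) -> dict[str, float]:
    """Map each engine name to its share of wins across all packages."""
    if not winners:
        return {}
    counts = Counter(winners)
    total = len(winners)
    return {engine: counts[engine] / total for engine in sorted(counts)}


# One entry per package: which engine's résumé won the automated comparison.
# Engine names and outcomes are made up for illustration.
log = ["narrative", "keyword", "narrative", "narrative"]
print(win_rates(log))
```

Because every package contributes exactly one winner, the shares always sum to 1, so the scoreboard reads directly as a head-to-head split.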

V2: a real product process, run with an AI agent

I rebuilt it without stopping it — and my entire engineering team was one AI agent. My role stayed purely product:

The PM craft on display

Outcome

Metric | V1 | V2
Scheduled jobs | 15, with multi-page prompts | 10, prompts ≤ 2 lines
Workflow home | scattered prompts + one 750-line monolith | 11 small, single-purpose skills
Context overhead per run | ~21k tokens | ~2k tokens (−90%)
Premium-model exposure | 3 jobs (incl. web browsing) | 1 job: résumé writing only, top-N gated (~3× less volume)
LLM calls on budget models | mixed, unmeasured | ≈95% (telemetry-verified)
Routine pipeline cost | unmeasured | ~$0.30/day
Postings → packages | 300+ evaluated, ~70 completed end-to-end | same throughput, uninterrupted
Automated tests | 0 | 60 (incl. real-browser click-through)
Downtime during rebuild | n/a | none
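The premium-model row can be pictured as a small routing rule: everything runs on a budget model except résumé writing, and even that is limited to the top-N scored roles. The sketch below is an assumption-laden illustration, not the project's code. The model names are placeholders, the `Job` shape is hypothetical, and skipping résumé jobs outside the top N is a guess at what "top-N gated" means in practice.

```python
from dataclasses import dataclass

BUDGET, PREMIUM, SKIP = "budget-model", "premium-model", "skip"  # placeholder names


@dataclass
class Job:
    kind: str      # e.g. "scan", "evaluate", "write_resume" (assumed labels)
    posting: str   # posting identifier
    score: float   # 0-5 evaluation score


def route(jobs: list[Job], top_n: int) -> dict[str, str]:
    """Assign a model tier to each job; premium only for top-N résumé writing."""
    ranked = sorted(
        (j for j in jobs if j.kind == "write_resume"),
        key=lambda j: j.score,
        reverse=True,
    )
    premium_postings = {j.posting for j in ranked[:top_n]}
    plan: dict[str, str] = {}
    for j in jobs:
        key = f"{j.kind}:{j.posting}"
        if j.kind != "write_resume":
            plan[key] = BUDGET          # routine work never touches the premium model
        elif j.posting in premium_postings:
            plan[key] = PREMIUM
        else:
            plan[key] = SKIP            # assumption: below the cut, no résumé is written
    return plan


jobs = [
    Job("evaluate", "A", 4.6),
    Job("write_resume", "A", 4.6),
    Job("write_resume", "B", 3.9),
    Job("write_resume", "C", 4.2),
]
plan = route(jobs, top_n=2)
print(plan)
```

Centralizing the rule in one function, rather than in per-job prompts, is what makes a claim like "premium model on one job only" checkable by a test.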

The full system is open-sourced as a reusable Hermes skill suite — sanitized to a placeholder profile, with the résumé engines’ MIT upstream projects fully credited — alongside the live demo dashboard (funnel-rail navigation, sortable table, bulk actions, dark/light personalities).

Deep-dive case studies: V1’s iterative build and the V2 rebuild playbook are available as a two-part write-up — and the open-source repo’s README doubles as the architecture tour.

