How I Build: AI Agents, Custom Pipelines, and Shipping at 10x Speed

The Setup

Every feature I ship goes through a pipeline I built around AI coding agents. Not "I asked ChatGPT to write some code." A structured system with specialized agents for each phase of development: planning, implementation, code review, security audits, and QA.

The core tools are Claude Code (Anthropic's CLI for Claude) and Codex (OpenAI's coding agent). They serve different roles. Claude Code handles the main development loop, orchestrating parallel agents, reading the full codebase, and making targeted edits. Codex provides an independent second opinion, a different model with different blind spots that catches things Claude misses.

The Pipeline

Here is what happens when I start a feature:

1. Design thinking first. Before any code, I run an office-hours session that pressure-tests the idea. What is the actual problem? What does the user need? What existing code already partially solves this? This produces a design doc with premises, alternatives considered, and a recommended approach.

2. Multi-layer review. The design doc goes through three reviews, each with a different lens. An engineering review catches architecture issues, missing tests, and performance concerns. A CEO-level review challenges scope and asks whether this is the right thing to build. A design review rates the plan 0-10 on seven dimensions and fixes every gap. Each review is a full interactive session with specific questions and decisions, not a rubber stamp.

3. Parallel implementation. Once the plan is locked, I split the work into independent lanes and run multiple agents in parallel. One agent builds a component while another writes the tests for a different component. A third downloads assets or scaffolds infrastructure. The build time compression is real: what would take a team two days of sequential work collapses into 30 minutes.

4. Automated code review. After implementation, every diff gets reviewed by both Claude and Codex independently. Two models agreeing on a change is a stronger signal than one model's thorough review. The review catches DRY violations, missing edge cases, security issues, and performance concerns. Findings are tagged by confidence level so I can prioritize what to fix.

5. QA with a real browser. A headless browser agent navigates the running site, clicks through flows, takes annotated screenshots, and reports bugs with visual evidence. It catches things that unit tests miss: layout breaks, missing loading states, interaction edge cases.
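The fan-out in step 3 can be sketched as a set of independent lanes joined at the end. This is a hypothetical illustration, not the actual agent framework; the `Lane` type and the lane names are assumptions:

```typescript
// A lane is any independent unit of work an agent can own.
type Lane = { name: string; run: () => Promise<string> };

async function runLanes(lanes: Lane[]): Promise<string[]> {
  // All lanes start immediately; total wall time is the slowest lane, not the sum.
  return Promise.all(lanes.map(async (lane) => `${lane.name}: ${await lane.run()}`));
}

// Illustrative lanes standing in for real agent tasks.
const lanes: Lane[] = [
  { name: "component", run: async () => "built" },
  { name: "tests", run: async () => "written" },
  { name: "assets", run: async () => "downloaded" },
];
```

The point of the shape is the `Promise.all`: lanes only parallelize cleanly when the plan has already split them into units with no shared files.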
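Step 4's confidence tagging can be illustrated by cross-referencing the two reviewers' findings: anything both models raise independently gets promoted. The `Finding` shape and the two-level scale here are assumptions for illustration, not the real review format:

```typescript
type Finding = { file: string; issue: string };
type Tagged = Finding & { confidence: "high" | "medium" };

function tagByAgreement(claude: Finding[], codex: Finding[]): Tagged[] {
  const key = (f: Finding) => `${f.file}::${f.issue}`;
  const codexKeys = new Set(codex.map(key));
  // A finding both reviewers raised independently is a stronger signal,
  // so it jumps to the front of the fix queue.
  return claude.map((f) => ({
    ...f,
    confidence: codexKeys.has(key(f)) ? "high" : "medium",
  }));
}
```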

The Custom Agent Lanes

The pipeline uses specialized agent types, each tuned for its job:

  • Planner agents read the codebase and produce implementation plans with file lists, dependency graphs, and parallel execution strategies
  • Build agents write code in isolated worktrees so multiple features can be developed simultaneously without conflicts
  • Review agents run adversarial analysis, deliberately trying to break the code from a security, performance, and correctness perspective
  • QA agents operate a headless browser, testing real user flows against the running application
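The QA lane above can be sketched as a loop over user flows that attaches screenshot evidence to every failure. The `Driver` interface is an assumed, Playwright-style abstraction, not a real library binding:

```typescript
// Assumed minimal browser-driver surface; names are illustrative.
interface Driver {
  goto(url: string): Promise<boolean>;       // resolves false if the page failed to load
  screenshot(name: string): Promise<string>; // resolves to the saved file path
}

type BugReport = { flow: string; evidence: string };

async function qaPass(driver: Driver, flows: string[]): Promise<BugReport[]> {
  const bugs: BugReport[] = [];
  for (const flow of flows) {
    const ok = await driver.goto(flow);
    if (!ok) {
      // Every failure ships with visual evidence, not just a log line.
      bugs.push({ flow, evidence: await driver.screenshot(`${flow}-fail`) });
    }
  }
  return bugs;
}
```

In practice the real agent also clicks through interactions and annotates the screenshots; the skeleton just shows why the output is a bug report with evidence rather than a pass/fail bit.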

Each agent has access to the full project context (CLAUDE.md, DESIGN.md, the git history, the codebase) but is scoped to its specific job. A review agent does not start fixing bugs. A build agent does not question the plan. Separation of concerns, applied to AI.
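That scoping idea can be made concrete as a per-role capability table: every agent shares the same project context, but each role gets a different set of allowed actions. The role names and action strings here are illustrative, not the framework's actual API:

```typescript
type Role = "planner" | "builder" | "reviewer" | "qa";

interface AgentConfig {
  role: Role;
  context: string[];        // shared project context every agent can read
  allowedActions: string[]; // the only actions this role may take
}

const sharedContext = ["CLAUDE.md", "DESIGN.md", "codebase"];

function scopedAgent(role: Role): AgentConfig {
  const actions: Record<Role, string[]> = {
    planner: ["read", "write-plan"],
    builder: ["read", "edit-code", "run-tests"],
    reviewer: ["read", "report-findings"], // a reviewer reports; it never edits
    qa: ["read", "drive-browser", "report-bugs"],
  };
  return { role, context: sharedContext, allowedActions: actions[role] };
}
```

Same inputs, narrow outputs: the reviewer sees everything the builder sees, but "edit-code" simply is not in its action set.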

What This Changes

The obvious thing: speed. Features that would take a week ship in an afternoon. A full design review, engineering review, and implementation cycle for a major portfolio redesign took one session.

The less obvious thing: quality. Every feature gets a security review. Every diff gets two independent code reviews. Every interaction gets QA tested with real browser automation. When the tooling handles the review burden, the quality floor rises because nothing ships without being checked.

The most important thing: I spend my time on decisions, not boilerplate. The agents handle the "write the component, match the design system, add the tests, check the types" work. I handle "is this the right architecture, does this serve the user, what is the right tradeoff." Human judgment still drives every decision. The AI handles the execution.

The Meta Proof

This portfolio site is the proof. Every section you see was planned through the office-hours pipeline, reviewed by three different review lenses (CEO, engineering, design), implemented by parallel agents, QA tested with browser automation, and shipped with a structured deployment workflow.

The DESIGN.md that governs every visual decision was generated by a design consultation agent. The blog post you are reading was written by me, but the blog infrastructure (MDX compilation, dynamic OG images, the listing page) was scaffolded by agents in under 10 minutes.

That is the pitch. Not "I use AI tools" but "I built a development environment where AI agents handle the systematic work so I can focus on the decisions that matter." The result is code that ships faster, gets reviewed more thoroughly, and follows a consistent design system across every component.

Stack

Claude Code (Anthropic), Codex (OpenAI), gstack (agent framework), Next.js, TypeScript, framer-motion, Tailwind CSS.