llmevals

Coding Agents - A Comparison

I asked multiple AI coding agents to:

Create a single-page web app at index.html that beautifully renders a GitHub user profile and activity comprehensively. Pick the ID in the URL ?id=..., default to ?id=torvalds.

Here are the results. 🔴 indicates failure.

| Tool | Model | Quality | Cents | Seconds | Logs |
|---|---|---|---|---|---|
| OpenCode | claude-sonnet-4.5 | 7 | 13.2 | 54 | Logs |
| OpenCode | gpt-5-codex | 6 | 14.9 | 105 | Logs |
| Codex | gpt-5-codex - medium | 6 | 21.7 | 130 | Logs |
| Claude Code | claude-sonnet-4.5 | 5 | 9.3 | 66 | Logs |
| OpenCode | gemini-2.5-pro | 3 | 14.2 | 37 | Logs |
| OpenCode | grok-4 (🔴 escaped HTML) | 1 | 8.4 | 65 | Logs |
| OpenCode | glm-4.6 (🔴 no output) | 0 | 1.8 | 143 | Logs |
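
The two failure modes in the table ("no output" and "escaped HTML") could be flagged automatically. The write-up does not say how failures were detected, so the following is only a sketch under assumptions: `check_output` is a hypothetical helper, and the `&lt;` pattern is a guessed heuristic that would also flag a valid page whose visible text contains `&lt;html`.

```shell
# Hypothetical helper: classify one result directory.
# Prints "no output", "escaped HTML", or "ok".
check_output() {
  local file="$1/index.html"
  # Missing or empty file: the agent produced nothing usable.
  if [ ! -s "$file" ]; then
    echo "no output"
    return 1
  fi
  # Assumed heuristic: entity-escaped tags such as &lt;!DOCTYPE or &lt;html
  # suggest the markup was written out as escaped text.
  if grep -qiE '&lt;(!doctype|html)' "$file"; then
    echo "escaped HTML"
    return 1
  fi
  echo "ok"
}
```

For example, `check_output "$ROOT/opencode/grok-4"` would classify the Grok 4 run, assuming the directory layout from the Setup section.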

My takeaways:

Setup

export ROOT="$(pwd)"
export TASK='Create a single-page web app at `index.html` that beautifully renders a GitHub user profile and activity comprehensively. Pick the ID in the URL `?id=...`, default to `?id=torvalds`.'

# Claude Code - Claude 4.5 Sonnet
export DIR="$ROOT/claude-code/claude-sonnet-4.5"
mkdir -p "$DIR" && cd "$DIR"
npx -y @anthropic-ai/claude-code \
  --model claude-sonnet-4-5-20250929 \
  --permission-mode acceptEdits \
  --print \
  "$TASK"

# Codex - GPT 5 Codex Medium
export DIR="$ROOT/codex/gpt-5-codex-medium"
mkdir -p "$DIR" && cd "$DIR"
npx -y @openai/codex exec \
  --model gpt-5-codex \
  --config 'model_reasoning_effort="medium"' \
  --full-auto "$TASK"

# OpenCode - GLM 4.6
export DIR="$ROOT/opencode/glm-4.6"
mkdir -p "$DIR" && cd "$DIR"
npx -y opencode-ai run \
  --model openrouter/z-ai/glm-4.6 \
  "$TASK"

# OpenCode - GPT 5 Codex
export DIR="$ROOT/opencode/gpt-5-codex"
mkdir -p "$DIR" && cd "$DIR"
npx -y opencode-ai run \
  --model openrouter/openai/gpt-5-codex \
  "$TASK"

# OpenCode - Claude 4.5 Sonnet
export DIR="$ROOT/opencode/claude-sonnet-4.5"
mkdir -p "$DIR" && cd "$DIR"
npx -y opencode-ai run \
  --model openrouter/anthropic/claude-sonnet-4.5 \
  "$TASK"

# OpenCode - Grok 4
export DIR="$ROOT/opencode/grok-4"
mkdir -p "$DIR" && cd "$DIR"
npx -y opencode-ai run \
  --model openrouter/x-ai/grok-4 \
  "$TASK"

# OpenCode - Gemini 2.5 Pro
export DIR="$ROOT/opencode/gemini-2.5-pro"
mkdir -p "$DIR" && cd "$DIR"
npx -y opencode-ai run \
  --model openrouter/google/gemini-2.5-pro \
  "$TASK"
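
The five OpenCode runs above differ only in the model ID, so they could be folded into a loop. The sketch below is one possible way to do that. `model_slug`, `time_run`, and `run_opencode_models` are hypothetical helper names, and the wall-clock timing via `date +%s` is an assumption: the write-up does not say how the Seconds column was measured.

```shell
# Hypothetical helper: directory name from an OpenRouter model ID,
# e.g. openrouter/z-ai/glm-4.6 -> glm-4.6
model_slug() {
  echo "${1##*/}"
}

# Hypothetical helper: run a command and report wall-clock seconds on stderr.
time_run() {
  local start end status
  start=$(date +%s)
  "$@"
  status=$?
  end=$(date +%s)
  echo "elapsed: $((end - start))s" >&2
  return "$status"
}

# Run the same task once per model, each in its own directory under
# <root>/opencode/, matching the layout used in the commands above.
run_opencode_models() {
  local root="$1" task="$2" model dir
  shift 2
  for model in "$@"; do
    dir="$root/opencode/$(model_slug "$model")"
    mkdir -p "$dir"
    # The subshell keeps the cd from leaking into the next iteration.
    ( cd "$dir" && time_run npx -y opencode-ai run --model "$model" "$task" )
  done
}
```

A call such as `run_opencode_models "$ROOT" "$TASK" openrouter/z-ai/glm-4.6 openrouter/x-ai/grok-4` would then reproduce two of the runs. Nothing is executed until the function is called.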