LLM evals
Explorations in LLM evaluations.
13 Oct 2025:
Comparing AI coding agents on quality, cost, speed
27 Sep 2025:
Bring a panel of experts to review code
26 Sep 2025:
Mimicking Developer Styles with Coding Agents
9 Jun 2025:
System Prompt Can Easily Be Overridden
2 Jun 2025:
Emotion Prompts Don’t Help. Reasoning Does
8 May 2025:
Deal with Hallucinations by Double-checking