Blog

Notes from the field

Writing on cloud, DevOps, security, and AI engineering, informed by what actually goes wrong in production.

All (27) | AI (9) | DevOps (10) | Security (7) | Cloud (1)

Evals for LLM apps: from vibes to numbers

If you ship an LLM feature without evals, you're flying blind. Here's how to set up evaluations that actually catch regressions, in a few hundred lines of code.

March 10, 2026 | 4 min read