A wrong character in a URL killed our LLM tracing for 4 weeks
The health check was green the whole time. A story about why existence checks aren't arrival checks.
How production AI systems fail, usually quietly, and the gates that catch those failures. Drawn from systems I actually run.