Observability Patterns for AI Services

A compact playbook for tracing quality, latency, and cost across AI-powered backends.

2/24/2026 · 1 min read · IT Architecture · Mubin Ahmed

AI services need a wider observability surface than conventional APIs: beyond uptime and latency, you also have to watch output quality and per-request cost.

Track quality and performance together

Quality drift and latency regressions often share a root cause: a model upgrade, a prompt change, or a provider-side capacity issue can move both at once, so chart them side by side rather than in separate dashboards.
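One way to make joint regressions visible is to aggregate both signals per time window. A minimal sketch (field names like qualityScore are assumptions, not part of the original telemetry schema):

```python
from statistics import mean

def summarize_window(records):
    """Aggregate one time window of request logs so quality and
    latency trends can be read side by side on the same chart."""
    return {
        "avgLatencyMs": mean(r["latencyMs"] for r in records),
        "avgQuality": mean(r["qualityScore"] for r in records),
        "count": len(records),
    }

# Example window of two requests (illustrative values).
window = [
    {"latencyMs": 800, "qualityScore": 0.92},
    {"latencyMs": 900, "qualityScore": 0.88},
]
print(summarize_window(window))
```

Plotting these window summaries on one timeline is usually enough to spot a deploy that made responses both slower and worse.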

Minimum telemetry set

For every model call, log the prompt (or a reference to it), the model version, a response quality label, latency, token usage, and the failure reason if the call did not succeed.

{
  "requestId": "req_124",
  "model": "gpt-4.1",
  "latencyMs": 842,
  "tokens": 1920,
  "qualityLabel": "good",
  "failureReason": null
}
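A small logging helper can enforce that every record carries the full minimum set before it reaches your sink. This is a sketch: the field names follow the telemetry set described above, and the sink is just a stand-in for whatever log pipeline you use.

```python
import json
import time

# Required fields, mirroring the minimum telemetry set above.
REQUIRED_FIELDS = {"requestId", "model", "latencyMs", "tokens",
                   "qualityLabel", "failureReason"}

def log_inference(record, sink=print):
    """Emit one structured telemetry record per model call,
    failing fast if any required field is missing."""
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"telemetry record missing fields: {sorted(missing)}")
    record = {**record, "loggedAt": time.time()}
    sink(json.dumps(record))
    return record
```

Failing fast on missing fields keeps the telemetry schema honest; a record that silently omits its quality label is invisible to every downstream dashboard.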

Alert on meaningful thresholds

Alert fatigue is real: page on sustained, user-visible breaches, such as a p95 latency budget or an elevated failure rate, rather than on individual slow or failed requests.
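The idea above can be sketched as a window-level rule. The thresholds and field names are illustrative assumptions; the p95 uses a rough nearest-rank calculation:

```python
def should_alert(window, p95_latency_budget_ms=2000, max_error_rate=0.05):
    """Fire only when a whole window breaches a product-level budget,
    not on single slow or failed requests. Thresholds are illustrative."""
    latencies = sorted(r["latencyMs"] for r in window)
    # Rough nearest-rank p95 over the window.
    p95 = latencies[int(0.95 * (len(latencies) - 1))]
    error_rate = sum(1 for r in window if r.get("failureReason")) / len(window)
    return p95 > p95_latency_budget_ms or error_rate > max_error_rate
```

Tying the budgets to product impact (what latency users actually notice, what failure rate support actually hears about) is what keeps the pager quiet enough to trust.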


Mubin Ahmed

Software engineer and AI practitioner writing about practical machine learning and IT architecture.