Telemetry that makes engineers faster, not busier
Playbook

A lean observability stack built on traces-first thinking, cardinality budgets, and opinionated dashboards.

Sofia Mendez · 6 min read · January 15, 2026 · Updated
Traces-first · Cardinality budgets · Runbooks linked to alerts
Field note · 6 min read

Lead with traces, not dashboards

We instrument user journeys end to end, then derive metrics from those traces. Dashboards stay slim; exploratory debugging happens in traces, with guardrails on attribute cardinality.

OTel starter
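import { trace, SpanStatusCode } from "@opentelemetry/api";

// ChargeInput and paymentClient are defined elsewhere in the application.
const tracer = trace.getTracer("checkout");

export async function charge(userId: string, payload: ChargeInput) {
  // startActiveSpan makes the span current for everything awaited in the callback.
  return tracer.startActiveSpan("charge", async (span) => {
    span.setAttribute("user.id", userId);
    span.setAttribute("cart.items", payload.items.length);
    try {
      const result = await paymentClient.charge(payload);
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (err) {
      // Record the failure on the span, then rethrow so callers still see it.
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR, message: "failed" });
      throw err;
    } finally {
      // Always end the span, on success or failure.
      span.end();
    }
  });
}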
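
One way to picture "metrics derived from traces, with a cardinality guardrail" is the sketch below. It is not the author's pipeline: `FinishedSpan`, `projectMetrics`, the `maxRoutes` budget, and the `__other__` overflow label are all hypothetical names chosen for illustration, and a real setup would read spans from an exporter or collector rather than an in-memory array.

```typescript
// Hypothetical stand-in for a finished span; a real exporter carries far more data.
interface FinishedSpan {
  route: string;
  durationMs: number;
  error: boolean;
}

interface RouteMetrics {
  count: number;
  errors: number;
  totalDurationMs: number;
}

// Assumed label that absorbs every route beyond the budget.
const OVERFLOW_LABEL = "__other__";

// Aggregates spans into per-route metrics. Once `maxRoutes` distinct routes
// are tracked, any new route is folded into OVERFLOW_LABEL.
function projectMetrics(
  spans: FinishedSpan[],
  maxRoutes: number,
): Map<string, RouteMetrics> {
  const metrics = new Map<string, RouteMetrics>();
  for (const span of spans) {
    const tracked = metrics.has(span.route);
    // The overflow label itself does not count against the budget.
    const distinct = metrics.has(OVERFLOW_LABEL) ? metrics.size - 1 : metrics.size;
    const label = tracked || distinct < maxRoutes ? span.route : OVERFLOW_LABEL;

    const entry = metrics.get(label) ?? { count: 0, errors: 0, totalDurationMs: 0 };
    entry.count += 1;
    if (span.error) entry.errors += 1;
    entry.totalDurationMs += span.durationMs;
    metrics.set(label, entry);
  }
  return metrics;
}
```

The budget is applied to the metric labels, not to the spans themselves, so the full detail stays available in traces while the metric series count stays bounded.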

Rules we keep

  • Every alert links to a runbook and an owning team
  • Dashboards cap at 12 charts; anything beyond that becomes a saved trace query
  • Sampling is tuned per route, weighted by business impact
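
The first two rules are easy to enforce mechanically, for example in CI. The sketch below assumes alert rules and dashboards are available as plain objects; the `AlertRule` and `Dashboard` shapes and their field names are hypothetical, not taken from any particular alerting or dashboard tool.

```typescript
// Hypothetical shapes; field names are assumptions for illustration.
interface AlertRule {
  name: string;
  runbookUrl?: string;
  ownerTeam?: string;
}

interface Dashboard {
  title: string;
  charts: string[];
}

// The cap stated in the rules above.
const MAX_CHARTS = 12;

// Returns one message per missing field; an empty array means the alert passes.
function lintAlert(rule: AlertRule): string[] {
  const problems: string[] = [];
  if (!rule.runbookUrl) problems.push(`${rule.name}: missing runbook link`);
  if (!rule.ownerTeam) problems.push(`${rule.name}: missing owning team`);
  return problems;
}

// Flags dashboards that exceed the chart cap.
function lintDashboard(dashboard: Dashboard): string[] {
  return dashboard.charts.length > MAX_CHARTS
    ? [`${dashboard.title}: ${dashboard.charts.length} charts exceeds cap of ${MAX_CHARTS}`]
    : [];
}
```

Failing the build on a non-empty result keeps the rules from eroding as new alerts and panels are added.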

Adopt sampling that tracks revenue or risk—not just traffic—so the right customers stay in view during incidents.
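
As a rough illustration of impact-weighted sampling, the sketch below picks a sampling ratio per route and makes a deterministic decision from the trace ID, in the spirit of a trace-ID ratio sampler. The routes, ratios, and the `shouldSample` helper are assumptions made up for this example; a production setup would more likely plug an equivalent policy into the tracing SDK's sampler hook.

```typescript
// Hypothetical per-route ratios: revenue-critical routes keep every trace,
// noisy low-value routes keep few or none.
const ROUTE_RATIOS: Record<string, number> = {
  "/checkout": 1.0,
  "/search": 0.1,
  "/healthz": 0.0,
};

// Assumed fallback for routes without an explicit policy.
const DEFAULT_RATIO = 0.05;

// Decides whether to keep a trace. The last 8 hex characters of the trace ID
// are mapped to a number in [0, 1), so the same trace ID always yields the
// same decision for a given route.
function shouldSample(traceId: string, route: string): boolean {
  const ratio = ROUTE_RATIOS[route] ?? DEFAULT_RATIO;
  const bucket = parseInt(traceId.slice(-8), 16) / 2 ** 32;
  // A malformed trace ID produces NaN, which compares false and is dropped.
  return bucket < ratio;
}
```

Because the decision depends only on the trace ID and the route, services that share the same policy table agree on which traces to keep.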

Dashboards that stay lean

  • Ship a single service health score: availability, latency, error rate
  • One panel per user journey; everything else is a saved trace query
  • Alert on burn rate and user impact, not raw error counts
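
To make "alert on burn rate" concrete: burn rate is the observed error ratio divided by the error budget (1 minus the SLO target), so a burn rate of 1 spends the budget exactly over the SLO period. The sketch below is one possible formulation, not the article's alerting config. The `WindowCounts` shape, the two-window check, and the default threshold of 14.4 (a commonly cited value for a 1-hour window against a 30-day SLO) are assumptions.

```typescript
// Request counts observed in one time window (hypothetical shape).
interface WindowCounts {
  total: number;
  errors: number;
}

// Burn rate = error ratio / error budget. An empty window burns nothing.
function burnRate(window: WindowCounts, sloTarget: number): number {
  if (window.total === 0) return 0;
  const errorRatio = window.errors / window.total;
  const errorBudget = 1 - sloTarget;
  return errorRatio / errorBudget;
}

// Pages only when both a long and a short window exceed the threshold, so a
// brief spike that has already recovered does not wake anyone up.
function shouldPage(
  longWindow: WindowCounts,
  shortWindow: WindowCounts,
  sloTarget: number,
  threshold = 14.4, // assumed default; tune per SLO period and window size
): boolean {
  return (
    burnRate(longWindow, sloTarget) >= threshold &&
    burnRate(shortWindow, sloTarget) >= threshold
  );
}
```

With a 99.9% target, 20 errors in 1,000 requests is a 2% error ratio against a 0.1% budget, a burn rate of roughly 20, while the same 20 errors as a raw count would say little without the traffic volume.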

Key takeaways

  • Traces-first
  • Cardinality budgets
  • Runbooks linked to alerts
Observability · Tracing · DX
