
Lead with traces, not dashboards
We instrument user journeys end to end and derive metrics from the resulting traces. Dashboards stay slim; exploratory debugging happens in traces, with guardrails on cardinality.
```ts
import { trace, SpanStatusCode } from "@opentelemetry/api";

const tracer = trace.getTracer("checkout");

// ChargeInput and paymentClient are application-level types, shown as-is.
export async function charge(userId: string, payload: ChargeInput) {
  return tracer.startActiveSpan("charge", async (span) => {
    span.setAttribute("user.id", userId);
    span.setAttribute("cart.items", payload.items.length);
    try {
      const result = await paymentClient.charge(payload);
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (err) {
      span.recordException(err as Error);
      span.setStatus({ code: SpanStatusCode.ERROR, message: "charge failed" });
      throw err;
    } finally {
      span.end();
    }
  });
}
```

Rules we keep
- Every alert links to a runbook and an owning team
- Dashboards cap at 12 charts—anything else is a trace query
- Sampling tuned per route with business impact in mind
Adopt sampling that tracks revenue or risk—not just traffic—so the right customers stay in view during incidents.
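One way to make route-aware sampling concrete is a deterministic head-sampling decision keyed on the trace ID, so every service in a trace agrees without coordination. A minimal sketch; the route-to-rate table and the hash are illustrative assumptions, not a specific OpenTelemetry API:

```typescript
// Sketch of route-aware head sampling: revenue-critical routes keep every
// trace, noisy routes keep almost none. Rates here are examples.
type SampleRates = Record<string, number>;

const rates: SampleRates = {
  "/checkout": 1.0,   // always keep revenue-critical traffic
  "/search": 0.1,
  "/healthz": 0.001,  // near-zero for health-check noise
};

const DEFAULT_RATE = 0.05;

// Deterministic per-trace decision: hash the trace ID into [0, 1) so every
// service in the same trace reaches the same verdict independently.
export function shouldSample(route: string, traceId: string): boolean {
  const rate = rates[route] ?? DEFAULT_RATE;
  let hash = 0;
  for (const ch of traceId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  }
  return hash / 0xffffffff < rate;
}
```

During an incident this keeps the customers who matter in view: checkout traces are always present, while health checks rarely are.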
Dashboards that stay lean
- Ship a single service health score: availability, latency, error rate
- One panel per user journey; everything else is a saved trace query
- Alert on burn rate and user impact, not raw error counts
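The burn-rate alerting above can be sketched as a small calculation: burn rate is the observed error ratio divided by the error budget, and paging requires both a short and a long window to burn fast so a brief blip stays quiet. The SLO target and the 14.4 threshold are illustrative assumptions:

```typescript
// Sketch of multiwindow burn-rate math. Tune the target and threshold to
// your own error-budget policy; these numbers are examples.
const SLO_TARGET = 0.999;                 // 99.9% availability
const ERROR_BUDGET = 1 - SLO_TARGET;      // 0.1% of requests may fail

// Burn rate 1 consumes the budget exactly over the SLO window; 14.4
// sustained for an hour eats ~2% of a 30-day budget.
export function burnRate(errors: number, total: number): number {
  if (total === 0) return 0;
  return errors / total / ERROR_BUDGET;
}

// Page only when both windows burn fast: the short window gives speed,
// the long window filters out momentary spikes.
export function shouldPage(
  short: { errors: number; total: number },
  long: { errors: number; total: number },
  threshold = 14.4,
): boolean {
  return (
    burnRate(short.errors, short.total) >= threshold &&
    burnRate(long.errors, long.total) >= threshold
  );
}
```

This is alerting on user impact rather than raw error counts: the same absolute error number pages a low-traffic route and stays silent on a high-traffic one.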
Key takeaways
- Traces-first
- Cardinality budgets
- Runbooks linked to alerts
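The cardinality-budget takeaway can be enforced with a small guard at attribute-set time. A sketch under assumptions: the 100-value limit and the "overflow" sentinel are illustrative choices, not an OpenTelemetry feature:

```typescript
// Sketch of a per-key cardinality budget: once a key has seen too many
// distinct values, collapse new values to "overflow" so span and metric
// attributes cannot explode. The limit of 100 is an example.
const MAX_VALUES_PER_KEY = 100;
const seen = new Map<string, Set<string>>();

export function guardAttribute(key: string, value: string): string {
  let values = seen.get(key);
  if (!values) {
    values = new Set();
    seen.set(key, values);
  }
  if (values.has(value)) return value;            // already budgeted
  if (values.size >= MAX_VALUES_PER_KEY) return "overflow";
  values.add(value);
  return value;
}
```

Wrapping `span.setAttribute` calls with a guard like this keeps exploratory trace queries cheap while still letting known-hot values through unchanged.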


