Quick Start

Install, instrument, validate, and ship Obtrace in production in under 30 minutes

This guide is intentionally opinionated: it gives you the shortest path to real production value, not just a local demo.

By the end of this quick start, you will have:

  • One backend service instrumented and sending data.
  • Optional frontend instrumentation connected to the same context model.
  • Runtime and CI/CD context attached to incident investigation.
  • AI-ready interfaces available (Ask AI, llm.txt, mcp.json).

Prerequisites

  • Obtrace account with an API key.
  • Access to one production-relevant service (not a toy service).
  • Access to deployment configuration (env vars/secrets).
  • CI/CD pipeline access (recommended for release correlation).

What Success Looks Like

Before you start, define acceptance criteria:

  1. Telemetry arrives continuously for the target service.
  2. You can filter by service, env, and version without ambiguity.
  3. At least one error or trace contains enough context for diagnosis.
  4. You can associate telemetry spikes with release/deploy metadata.

Step 1: Configure Authentication Correctly

Follow the Authentication guide and create environment-scoped credentials.

Recommended key strategy:

  • One key per environment (dev, staging, prod).
  • Optional per-service keys for blast-radius control.
  • Server-side keys only; never ship privileged keys to browser clients.
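
As a minimal sketch of what server-side key handling can look like: the package name `@obtrace/node`, the `init()` signature, and the `OBTRACE_API_KEY` variable name below are illustrative assumptions, not the documented API. The point is the shape: the key comes from deployment config, never from source or a browser bundle.

```typescript
// Hypothetical package and init() signature, shown for shape only.
import { init } from "@obtrace/node";

// Read the environment-scoped key from deployment config.
// Never hardcode it, and never bundle it into browser code.
const apiKey = process.env.OBTRACE_API_KEY;
if (!apiKey) {
  throw new Error("OBTRACE_API_KEY is not set for this environment");
}

init({ apiKey });
```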

Step 2: Instrument the Highest-Impact Backend Service

Pick the service that creates the largest operational risk (checkout, auth, billing, API gateway).

  1. Select your runtime in the SDK Catalog.
  2. Install and initialize the SDK.
  3. Add canonical attributes to every event/span (see the sketch after this list):
    • service
    • env
    • version
    • region (if multi-region)

Why this order: backend-first usually gives the highest diagnostic value per minute invested.
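
A sketch of what "canonical attributes on every event/span" looks like in practice: set them once at initialization so nothing can emit without them. The option names below are assumptions; map them to whatever your chosen SDK actually exposes.

```typescript
// Hypothetical SDK options; what matters is that attributes are set once,
// globally, so every event/span inherits them.
import { init } from "@obtrace/node";

init({
  apiKey: process.env.OBTRACE_API_KEY,
  attributes: {
    service: "checkout",                           // stable, unambiguous name
    env: process.env.DEPLOY_ENV ?? "dev",          // dev | staging | prod
    version: process.env.APP_VERSION ?? "unknown", // ties data to a release
    region: process.env.AWS_REGION,                // only if multi-region
  },
});
```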

Step 3: Add Frontend Instrumentation (If User Impact Matters)

If your incidents affect user interaction, add browser telemetry.

  • Use the JavaScript Browser SDK.
  • Capture page, route, and interaction context where relevant.
  • Correlate frontend failures with backend requests whenever possible.

This is what turns “backend is slow” into “which user path degraded and why”.
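
One way to get that correlation, sketched below: propagate W3C trace context from the browser span to the backend request it triggers. The `@obtrace/browser` package, `startSpan()`, and the `span.traceparent` property are assumptions; only the `traceparent` header format itself is standard.

```typescript
// Hypothetical browser SDK; only the W3C traceparent header is standard.
import { init, startSpan } from "@obtrace/browser";

declare const APP_VERSION: string; // assume injected at build time

init({
  publishableKey: "pk_...", // a non-privileged browser key, never the server key
  attributes: { service: "web", env: "prod", version: APP_VERSION },
});

// Tie a user interaction to the backend request it triggers.
async function submitCheckout(payload: unknown): Promise<Response> {
  const span = startSpan("checkout.submit", { route: location.pathname });
  try {
    return await fetch("/api/checkout", {
      method: "POST",
      headers: {
        "content-type": "application/json",
        traceparent: span.traceparent, // assumed property exposing trace context
      },
      body: JSON.stringify(payload),
    });
  } finally {
    span.end();
  }
}
```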

Step 4: Validate Ingestion and Data Quality

Do not continue the rollout until this gate passes.

Validation checklist:

  • Data flow is continuous, not bursty.
  • Timestamps are sane (no major clock drift).
  • Sampling is intentional and documented.
  • No frequent 401/403 or transport retries.
  • Tags are stable and standardized across services.
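
A throwaway heartbeat script can exercise most of this checklist before real traffic arrives: a steady cadence checks continuity, the `sentAt` field surfaces clock drift against ingest time, and auth failures show up immediately. `emitEvent()` and `flush()` are assumed SDK calls, as in the earlier sketches.

```typescript
// Smoke test against the validation checklist; emitEvent()/flush() are
// assumed SDK calls, not a documented surface.
import { init, emitEvent, flush } from "@obtrace/node";

init({ apiKey: process.env.OBTRACE_API_KEY });

async function heartbeat(): Promise<void> {
  for (let i = 0; i < 60; i++) {
    emitEvent("quickstart.heartbeat", {
      seq: i,                           // gaps here mean dropped events
      sentAt: new Date().toISOString(), // compare to ingest time for drift
    });
    await new Promise((r) => setTimeout(r, 5_000)); // even cadence, not bursts
  }
  await flush(); // don't lose buffered events on exit
}

heartbeat();
```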

Step 5: Add Runtime Integration

Connect your actual runtime early.

Goal: incident context should include where the workload ran, not only app-level events.
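
One sketch of what that means in code: attach placement attributes at startup alongside the canonical ones. The attribute names, and the assumption that `POD_NAME`/`NODE_NAME` are injected into the container (for example via the Kubernetes downward API), are illustrative.

```typescript
// Illustrative placement attributes; POD_NAME/NODE_NAME exist only if you
// inject them yourself (e.g., via the Kubernetes downward API).
import os from "node:os";
import { init } from "@obtrace/node"; // assumed API, as in earlier sketches

init({
  apiKey: process.env.OBTRACE_API_KEY,
  attributes: {
    host: os.hostname(),
    pod: process.env.POD_NAME,
    node: process.env.NODE_NAME,
  },
});
```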

Step 6: Attach CI/CD Context

Integrate GitHub Actions so telemetry can be correlated with release events.

Minimum release context fields to propagate:

  • commit SHA
  • build ID
  • deploy timestamp
  • environment

Without release context, root-cause analysis loses time reconstructing what shipped and when.
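
For example, a small deploy step in the workflow can emit all four fields. `GITHUB_SHA` and `GITHUB_RUN_ID` are variables GitHub Actions actually provides; the `emitEvent()` call and the `deploy` event name are assumptions about the SDK.

```typescript
// Run as a deploy step inside GitHub Actions (Node ESM, for top-level await).
// GITHUB_SHA and GITHUB_RUN_ID are provided by Actions; emitEvent()/flush()
// are assumed SDK calls.
import { init, emitEvent, flush } from "@obtrace/node";

init({ apiKey: process.env.OBTRACE_API_KEY });

emitEvent("deploy", {
  commit: process.env.GITHUB_SHA,       // commit SHA
  buildId: process.env.GITHUB_RUN_ID,   // build ID
  deployedAt: new Date().toISOString(), // deploy timestamp
  env: process.env.DEPLOY_ENV,          // set per environment in the workflow
});

await flush(); // ensure the event leaves before the job ends
```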

Step 7: Enable AI Workflows (After Baseline Is Clean)

  • Use the floating Ask AI button in the docs for contextual help.
  • Publish machine-readable context at /llm.txt and /mcp.json.
  • Review MCP for agent integrations.

AI workflows are only as good as your telemetry quality. Fix instrumentation quality first.
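
If nothing serves those paths yet, a minimal sketch follows. The `/llm.txt` and `/mcp.json` routes come from this guide; the hand-rolled server and file locations are illustrative only.

```typescript
// Minimal static server for machine-readable context files.
// Routes come from this guide; everything else here is illustrative.
import { createServer } from "node:http";
import { readFile } from "node:fs/promises";

const files: Record<string, { path: string; type: string }> = {
  "/llm.txt": { path: "public/llm.txt", type: "text/plain" },
  "/mcp.json": { path: "public/mcp.json", type: "application/json" },
};

createServer(async (req, res) => {
  const file = files[req.url ?? ""];
  if (!file) {
    res.writeHead(404).end();
    return;
  }
  res.writeHead(200, { "content-type": file.type });
  res.end(await readFile(file.path));
}).listen(8080);
```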

Common Mistakes

  1. Starting with too many services at once.
  2. Missing service/env/version tags.
  3. Reusing a single key across all environments.
  4. Rolling out before validating ingestion quality.
  5. Treating docs as reference-only instead of an operations playbook.

Next Paths