OpenTelemetry in Timesketch: Getting Started Guide

This document provides a comprehensive guide for developers, admins, and users on how OpenTelemetry (OTel) is implemented in Timesketch, how to configure it, and how to verify it locally.


1. Overview

Timesketch uses OpenTelemetry to provide distributed tracing across its web (Flask) and worker (Celery) components. This enables deep observability into request life cycles and background task performance.

Key Benefits

  • Distributed Tracing: Track a single request from an external tool (like dftimewolf) through the API and into background analyzers.
  • Log Correlation: Trace IDs and Span IDs are automatically injected into structured JSON logs, allowing you to jump from a log line directly to a trace waterfall in tools like GCP Cloud Trace or Jaeger.
  • Standardized Protocol: Uses the industry-standard OpenTelemetry Protocol (OTLP).

2. Architecture

The instrumentation is centralized in a dedicated module: timesketch/lib/telemetry.py.

  • Flask Instrumentation: Automatically captures spans for all HTTP requests, including route patterns and status codes.
  • Celery Instrumentation: Captures spans for both task dispatching (producer) and execution (worker), maintaining the trace context across process boundaries.
  • Async Exporting: Spans are exported asynchronously using a BatchSpanProcessor to ensure minimal impact on application performance.
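To make "maintaining the trace context across process boundaries" concrete: with OpenTelemetry's default propagator, the context travels as a W3C Trace Context `traceparent` value attached to the outgoing request or task message. The sketch below is stdlib-only and illustrative. It is not the code in timesketch/lib/telemetry.py, and the function names (`new_traceparent`, `parse_traceparent`, `child_traceparent`) are hypothetical. It only shows the shape of the header and why a worker span ends up in the same trace as the web request that queued it.

```python
import re
import secrets

# W3C Trace Context header: version-traceid-spanid-flags, lowercase hex.
# Simplification: only the fixed-length layout of version 00 is accepted.
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def new_traceparent(sampled=True):
    """Start a new trace with a random 16-byte trace ID and 8-byte span ID."""
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"


def parse_traceparent(header):
    """Return (trace_id, span_id, sampled), or None if the header is invalid."""
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the specification.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return trace_id, span_id, bool(int(flags, 16) & 0x01)


def child_traceparent(parent_header):
    """Continue the parent's trace under a new span ID (e.g. in a worker)."""
    parsed = parse_traceparent(parent_header)
    if parsed is None:
        # No usable parent context: start a fresh trace instead.
        return new_traceparent()
    trace_id, _parent_span_id, sampled = parsed
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


# Producer side: the web process attaches the header to the task message.
task_headers = {"traceparent": new_traceparent()}
# Worker side: the task continues the same trace with its own span ID.
worker_header = child_traceparent(task_headers["traceparent"])
```

Because the trace ID is copied and only the span ID changes, a trace viewer can stitch the API span and the worker span into a single waterfall.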

3. Configuration Reference

Telemetry is controlled entirely via environment variables.

| Variable | Description | Example / Default |
|---|---|---|
| TIMESKETCH_OTEL_MODE | The export mode. Must start with otlp-. | otlp-grpc, otlp-http, otlp-default-gce |
| TIMESKETCH_OTLP_GRPC_ENDPOINT | OTLP collector endpoint (gRPC). | jaeger:4317 |
| TIMESKETCH_OTLP_HTTP_ENDPOINT | OTLP collector endpoint (HTTP). | http://jaeger:4318/v1/traces |
| TIMESKETCH_OTLP_INSECURE | Use insecure (non-TLS) connection. | true (default for dev) |
| TIMESKETCH_ENV | Environment identifier. | production, development |

Supported Modes:

  1. otlp-grpc: Best for local collectors (e.g., OTel Collector or Jaeger).
  2. otlp-http: Standard OTLP over HTTP, sent to the collector's /v1/traces path.
  3. otlp-default-gce: Recommended for production on GCP. Sends traces directly to Google Cloud Trace API from a GCE instance using the Metadata Server for project identification and credentials.
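As a rough illustration of how the variables and modes above fit together, here is a minimal sketch of resolving them into one settings dictionary. The helper name `resolve_telemetry_config` and the returned keys are hypothetical and are not taken from timesketch/lib/telemetry.py. The sketch also assumes that an unset mode (or one not starting with otlp-) leaves telemetry disabled, and that only the literal value true enables insecure mode.

```python
import os


def resolve_telemetry_config(env=None):
    """Hypothetical helper: map the documented variables to a plain dict.

    Returns None when telemetry is off (assumption: the mode is unset or
    does not start with "otlp-").
    """
    env = os.environ if env is None else env
    mode = env.get("TIMESKETCH_OTEL_MODE", "").strip().lower()
    if not mode.startswith("otlp-"):
        return None

    config = {
        "mode": mode,
        "environment": env.get("TIMESKETCH_ENV", ""),
        # Assumption: only the literal string "true" enables insecure mode.
        "insecure": env.get("TIMESKETCH_OTLP_INSECURE", "").strip().lower()
        == "true",
        "endpoint": None,
    }
    if mode == "otlp-grpc":
        config["endpoint"] = env.get("TIMESKETCH_OTLP_GRPC_ENDPOINT")
    elif mode == "otlp-http":
        config["endpoint"] = env.get("TIMESKETCH_OTLP_HTTP_ENDPOINT")
    elif mode != "otlp-default-gce":
        # otlp-default-gce has no documented endpoint variable, so it is
        # left as None; any other otlp-* value is rejected.
        raise ValueError(f"Unsupported TIMESKETCH_OTEL_MODE: {mode!r}")
    return config


example = resolve_telemetry_config(
    {
        "TIMESKETCH_OTEL_MODE": "otlp-grpc",
        "TIMESKETCH_OTLP_GRPC_ENDPOINT": "jaeger:4317",
        "TIMESKETCH_OTLP_INSECURE": "true",
    }
)
```

Passing a dictionary instead of reading `os.environ` directly keeps the function easy to exercise in unit tests.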

4. Local Development & Testing

Option A: Using Docker Compose

  1. Start the Core Environment: Navigate to the dev docker directory and start the core services:

     cd docker/dev
     docker-compose up -d

  2. Start the Telemetry Stack (Optional): To start the telemetry services (OTel Collector and Jaeger v2) and instruct Timesketch to send traces, use the telemetry profile and set TIMESKETCH_OTEL_MODE:

     TIMESKETCH_OTEL_MODE=otlp-grpc docker-compose --profile telemetry up -d

     Alternatively, add TIMESKETCH_OTEL_MODE=otlp-grpc to a .env file in docker/dev (or to config.env for a release deployment).

Note on Dependencies: Since the development image does not yet contain the new OpenTelemetry libraries, you must install them manually whenever the container is recreated:

docker exec timesketch-dev pip install -r /usr/local/src/timesketch/requirements.txt
docker restart timesketch-dev

Option B: Using Tilt

If you use Tilt for development, the telemetry stack is integrated automatically:

tilt up

The Tilt dashboard will show otel-collector and jaeger resources, including a direct link to the Jaeger UI.


5. Visualization Options

The local environment provides two ways to see your traces. You can switch between them by changing the TIMESKETCH_OTLP_GRPC_ENDPOINT.

1. Via OTel Collector (Gateway)

Default Configuration: TIMESKETCH_OTLP_GRPC_ENDPOINT=otel-collector:4317

  • Why use this: The collector acts as a gateway. It logs the raw spans to its own terminal (docker logs -f otel-collector) and forwards them to Jaeger.
  • Best for: Seeing raw attributes and verifying the export pipeline.

2. Directly to Jaeger

Custom Configuration: TIMESKETCH_OTLP_GRPC_ENDPOINT=jaeger:4317

  • Why use this: Bypasses the collector and sends data straight to Jaeger.
  • Access: Open http://localhost:16686/jaeger in your browser.
  • Best for: Clean waterfall visualization and searching for past traces.


6. Triggering Activity & Verification

Generate some traffic to verify the setup:

# Trigger a Flask Trace (API Call)
docker exec timesketch-dev curl -s http://localhost:5000/api/v1/info/

# Trigger a Celery Trace (Run Analyzer)
docker exec timesketch-dev celery -A timesketch.lib.tasks call timesketch.lib.tasks.run_index_analyzer

Check Application Logs: Verify that trace_id and span_id appear in the JSON output:

docker logs timesketch-dev | grep trace_id
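When grep output becomes hard to read, a small script can group the JSON log lines by trace. The sketch below assumes only what is stated above: each structured log line is a JSON object with trace_id and span_id fields. The `message` field, the ID values in the sample, and the function name `group_logs_by_trace` are made up for illustration.

```python
import json
from collections import defaultdict


def group_logs_by_trace(lines):
    """Group JSON log records by trace_id, skipping anything that is not JSON."""
    traces = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # plain-text output such as startup banners
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or malformed line
        trace_id = record.get("trace_id")
        if trace_id:
            traces[trace_id].append(record)
    return dict(traces)


# Made-up sample lines standing in for `docker logs timesketch-dev` output.
sample_lines = [
    "Starting server...",
    '{"message": "GET /api/v1/info/", "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736", "span_id": "00f067aa0ba902b7"}',
    '{"message": "task queued", "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736", "span_id": "53995c3f42cd8ad8"}',
    '{"message": "no trace context here"}',
    '{"message": "unrelated request", "trace_id": "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", "span_id": "bbbbbbbbbbbbbbbb"}',
]

grouped = group_logs_by_trace(sample_lines)
for trace_id, records in grouped.items():
    span_ids = sorted({record["span_id"] for record in records if "span_id" in record})
    print(f"{trace_id}: {len(records)} log lines, spans {span_ids}")
```

To run it against real output, the same function can be fed `sys.stdin` with the container logs piped in. A trace with log lines from more than one span ID is a quick sign that correlation is working end to end.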

7. Secure Private Access (GCP)

If you are running Timesketch on a private GCE VM without an external IP, you can "proxy in" securely using Identity-Aware Proxy (IAP) Tunneling.

Accessing the Web Interfaces

Run these commands on your local machine to create a secure tunnel:

Tip for Cloudtop Users: If you are on Cloudtop, the recommended way to connect is via the BeyondCorp SUP Relay.

1. Direct SSH into the VM:

ssh ${USER}_google_com@nic0.timesketch-otel-lab.us-central1-a.c.jaegeral-timesketch-946302.internal.gcpnode.com \
    -o ProxyCommand='corp-ssh-helper %h %p'

2. Access Timesketch UI (Tunneling): You can use standard SSH port forwarding with the command above:

ssh -L 5000:localhost:5000 \
    ${USER}_google_com@nic0.timesketch-otel-lab.us-central1-a.c.jaegeral-timesketch-946302.internal.gcpnode.com \
    -o ProxyCommand='corp-ssh-helper %h %p'

Now open http://localhost:5000 in your browser.

Alternative: Standard IAP Tunneling

If the above doesn't work, you can use gcloud IAP tunnels:

gcloud compute start-iap-tunnel timesketch-otel-lab 5000 \
    --local-host-port=localhost:5000 \
    --zone=us-central1-a \
    --project=jaegeral-timesketch-946302

8. Deployment Guide (GCP)

To enable production tracing in GCP:

  1. Set TIMESKETCH_OTEL_MODE=otlp-default-gce.
  2. Ensure the service account running Timesketch has the roles/cloudtrace.agent role.
  3. View your traces in the GCP Trace Explorer.


9. Information for Developers

Automated Coverage

Most common operations are already covered by auto-instrumentation:

  • Web API: All Flask routes, status codes, and HTTP methods.
  • Background Tasks: All Celery task dispatching and executions.
  • Analyzers: All analyzers automatically report sketch_id, analyzer_name, timeline_id, and execution status via the BaseAnalyzer interface.

Adding Custom Attributes & Events

If you need to record specific domain metadata (e.g., number of matches found, search query used) from within your code, use the helpers in timesketch.lib.telemetry.

Example: Adding attributes in an Analyzer

from timesketch.lib import telemetry

def analyze(self):
    # ... logic ...
    matches_found = len(results)

    # This will appear in the Span attributes in Jaeger/GCP
    telemetry.add_attribute_to_current_span("sigma.matches_count", matches_found)

    # Record a significant milestone as an event
    telemetry.add_event_to_current_span("Finished parsing rules")

    return f"Found {matches_found} matches."

Best Practices for Attributes

  • Use Namespace Prefixes: To avoid collisions, prefix your attributes (e.g., sigma.rule_id, sketch.member_count).
  • Data Types: Simple types (strings, ints, bools, floats) are stored natively. Complex objects (dicts, lists) are automatically serialized to JSON.
  • Avoid PII: Never record sensitive user data or authentication tokens in span attributes.
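The three practices above can be checked in one place before a value reaches a span. The following is a minimal sketch, not the behavior of the Timesketch helpers: `prepare_attribute` and the `SENSITIVE_MARKERS` deny-list are hypothetical, and a key-based deny-list is only a coarse safety net that cannot detect sensitive data inside values.

```python
import json

# Assumption: an illustrative deny-list; extend it to match your own data.
SENSITIVE_MARKERS = ("password", "token", "secret", "cookie", "authorization")


def prepare_attribute(key, value):
    """Return a (key, value) pair safe to attach to a span, or None to skip it."""
    # Namespace prefix: require at least one dot, e.g. "sigma.rule_id".
    if "." not in key:
        raise ValueError(
            f"Attribute {key!r} needs a namespace prefix, e.g. 'sigma.{key}'"
        )
    # Avoid PII: drop attributes whose key suggests credentials.
    if any(marker in key.lower() for marker in SENSITIVE_MARKERS):
        return None
    # Simple types pass through unchanged.
    if isinstance(value, (str, bool, int, float)):
        return key, value
    # Complex objects become a JSON string; default=str covers values
    # such as datetimes that json cannot encode directly.
    return key, json.dumps(value, sort_keys=True, default=str)


checked = [
    prepare_attribute("sigma.matches_count", 3),
    prepare_attribute("sketch.labels", ["a", "b"]),
    prepare_attribute("user.auth_token", "abc123"),
]
```

Each non-None pair could then be passed to telemetry.add_attribute_to_current_span as in the analyzer example above.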