Open Telemetry in NextJs and NestJs

I wrote about how to use open telemetry with NestJs and React previously.

I wanted to add open telemetry to my NextJs app that calls a NestJs backend. The SSR paradigm preferred by NextJs is a bit different from the CSR paradigm of React.

I'll describe the differences and how I added open telemetry to NextJs that propagates to other backend APIs.

NextJs SSR vs CSR

NextJs is a server-side rendered framework. This means that when you load a page, it renders the page on the server and sends the HTML to the client.

The client then hydrates the page (if that's even necessary) and makes it interactive. You can have NextJs render pages on the client like "classic" React, but that's not recommended.

This means that

  1. You now have a "server" for every NextJs app and
  2. The client-side code for Open Telemetry that worked in React won't work in NextJs.

Open Telemetry In NextJS Diagram

The open telemetry site does an excellent job of describing how to use open telemetry in different languages and frameworks and you should read [the concepts section](https://opentelemetry.io/docs/concepts/).

Here's a diagram of how I implemented open telemetry in NextJs. Use this as a reference when reading the code below.

Open Telemetry NextJs Cheatsheet

Development and Production

I use a local open telemetry collector for development and a hosted SaaS called honeycomb.io for production telemetry.

The various instrumentations that run in the code use environment variables to determine where to send the telemetry.

The main one that changes is OTEL_EXPORTER_OTLP_ENDPOINT, which is the endpoint of the telemetry collector. In development it points at localhost... and in production at https://api.honeycomb.io....
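
As an illustration only (the exact ports and paths depend on your setup; see the collector and Honeycomb config later in this article):

# development: local otel collector (OTLP gRPC receiver)
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
# production: Honeycomb
OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io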

Auto instrumentation libraries

Auto-instrumentation libraries are code that runs at application start and "wraps" common libraries so that every time you call, say, redis.get(), a span is automatically created for you.

The auto instrumentations also handle trace context propagation for you where possible. For example, the fetch() instrumentation automatically adds the W3C trace propagation header if you're using that propagation method.
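
For reference, the W3C header in question is called traceparent and looks something like this (IDs taken from the W3C Trace Context spec example):

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01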

Sometimes this doesn't work as expected and you have to manually propagate the trace context. I'll cover this in the NextJs section.

Instrumenting NextJs Backend

For NextJs I manually configured the instrumentation rather than using the NextJs package.

First, you have to turn on the instrumentation hook in your next.config.js file:

const nextConfig = {
  experimental: {
    instrumentationHook: true,
  },
};
module.exports = nextConfig;

Install all the otel libraries

pnpm add @opentelemetry/api @opentelemetry/auto-instrumentations-node @opentelemetry/exporter-metrics-otlp-grpc @opentelemetry/exporter-trace-otlp-http @opentelemetry/instrumentation-fetch @opentelemetry/resources @opentelemetry/sdk-metrics @opentelemetry/sdk-node @opentelemetry/sdk-trace-base @opentelemetry/sdk-trace-node @opentelemetry/semantic-conventions

I configured the backend portion of NextJs to use gRPC for trace export via an environment variable:

OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=grpc

For some reason the NextJs backend does not auto-instrument the fetch module, so I manually propagate the trace context on outgoing fetch requests.

import { context, propagation } from "@opentelemetry/api";

const headers: Record<string, string> = {};
propagation.inject(context.active(), headers);

// Pass these headers into every fetch request to a service that you want the trace to propagate to, e.g. the NestJs backend.
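
For example, a server-side call to the NestJs backend could look something like this (the BACKEND_URL env var and the fetchFromBackend name are placeholders, not part of my actual code):

// Illustrative sketch: BACKEND_URL and fetchFromBackend are placeholder names.
import { context, propagation } from "@opentelemetry/api";

export async function fetchFromBackend(path: string): Promise<Response> {
  const headers: Record<string, string> = {};
  // Copy the active trace context (e.g. the traceparent header) onto the outgoing request
  propagation.inject(context.active(), headers);

  return fetch(`${process.env.BACKEND_URL}${path}`, { headers });
}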

To actually create the instrumentation, use the same Node.js instrumentation that's in the NestJs backend. With NextJs you must use a file with the special name instrumentation.ts as a hook to initialise instrumentation before the NextJs app is loaded.

export async function register() {
  console.log("registering instrumentation...");
  if (process.env.NEXT_RUNTIME === "nodejs") {
    await import("./otel/instrumentation.node");
  }
}

The otel/instrumentation.node.ts file is a standard NodeSDK instrumentation file. I've covered the contents of this file in the previous article.
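
If you haven't read that article, here's a minimal sketch of what a NodeSDK setup can look like (trimmed down, not my exact file; the service name is just an example, and the exporter settings come from the OTEL_* environment variables):

// otel/instrumentation.node.ts -- minimal sketch, not the full setup from the previous article
import { NodeSDK } from "@opentelemetry/sdk-node";
import { getNodeAutoInstrumentations } from "@opentelemetry/auto-instrumentations-node";
import { Resource } from "@opentelemetry/resources";
import { SemanticResourceAttributes } from "@opentelemetry/semantic-conventions";

const sdk = new NodeSDK({
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: "next-app-backend", // example service name
  }),
  // Trace exporter and protocol are picked up from the OTEL_TRACES_EXPORTER /
  // OTEL_EXPORTER_OTLP_* environment variables
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();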

Instrumenting NextJs Frontend

I set these env vars for production. Note the otel headers used to authenticate with Honeycomb (the team key and dataset); these are not required for the development otel collector.

NEXT_PUBLIC_OTEL_EXPORTER_OTLP_ENDPOINT="https://api.honeycomb.io/v1/traces"
NEXT_PUBLIC_OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=YOUR_EVENT_ONLY_KEY,x-honeycomb-dataset=next-app-client"
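
For local development, by contrast, the endpoint just points at the local collector's HTTP receiver (illustrative value; 4318 is the default OTLP HTTP port used in the collector config later in this article):

NEXT_PUBLIC_OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318/v1/traces"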

The following code configures the otel libraries in the NextJs frontend. Note the different tracer provider here (WebTracerProvider); it's used for client-side tracing.

Chrome requires CORS to be configured on the local otel collector for this to work. Honeycomb has CORS enabled by default.

// Imports for the client-side setup (some of these packages are client-specific and in addition to the server install list above)
import { Resource } from "@opentelemetry/resources";
import { SemanticResourceAttributes } from "@opentelemetry/semantic-conventions";
import { WebTracerProvider } from "@opentelemetry/sdk-trace-web";
import { BatchSpanProcessor } from "@opentelemetry/sdk-trace-base";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import { W3CTraceContextPropagator } from "@opentelemetry/core";
import { registerInstrumentations } from "@opentelemetry/instrumentation";
import { DocumentLoadInstrumentation } from "@opentelemetry/instrumentation-document-load";
import { FetchInstrumentation } from "@opentelemetry/instrumentation-fetch";

function mapHeadersToObject(headers: string): { [key: string]: string } {
  const result: { [key: string]: string } = {};
  headers.split(",").forEach((h) => {
    const [key, value] = h.split("=");
    result[key] = value;
  });
  return result;
}
const headers = process.env.NEXT_PUBLIC_OTEL_EXPORTER_OTLP_HEADERS
  ? mapHeadersToObject(process.env.NEXT_PUBLIC_OTEL_EXPORTER_OTLP_HEADERS)
  : undefined;
export const initInstrumentation = () => {
  //no metrics for now

  const resource = new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: "next-app-client",
  });

  const provider = new WebTracerProvider({ resource });

  const exporter = new OTLPTraceExporter({
    // optional - url default value is http://localhost:4318/v1/traces
    url: `${process.env.NEXT_PUBLIC_OTEL_EXPORTER_OTLP_ENDPOINT}`,
    headers: headers,
  });
  provider.addSpanProcessor(new BatchSpanProcessor(exporter));

  // Initialize the provider
  provider.register({
    propagator: new W3CTraceContextPropagator(),
  });

  // Registering instrumentations / plugins
  registerInstrumentations({
    instrumentations: [
      new DocumentLoadInstrumentation(),
      new FetchInstrumentation({
        propagateTraceHeaderCorsUrls: [
          /localhost/g,
          /honeycomb.io/g,
          /host.docker.internal/g,
        ],
        clearTimingResources: true,
      }),
    ],
  });
};
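
You then need to call initInstrumentation() once in the browser. One way to wire that up (the component name and import path here are my own placeholders) is a small client component mounted near the root of the app, e.g. in the root layout, so the provider and fetch instrumentation are registered before your pages make requests:

"use client";
// Illustrative wiring: the component name and import path are assumptions.
import { useEffect } from "react";
import { initInstrumentation } from "./instrumentation.client";

export function ClientInstrumentationProvider() {
  useEffect(() => {
    // Run once in the browser after mount
    initInstrumentation();
  }, []);

  return null;
}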

Instrumenting NestJs Backend

I've covered this extensively in the previous article. The otel libraries are quite stable and not much has changed with the libraries since then.

You still have to make sure that you load the otel libraries before you load the NestJs app. If you're able to use ESM modules, you can use the following code (this is different from my previous approach).

The otel library can be configured using the built-in OTEL environment variables.

OTEL_EXPORTER_OTLP_COMPRESSION=gzip
OTEL_EXPORTER_OTLP_ENDPOINT=http://host.docker.internal:4317
OTEL_TRACES_EXPORTER=otlp
OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=grpc

I create the open telemetry instrumentation as usual and then do an async import of the NestJs app with ESM. This ensures that otel is loaded before NestJs and has a chance to patch all the libraries it needs to patch.

import { OTLPMetricExporter } from "@opentelemetry/exporter-metrics-otlp-grpc";
import { PeriodicExportingMetricReader } from "@opentelemetry/sdk-metrics";
import { NodeSDK } from "@opentelemetry/sdk-node";
import { Resource } from "@opentelemetry/resources";
import { SemanticResourceAttributes } from "@opentelemetry/semantic-conventions";
import { getNodeAutoInstrumentations } from "@opentelemetry/auto-instrumentations-node";

export const initTelemetry = async (): Promise<void> => {
  const metricExporter = new OTLPMetricExporter({});

  const metricReader = new PeriodicExportingMetricReader({
    exporter: metricExporter,
    exportIntervalMillis: 60_000,
  });

  const sdk = new NodeSDK({
    resource: new Resource({
      [SemanticResourceAttributes.SERVICE_NAME]: "backend-app",
    }),
    metricReader,
    instrumentations: getNodeAutoInstrumentations({
      // eslint-disable-next-line @typescript-eslint/naming-convention
      "@opentelemetry/instrumentation-fs": {
        enabled: false, // very noisy
      },
    }),
  });
  console.log("starting otel instrumentation...");
  function shutdown() {
    // eslint-disable-next-line promise/catch-or-return
    sdk
      .shutdown()
      .then(
        () => console.log("SDK shut down successfully"),
        // eslint-disable-next-line unicorn/catch-error-name, unicorn/prevent-abbreviations
        (err) => console.log("Error shutting down SDK", err)
      )
      // eslint-disable-next-line unicorn/no-process-exit
      .finally(() => process.exit(0));
  }

  process.on("exit", shutdown);
  process.on("SIGINT", shutdown);
  process.on("SIGTERM", shutdown);
  // eslint-disable-next-line promise/catch-or-return, @typescript-eslint/no-unsafe-call, @typescript-eslint/no-unsafe-member-access, promise/always-return
  sdk.start();

  await import("./init-app.js");

  console.log("SDK started successfully");
};
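
The entry point then just needs to call this function before anything else. Roughly like this, where main.ts and ./telemetry.js are placeholder names (only ./init-app.js comes from the snippet above):

// main.ts (sketch) -- placeholder file names; only ./init-app.js is from the snippet above
import { initTelemetry } from "./telemetry.js";

// initTelemetry() starts the NodeSDK and then dynamically imports ./init-app.js,
// which is where the NestJs app itself is bootstrapped.
initTelemetry().catch((error: unknown) => {
  console.error("failed to start telemetry", error);
  process.exit(1);
});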

The Development Otel Collector

I use a local collector so I don't waste resources on the Honeycomb free tier for development.

I use the following docker-compose services to run the otel collector locally. (This is a partial docker-compose file showing only the relevant services.)

# Jaeger
jaeger-all-in-one:
  container_name: ${COMPOSE_PROJECT_NAME}_jaeger
  image: jaegertracing/all-in-one:latest
  restart: "no"
  environment:
    - COLLECTOR_OTLP_ENABLED=true
  deploy:
    resources:
      limits:
        memory: 400M
  ports:
    - "16685:16685" # jaeger-query grpc for the admin ui
    - "16686:16686" # jaeger-query http for the admin ui
    # - "14250:14250" # Used by jaeger-agent to send spans in model.proto format.
    # - "14268:14268" # jaeger thrift protocol
    - "14269:14269" # Admin port: health check at / and metrics at /metrics.
    # - "6831:6831/udp" # jaeger thrift protocol
    # - "6832:6832/udp" # jaeger thrift protocol
    # - "5778:5778" # remote sampling
    # don't have to expose these. internal only
    #- "16687:4317" # for forwarding traces in otlp to jaeger (grpc)
    #- "16688:4318" # for forwarding otlp traces to jaeger (http)
# Collector
otel-collector:
  container_name: ${COMPOSE_PROJECT_NAME}_otel_collector
  image: otel/opentelemetry-collector:latest
  restart: "no"
  deploy:
    resources:
      limits:
        memory: 400M
  command: ["--config=/etc/otel-collector-config.yaml", "${OTELCOL_ARGS:-}"]
  volumes:
    - ./infrastructure/otel-collector-config.yaml:/etc/otel-collector-config.yaml
  ports:
    - "1888:1888" # pprof extension
    - "8888:8888" # Prometheus metrics exposed by the otel collector itself
    - "8889:8889" # Prometheus exporter metrics passed through the otel collector from your apps
    - "13133:13133" # health_check extension
    - "4317:4317" # OTLP gRPC receiver
    - "4318:4318" # OTLP HTTP receiver
    - "55679:55679" # zpages extension
  depends_on:
    - jaeger-all-in-one
prometheus:
  container_name: ${COMPOSE_PROJECT_NAME}_prometheus
  image: prom/prometheus:latest
  restart: "no"
  deploy:
    resources:
      limits:
        memory: 400M
  volumes:
    - ./infrastructure/prometheus.yaml:/etc/prometheus/prometheus.yml
  ports:
    - "9090:9090"

Use the following config for the otel collector. This sets up the HTTP and gRPC receivers locally. It also sets up the Prometheus exporter and an OTLP exporter that forwards traces to Jaeger.

I turn on cors on the otel collector http receiver so that I can send traces from the browser to the otel collector. I also turn on metadata so that I can see the browser user agent in the traces.

receivers:
  otlp:
    protocols:
      grpc:
        include_metadata: true
      http:
        cors:
          allowed_origins:
            - "http://localhost*"
          allowed_headers:
            - "*"
        include_metadata: true
exporters:
  prometheus:
    endpoint: "otel-collector:8889"
    resource_to_telemetry_conversion:
      enabled: true
    enable_open_metrics: true

  logging:

  otlp:
    endpoint: jaeger-all-in-one:4317
    tls:
      insecure: true
processors:
  batch:

extensions:
  health_check:
  pprof:
    endpoint: :1888
  zpages:
    endpoint: :55679

service:
  telemetry:
    logs:
      level: "debug" # default is info
  extensions: [pprof, zpages, health_check]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]

Configure prometheus to scrape the otel collector.

scrape_configs:
  - job_name: "otel-collector"
    scrape_interval: 10s
    static_configs:
      - targets: ["otel-collector:8889"]
      - targets: ["otel-collector:8888"]

Viewing Traces

You can view the traces in Jaeger at http://localhost:16686/.

You can view the metrics in Prometheus at http://localhost:9090/.

Conclusion

There are some tricky bits around making sure all the environment variables are configured correctly for each app and environment.

NextJs has both a backend and a frontend client, and if you want full tracing from the browser you have to instrument both separately.

If you want to use an app template that has all of this preconfigured then check out https://usemiller.dev/miller-start