OpenTelemetry Part 2 - Implementation

Introduction

In our last discussion, we explored the basics of observability, breaking down the crucial components: Logs, Metrics, and Traces. Additionally, we highlighted the role played by the OpenTelemetry collector in orchestrating a comprehensive system visibility strategy.

Now, let's shift to the practical side. Think of this as a hands-on walkthrough of the code: we're about to see how OpenTelemetry integrates into various programming languages, works across different communication protocols, and fits in with diverse messaging systems.

My focus is not a fully fledged implementation. Instead, this is an example that implements the framework across multiple programming languages, communication protocols, and messaging systems for demonstration purposes, aiming to showcase how OpenTelemetry adapts and demonstrates its versatility across different coding environments.

High-Level Architecture

High level architecture of my example

In our pursuit of a webshop implementation, we strive for simplicity. The initiator of our architecture is the ApiGateway, serving as a BFF (Backend For Frontend) for a potential frontend application (not implemented here). This gateway takes charge, coordinating and orchestrating simulated order events. Supporting this orchestration are multiple backend APIs, worker services, databases, and the RabbitMQ instance for eventing and messaging.

The ApiGateway's role is to provide a single point of contact for the frontend, and to orchestrate the flow of events with the backend APIs and the RabbitMQ instance. Each backend API delivers domain-specific functionality, with the Order and Product services having their own dedicated databases. Meanwhile, the Email Service keeps an ear out for Notification Events on the RabbitMQ instance, promptly sending emails when triggered. Adding a touch of realism, the UpdateReceptionist simulates the sporadic updating of product prices.

For the sake of demonstration, I've implemented this system in various languages (.NET C#, Go, Python) and protocols (HTTP, gRPC), utilizing two different database providers (Postgres, MySQL).

The OpenTelemetry framework is implemented with the SDK, not via automatic instrumentation. I prefer the SDK for the level of control and customizability it offers. That said, OpenTelemetry has very good support for monitoring an application without touching the code, which I imagine is very powerful for a legacy production system that developers would rather not touch or spend time developing in. Maybe that will be a separate topic in the future.

OpenTelemetry Setup

Setting up the OpenTelemetry SDK is remarkably straightforward. For each programming language I've reviewed, the documentation suggests implementing and configuring the framework during application startup.

In each language, we employ hooks into the program and leverage default components to set up Logs, Metrics, and Traces. Typically, the first step is initializing the Resource, an immutable representation of the entity producing the telemetry. During this initialization, we can set default attributes such as the service name, version, and more.

Prior to configuring the SDK, it's essential to ensure that the environment has the necessary SDK dependencies for the respective language. Always refer to the OpenTelemetry guidelines and import the required dependencies for the active language.

OTEL Collector

For this setup, we leverage the OTEL Collector as a single point of telemetry ingestion. The Collector has its own configuration that specifies how and where the telemetry should be sent.

OTEL Collector implementation example

The following configuration is applied to this application.

Receivers:

# The receivers are the section that allows the OpenTelemetry Collector to receive data from various sources using Push or Pull model.
# The receiver name must match the name of the receiver in the service/pipelines/traces/receivers section.
receivers:
    otlp:
        protocols:
            grpc:
            http:

    rabbitmq:
        endpoint: http://rabbitmq:15672
        username: myuser
        password: password
        collection_interval: 10s
        tls:
            insecure: true

Exporters:

# The exporters are the section that allows the OpenTelemetry Collector to export data to various sources.
# The exporter name must match the name of the exporter in the service/pipelines/traces/exporters section.
exporters:
    # exports zipkin format spans to the zipkin instance
    zipkin:
        endpoint: "http://zipkin:9411/api/v2/spans"
        tls:
            insecure: true

    # exports logs to the seq instance
    otlphttp/seq:
        endpoint: "http://seq:5341/ingest/otlp"
        tls:
            insecure: true

    # Setting logging level to debug (possibilities to filter log level in a central place)
    debug:
        verbosity: detailed

    # Exporter for the metrics endpoint for the prometheus instance
    prometheus:
        endpoint: "0.0.0.0:8889"

    # Exporter for the Jaeger instance
    otlp:
        endpoint: jaeger:4317
        tls:
            insecure: true

Service:

# The service section is the top-level section that contains the configuration for the OpenTelemetry Collector service.
# The pipelines section contains the configuration for the data pipelines that the OpenTelemetry Collector will run.
# The name of the pipeline must match the name of the pipeline in the service/pipelines section.
service:
    pipelines:
        traces:
            receivers: [otlp]
            processors: [batch]
            exporters: [debug, zipkin, spanmetrics, otlp]
        metrics:
            receivers: [otlp, rabbitmq]
            processors: [batch]
            exporters: [debug, prometheus]
        logs:
            receivers: [otlp]
            processors: []
            exporters: [debug, otlphttp/seq]
        metrics/spanmetrics:
            receivers: [spanmetrics]
            exporters: [prometheus]

Full Configuration

# The receivers are the section that allows the OpenTelemetry Collector to receive data from various sources using Push or Pull model.
# The receiver name must match the name of the receiver in the service/pipelines/traces/receivers section.
receivers:
    otlp:
        protocols:
            grpc:
            http:

    rabbitmq:
        endpoint: http://rabbitmq:15672
        username: myuser
        password: password
        collection_interval: 10s
        tls:
            insecure: true

# The exporters are the section that allows the OpenTelemetry Collector to export data to various sources.
# The exporter name must match the name of the exporter in the service/pipelines/traces/exporters section.
exporters:
    # exports zipkin format spans to the zipkin instance
    zipkin:
        endpoint: "http://zipkin:9411/api/v2/spans"
        tls:
            insecure: true

    # exports logs to the seq instance
    otlphttp/seq:
        endpoint: "http://seq:5341/ingest/otlp"
        tls:
            insecure: true

    # Setting logging level to debug (possibilities to filter log level in a central place)
    debug:
        verbosity: detailed

    # Exporter for the metrics endpoint for the prometheus instance
    prometheus:
        endpoint: "0.0.0.0:8889"

    # Exporter for the Jaeger instance
    otlp:
        endpoint: jaeger:4317
        tls:
            insecure: true

connectors:
    spanmetrics:

processors:
    batch:

# The service section is the top-level section that contains the configuration for the OpenTelemetry Collector service.
# The pipelines section contains the configuration for the data pipelines that the OpenTelemetry Collector will run.
# The name of the pipeline must match the name of the pipeline in the service/pipelines section.
service:
    pipelines:
        traces:
            receivers: [otlp]
            processors: [batch]
            exporters: [debug, zipkin, spanmetrics, otlp]
        metrics:
            receivers: [otlp, rabbitmq]
            processors: [batch]
            exporters: [debug, prometheus]
        logs:
            receivers: [otlp]
            processors: []
            exporters: [debug, otlphttp/seq]
        metrics/spanmetrics:
            receivers: [spanmetrics]
            exporters: [prometheus]

Resource Setup

We utilize the Resource SDK to configure the resource, which is later used as input when wiring up the Logs, Metrics, and Traces.

CSharp

using OpenTelemetry.Resources;

var serviceName = "service.name";
var resourceAttributes = new KeyValuePair<string, object>[]
{
    new("service.version", "0.0.1-alpha")
};
var resource = ResourceBuilder.CreateDefault()
                .AddService(serviceName)
                .AddAttributes(resourceAttributes);

Go

import (
	"go.opentelemetry.io/otel/sdk/resource"
	semconv "go.opentelemetry.io/otel/semconv/v1.21.0"
)

serviceName := "service.name"
serviceVersion := "0.0.1-alpha"
// resource.Merge returns (*resource.Resource, error)
otelResource, err := resource.Merge(resource.Default(),
    resource.NewWithAttributes(semconv.SchemaURL,
        semconv.ServiceName(serviceName),
        semconv.ServiceVersion(serviceVersion),
    ))
if err != nil {
    panic(err)
}

Python

from opentelemetry.sdk.resources import SERVICE_NAME, SERVICE_VERSION, Resource

resource = Resource(attributes={
    SERVICE_NAME: "service.name",
    SERVICE_VERSION: "0.0.1-alpha"
})

Typescript

import { Resource } from "@opentelemetry/resources";
import { SemanticResourceAttributes } from "@opentelemetry/semantic-conventions";

const resource = new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: "service.name",
    [SemanticResourceAttributes.SERVICE_VERSION]: "0.0.1-alpha",
});

Logs

When we wire up Logs, we have multiple configuration options. We can output logs to as many exporters as we want, and we can also configure things such as batching for each configured exporter.

CSharp

using OpenTelemetry.Exporter;
using OpenTelemetry.Logs;
using OpenTelemetry.Resources;

builder.Logging.AddOpenTelemetry(options =>
{
  options
    //The resource variable is the ResourceBuilder we set up previously
    .SetResourceBuilder(resource)
    //Output all logs to the Console
    .AddConsoleExporter()
    //Send all the logs with OTLP protocol
    .AddOtlpExporter(exporter =>
    {
        //Set endpoint to send the logs to
        exporter.Endpoint = builder.Configuration.GetValue<Uri>("OTEL_EXPORTER_OTLP_ENDPOINT");
        //Use Grpc protocol
        exporter.Protocol = OtlpExportProtocol.Grpc;
        //Configure batching
        exporter.BatchExportProcessorOptions.ScheduledDelayMilliseconds = 100;
        exporter.BatchExportProcessorOptions.MaxExportBatchSize = 20;
        //.. More options are available, including client factories, headers etc.
    });
});

Go

Logs support in Go is currently in development for OpenTelemetry.

Python

# Logs in Python are currently experimental in OpenTelemetry
import os
import logging
from opentelemetry.sdk.resources import Resource
from opentelemetry._logs import set_logger_provider
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import (
    OTLPLogExporter
)
from opentelemetry.sdk._logs.export import (
    BatchLogRecordProcessor,
    SimpleLogRecordProcessor,
    ConsoleLogExporter
)

endpoint = os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT")
# resource previously configured
logger_provider = LoggerProvider(
    resource=resource
)
# set global logger
set_logger_provider(logger_provider)
# create OTLP Log Exporter with endpoint
exporter = OTLPLogExporter(endpoint=endpoint, insecure=True)
# Setting up log processors, they are called in the same order they are setup.
# Batch OTLP Logging => Output Logs to Console
logger_provider.add_log_record_processor(BatchLogRecordProcessor(exporter))
logger_provider.add_log_record_processor(
    SimpleLogRecordProcessor(ConsoleLogExporter()))
# setting Logging level to DEBUG (for demonstration purposes)
handler = LoggingHandler(level=logging.DEBUG, logger_provider=logger_provider)

# Attach OTLP handler to root logger
logging.getLogger().addHandler(handler)

Typescript

Logs support in TypeScript is currently in development for OpenTelemetry.

Metrics

Metrics are numerical data points that offer insights into the performance and health of a system. They include measurements of resource utilization, response times, error rates, throughput, and custom domain-specific metrics. Metrics play a crucial role in monitoring system health, identifying trends, and establishing performance baselines. Examples include CPU usage, request rates, and memory utilization.

In my experience, implementing Metrics has generally been straightforward, although there are instances where "automatic metrics" are lacking. By automatic metrics, I mean that as a developer, I don't want to engage in the tedious task of implementing response-time measurement and counting for every single endpoint. There are instrumentation libraries for different protocols, frameworks, etc. This challenge could stem from my limited experience with some of the languages I've worked with; feel free to reach out and teach me if you have insights.

Starting from .NET 8, Metrics integration is built directly into the runtime. It automatically measures resources and basic metrics of an application. When activated, this information is reported to the configured endpoint, or can be published directly in the Prometheus exposition format for a Prometheus instance to scrape.

CSharp

using OpenTelemetry.Resources;
using OpenTelemetry.Exporter;
using OpenTelemetry.Metrics;

builder.Services.AddOpenTelemetry()
  .ConfigureResource(resource => resource
    .AddService(serviceName)
    .AddAttributes(resourceAttributes))
  .WithMetrics(metrics => metrics
    // Adding automatic AspNet Core instrumentation (for incoming request etc.)
    .AddAspNetCoreInstrumentation()
    //Adding automatic runtime instrumentation
    .AddRuntimeInstrumentation()
    //Adding automatic process instrumentation (cpu, memory etc.)
    .AddProcessInstrumentation()
    //Adding custom Meters provided in .NET 8
    .AddMeter("Microsoft.AspNetCore.Hosting")
    .AddMeter("Microsoft.AspNetCore.Server.Kestrel")
    //Send metrics through OTLP protocol
    .AddOtlpExporter(exporter =>
    {
      exporter.Endpoint = builder.Configuration.GetValue<Uri>("OTEL_EXPORTER_OTLP_ENDPOINT");
      //Use Grpc
      exporter.Protocol = OtlpExportProtocol.Grpc;
      //Batch settings
      exporter.BatchExportProcessorOptions.ScheduledDelayMilliseconds = 100;
      exporter.BatchExportProcessorOptions.MaxExportBatchSize = 20;
      //.. More options are available, including client factories, headers etc.
    }));

Go

import (
	"context"
	"os"
	"strings"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	"go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc"
	"go.opentelemetry.io/otel/sdk/metric"
	"go.opentelemetry.io/otel/sdk/resource"
)

func newMeterProvider(res *resource.Resource, ctx context.Context) (*metric.MeterProvider, error) {
	ctx, cancel := context.WithTimeout(ctx, time.Second*5)
	defer cancel()
	endpoint := strings.Replace(os.Getenv("OTEL_EXPORTER_OTLP_ENDPOINT"), "http://", "", -1)
	conn, err := grpc.DialContext(ctx, endpoint,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithBlock(),
	)
	if err != nil {
		return nil, err
	}

	metricExporter, err := otlpmetricgrpc.New(ctx, otlpmetricgrpc.WithGRPCConn(conn))
	if err != nil {
		return nil, err
	}

	meterProvider := metric.NewMeterProvider(
		metric.WithResource(res),
		metric.WithReader(metric.NewPeriodicReader(metricExporter,
			// Default is 1m. Set to 3s for demonstrative purposes.
			metric.WithInterval(3*time.Second))),
	)
	return meterProvider, nil
}

// Register globally (requires the "go.opentelemetry.io/otel" package)
otel.SetMeterProvider(meterProvider)

Python

import os
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry import metrics

endpoint = os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT")
reader = PeriodicExportingMetricReader(
    OTLPMetricExporter(endpoint=endpoint, insecure=True)
)
meterProvider = MeterProvider(resource=resource, metric_readers=[reader])
metrics.set_meter_provider(meterProvider)

Typescript

import { NodeSDK } from "@opentelemetry/sdk-node";
import { getNodeAutoInstrumentations } from "@opentelemetry/auto-instrumentations-node";
import { PeriodicExportingMetricReader } from "@opentelemetry/sdk-metrics";
import { OTLPMetricExporter } from "@opentelemetry/exporter-metrics-otlp-grpc";

const sdk = new NodeSDK({
    //Use Resource configured previously
    resource: resource,
    //Setup exporter to an OTLP endpoint
    metricReader: new PeriodicExportingMetricReader({
        exporter: new OTLPMetricExporter({
            url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT,
        }),
    }),
    //Additional automatic instrumentations
    instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();

Custom Metrics

There are some measurements that can be configured manually:

Counter: a value that accumulates over time; it only goes up.
UpDownCounter: a value that can go both up and down.
Gauge: measures a value at a point in time.
Histogram: a client-side aggregation of values, for example request latency.
These can be created manually in code and used to read and understand a system. In our example system, the Order.Service uses a counter for the total order count.
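To make the semantics concrete, here is a tiny, dependency-free Python sketch of the behavioural contract each instrument kind promises. This is not the OpenTelemetry API (the real SDK creates instruments via a Meter); the class names and fields here are purely illustrative.

```python
# Toy models of the four instrument kinds -- illustrative only, not the OTel API.

class Counter:
    """Accumulates over time; can only go up (e.g. orders_total)."""
    def __init__(self):
        self.value = 0
    def add(self, amount):
        if amount < 0:
            raise ValueError("Counter can only go up")
        self.value += amount

class UpDownCounter:
    """Accumulates, but may go both up and down (e.g. items in a queue)."""
    def __init__(self):
        self.value = 0
    def add(self, amount):
        self.value += amount

class Gauge:
    """Holds the last measured value at a point in time (e.g. temperature)."""
    def __init__(self):
        self.value = None
    def set(self, value):
        self.value = value

class Histogram:
    """Client-side aggregation of recorded values (e.g. request latency)."""
    def __init__(self):
        self.values = []
    def record(self, value):
        self.values.append(value)
    def summary(self):
        return {"count": len(self.values), "sum": sum(self.values),
                "min": min(self.values), "max": max(self.values)}

# Like the counter in Order.Service tracking the total order count:
orders_total = Counter()
orders_total.add(1)
```

With the real SDK, the equivalent instrument would be obtained from a Meter (e.g. `meter.create_counter("orders.total")` in Python), and the exporter pipeline configured above takes care of shipping the aggregated values.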

Traces

As we went through in part 1, traces consist of spans. A span represents a single operation within a service, and spans do not map one-to-one to services: receiving an HTTP request creates a span, but the receiving service can create nested child spans to capture the flow of operations, if there are multiple. A trace is the big picture of a user flow, an operation, or similar.
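What ties spans from different services into one trace is context propagation: the SDK's propagator injects the current trace and span IDs into outgoing requests (by default as a W3C Trace Context traceparent header) and extracts them on the receiving side. The real SDK handles this for you; the helpers below are a hypothetical, minimal sketch of just the header format.

```python
# Minimal sketch of the W3C Trace Context "traceparent" header format:
#   version-traceid-spanid-flags, e.g. 00-<32 hex>-<16 hex>-01
# Helper names are hypothetical; OpenTelemetry's propagators do this for you.
import re
import secrets

def make_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Build the header a client would inject into an outgoing request."""
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"

def parse_traceparent(header: str):
    """Extract the trace id and parent span id from an incoming header."""
    m = re.fullmatch(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})", header)
    if m is None:
        return None
    return {
        "trace_id": m.group(1),        # shared by every span in the trace
        "parent_span_id": m.group(2),  # becomes the parent of the new span
        "sampled": m.group(3) == "01",
    }

# A service starting a new trace generates fresh IDs...
trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex chars
header = make_traceparent(trace_id, span_id)
# ...and the next service in the chain parses them back out.
incoming = parse_traceparent(header)
```

This is why, in the sample system, a single trace can span the ApiGateway, the backend APIs, and even the RabbitMQ consumers: each hop carries the same trace id forward while adding its own spans.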

The configuration part of OpenTelemetry is becoming a bit repetitive, because it's basically the same as for Metrics and Logs. But here is the Traces configuration.

CSharp

using OpenTelemetry.Resources;
using OpenTelemetry.Exporter;
using OpenTelemetry.Trace;

builder.Services.AddOpenTelemetry()
  .ConfigureResource(resource => resource
    .AddService(serviceName)
    .AddAttributes(resourceAttributes))
  .WithTracing(traces => traces
    // Adding automatic AspNet Core instrumentation (for incoming request etc.)
    .AddAspNetCoreInstrumentation()
    //Enabled HttpClient and GrpcClient Instrumentation
    .AddHttpClientInstrumentation()
    .AddGrpcClientInstrumentation()
    //Export the traces to the console
    .AddConsoleExporter()
    //Send traces through OTLP protocol
    .AddOtlpExporter(exporter =>
    {
      exporter.Endpoint = builder.Configuration.GetValue<Uri>("OTEL_EXPORTER_OTLP_ENDPOINT");
      //Use Grpc
      exporter.Protocol = OtlpExportProtocol.Grpc;
      //Batch settings
      exporter.BatchExportProcessorOptions.ScheduledDelayMilliseconds = 100;
      exporter.BatchExportProcessorOptions.MaxExportBatchSize = 20;
      //.. More options are available, including client factories, headers etc.
    })
    //Adding a custom source for a custom implementation of the RabbitMQ publisher in .NET
    .AddSource(nameof(RabbitMQEventBus)));

Go

import (
	"context"
	"os"
	"strings"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
	"go.opentelemetry.io/otel/sdk/trace"
	"go.opentelemetry.io/otel/sdk/resource"
)

func newTraceProvider(res *resource.Resource, ctx context.Context) (*trace.TracerProvider, error) {
	ctx, cancel := context.WithTimeout(ctx, time.Second*5)
	defer cancel()
	endpoint := strings.Replace(os.Getenv("OTEL_EXPORTER_OTLP_ENDPOINT"), "http://", "", -1)
	conn, err := grpc.DialContext(ctx, endpoint,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithBlock(),
	)
	if err != nil {
		return nil, err
	}

	// Set up a trace exporter
	traceExporter, err := otlptracegrpc.New(ctx, otlptracegrpc.WithGRPCConn(conn))
	if err != nil {
		return nil, err
	}

	bsp := trace.NewBatchSpanProcessor(traceExporter)
	tracerProvider := trace.NewTracerProvider(
		trace.WithSampler(trace.AlwaysSample()),
		trace.WithResource(res),
		trace.WithSpanProcessor(bsp),
	)
	return tracerProvider, nil
}
// Register globally (requires the "go.opentelemetry.io/otel" package)
otel.SetTracerProvider(tracerProvider)

Python

import os
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import (
    BatchSpanProcessor,
    SimpleSpanProcessor,
    ConsoleSpanExporter
)
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

endpoint = os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT")
# resource previously configured
trace_provider = TracerProvider(resource=resource)
# Batch spans and export them over OTLP
trace_provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint=endpoint, insecure=True)))
# Also write spans to the console
trace_provider.add_span_processor(
    SimpleSpanProcessor(ConsoleSpanExporter()))
# set global tracer provider
trace.set_tracer_provider(trace_provider)

Typescript

import { NodeSDK } from "@opentelemetry/sdk-node";
import { getNodeAutoInstrumentations } from "@opentelemetry/auto-instrumentations-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-grpc";

const sdk = new NodeSDK({
    //Use Resource configured previously
    resource: resource,
    //Setup trace exporter
    traceExporter: new OTLPTraceExporter({
        url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT,
    }),
    //Additional automatic instrumentations
    instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();

Additional automatic instrumentation is configured for the Python Product.Service, which uses ready-to-run instrumentation components for RabbitMQ and the Postgres database. This enables automatic trace recording and context extraction for RabbitMQ and Postgres with a very simple setup.

Postgres

from opentelemetry.instrumentation.psycopg2 import Psycopg2Instrumentor

Psycopg2Instrumentor().instrument(enable_commenter=True, commenter_options={})

RabbitMQ

from opentelemetry.instrumentation.pika import PikaInstrumentor

PikaInstrumentor().instrument()

Running my solution

You can find my sample repository for this example and architecture in this GitHub link.

To run the solution, we utilize Docker.

First, we run the PowerShell script build-images.ps1 in the root of the repository, just to get the images of all the services built locally.

After that we can use the docker-compose.yml in the /docker folder to run the whole solution.

This Docker Compose setup runs and exposes multiple services according to the table below:

Service Address Description
otel_collector - Collects and processes telemetry
jaeger http://localhost:16686/ Jaeger UI for traces
seq http://localhost:5341/ Look at logs in Seq
zipkin http://localhost:9411/ Zipkin traces and dependency map view
adminer http://localhost:7070/ Adminer instance to connect to the different databases, for debug purposes
prometheus http://localhost:9090/ Prometheus to look at metrics
Shop ApiGateway http://localhost:8080/ Api Gateway for the "web shop"
Product Service http://localhost:50051/ gRPC service for the Product domain
Product Database localhost:5432 Postgres database for products
Payment Service http://localhost:8082/ HTTP service that pretends to handle payments
Order Service http://localhost:8081/ HTTP service that pretends to handle orders
Order Database localhost:3306 MySQL database for orders
Email Service - The service that pretends to send emails
Update Receptionist - The service that pretends to update product prices
RabbitMQ localhost:5672 RabbitMQ instance for pub/sub
RabbitMQ Management http://localhost:15672/ RabbitMQ management UI

Wrapping up

In this article, we have looked into the implementation details of OpenTelemetry. We explored its practical integration across programming languages, showcasing its adaptability in orchestrating observability. From setup to configuration and resource handling, we demonstrated OpenTelemetry's versatility and ease of implementation.

In my adventures investigating the OTEL Collector, I've noticed the lack of a SQL Server log exporter. I have an idea to start contributing such an exporter. Reach out on my LinkedIn if you are interested in helping me with this contribution.

Throughout the article, we covered OpenTelemetry setup across languages, the OTEL Collector's role in centralized telemetry, resource configuration, the wiring of logs, metrics, and traces, and the manual setup of custom metrics.

As always, you can find all the source code on my GitHub.

In the upcoming Part 3, we will explore the tools for visualizing telemetry, enhancing visualization and monitoring within the OpenTelemetry ecosystem. Stay tuned for the next part!