OpenTelemetry Part 2 - Implementation
Introduction
In our last discussion, we explored the basics of observability, breaking down the crucial components: Logs, Metrics, and Traces. Additionally, we highlighted the role played by the OpenTelemetry collector in orchestrating a comprehensive system visibility strategy.
Now, let's shift to the practical side. Think of this as a hands-on walk through the details of the code. We're about to see how OpenTelemetry integrates into various programming languages, works across different communication protocols, and ties into diverse messaging systems.
My focus is not a fully-fledged implementation. It's an example of implementing the framework in multiple programming languages, communication protocols, and messaging systems for demonstration purposes, aiming to showcase how OpenTelemetry dynamically adapts and demonstrates its versatility across different coding environments.
High-Level Architecture
In our pursuit of a webshop implementation, we strive for simplicity. The entry point of our architecture is the ApiGateway, serving as a BFF (Backend For Frontend) for a potential frontend application (not implemented here). This gateway takes charge, coordinating and orchestrating simulated order events. Supporting this orchestration are multiple backend APIs, worker services, databases, and the RabbitMQ instance for eventing and messaging.
The ApiGateway's role is to provide a single point of contact for the frontend and to orchestrate the flow of events with the backend APIs and the RabbitMQ instance. Each backend API delivers domain-specific functionality, with the Order and Product services having their own dedicated databases. Meanwhile, the Email Service keeps an ear out for notification events on the RabbitMQ instance, promptly sending emails when triggered. Adding a touch of realism, the UpdateReceptionist simulates the sporadic updating of product prices.
For the sake of demonstration, I've implemented this system in various languages (.Net C#, Go, Python) and protocols (HTTP, gRPC), utilizing two different database providers (Postgres, MySQL).
The OpenTelemetry framework is implemented with the SDK, and not in any automatic way. I prefer the SDK for the level of control and customizability. OpenTelemetry has very good support for monitoring an application without touching the code, which I imagine is very powerful for a legacy production system that developers prefer not to touch or spend time developing in. Maybe that will be a separate topic in the future.
OpenTelemetry Setup
Setting up the OpenTelemetry SDK is remarkably straightforward. For each programming language I've reviewed, the documentation suggests implementing and configuring the framework during application startup.
In each language, we employ hooks into the program and leverage default components to set up Logs, Metrics, and Traces. Typically, the initial step is initializing the Resource, an immutable entity representing the producer of telemetry such as Logs, Metrics, and Traces. During this initialization, we can establish default attributes, service names, versions, and more.
Prior to configuring the SDK, it's essential to ensure that the environment has the necessary SDK dependencies for the respective language. Always refer to the OpenTelemetry guidelines and import the required dependencies for the active language.
OTEL Collector
For this setup we leverage the OTEL Collector as a single point of telemetry ingestion. The Collector has its own configuration that specifies how and where the telemetry should be sent.
The following configuration is applied in this application.
Receivers:
```yaml
# The receivers section allows the OpenTelemetry Collector to receive data from various sources using a push or pull model.
# The receiver name must match the name used in the service/pipelines section.
receivers:
  otlp:
    protocols:
      grpc:
      http:
  rabbitmq:
    endpoint: http://rabbitmq:15672
    username: myuser
    password: password
    collection_interval: 10s
    tls:
      insecure: true
```
Exporters:
```yaml
# The exporters section allows the OpenTelemetry Collector to export data to various destinations.
# The exporter name must match the name used in the service/pipelines section.
exporters:
  # Exports Zipkin-format spans to the Zipkin instance
  zipkin:
    endpoint: "http://zipkin:9411/api/v2/spans"
    tls:
      insecure: true
  # Exports logs to the Seq instance
  otlphttp/seq:
    endpoint: "http://seq:5341/ingest/otlp"
    tls:
      insecure: true
  # Sets logging verbosity to debug (with the possibility to filter log levels in a central place)
  debug:
    verbosity: detailed
  # Exposes a metrics endpoint for the Prometheus instance to scrape
  prometheus:
    endpoint: "0.0.0.0:8889"
  # Exporter for the Jaeger instance
  otlp:
    endpoint: jaeger:4317
    tls:
      insecure: true
```
Service:
```yaml
# The service section is the top-level section that configures the OpenTelemetry Collector service.
# The pipelines section configures the data pipelines that the Collector will run.
# Every name referenced in a pipeline must match a configured receiver, processor, exporter, or connector.
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug, zipkin, spanmetrics, otlp]
    metrics:
      receivers: [otlp, rabbitmq]
      processors: [batch]
      exporters: [debug, prometheus]
    logs:
      receivers: [otlp]
      processors: []
      exporters: [debug, otlphttp/seq]
    # The spanmetrics connector (defined in the full configuration below) derives
    # request and latency metrics from the spans flowing through the traces pipeline
    metrics/spanmetrics:
      receivers: [spanmetrics]
      exporters: [prometheus]
```
Full Configuration
```yaml
# The receivers section allows the OpenTelemetry Collector to receive data from various sources using a push or pull model.
# The receiver name must match the name used in the service/pipelines section.
receivers:
  otlp:
    protocols:
      grpc:
      http:
  rabbitmq:
    endpoint: http://rabbitmq:15672
    username: myuser
    password: password
    collection_interval: 10s
    tls:
      insecure: true

# The exporters section allows the OpenTelemetry Collector to export data to various destinations.
# The exporter name must match the name used in the service/pipelines section.
exporters:
  # Exports Zipkin-format spans to the Zipkin instance
  zipkin:
    endpoint: "http://zipkin:9411/api/v2/spans"
    tls:
      insecure: true
  # Exports logs to the Seq instance
  otlphttp/seq:
    endpoint: "http://seq:5341/ingest/otlp"
    tls:
      insecure: true
  # Sets logging verbosity to debug (with the possibility to filter log levels in a central place)
  debug:
    verbosity: detailed
  # Exposes a metrics endpoint for the Prometheus instance to scrape
  prometheus:
    endpoint: "0.0.0.0:8889"
  # Exporter for the Jaeger instance
  otlp:
    endpoint: jaeger:4317
    tls:
      insecure: true

connectors:
  # Derives request and latency metrics from spans
  spanmetrics:

processors:
  batch:

# The service section is the top-level section that configures the OpenTelemetry Collector service.
# The pipelines section configures the data pipelines that the Collector will run.
# Every name referenced in a pipeline must match a configured receiver, processor, exporter, or connector.
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug, zipkin, spanmetrics, otlp]
    metrics:
      receivers: [otlp, rabbitmq]
      processors: [batch]
      exporters: [debug, prometheus]
    logs:
      receivers: [otlp]
      processors: []
      exporters: [debug, otlphttp/seq]
    metrics/spanmetrics:
      receivers: [spanmetrics]
      exporters: [prometheus]
```
Resource Setup
We utilize the Resource SDK to configure the resource. The resource is later used as input when wiring up Logs, Metrics, and Traces.
CSharp
```csharp
using OpenTelemetry.Resources;

var serviceName = "service.name";
var resourceAttributes = new KeyValuePair<string, object>[]
{
    new("service.version", "0.0.1-alpha")
};

var resource = ResourceBuilder.CreateDefault()
    .AddService(serviceName)
    .AddAttributes(resourceAttributes);
```
Go
```go
import (
    "go.opentelemetry.io/otel/sdk/resource"
    semconv "go.opentelemetry.io/otel/semconv/v1.21.0"
)

serviceName := "service.name"
serviceVersion := "0.0.1-alpha"

// resource.Merge returns both the merged resource and an error
otelResource, err := resource.Merge(resource.Default(),
    resource.NewWithAttributes(semconv.SchemaURL,
        semconv.ServiceName(serviceName),
        semconv.ServiceVersion(serviceVersion),
    ))
if err != nil {
    panic(err) // handle properly in real code
}
```
Python
```python
from opentelemetry.sdk.resources import SERVICE_NAME, SERVICE_VERSION, Resource

resource = Resource(attributes={
    SERVICE_NAME: "service.name",
    SERVICE_VERSION: "0.0.1-alpha"
})
```
Typescript
```typescript
import { Resource } from "@opentelemetry/resources";
import { SemanticResourceAttributes } from "@opentelemetry/semantic-conventions";

const resource = new Resource({
  [SemanticResourceAttributes.SERVICE_NAME]: "service.name",
  [SemanticResourceAttributes.SERVICE_VERSION]: "0.0.1-alpha",
});
```
Logs
When wiring up Logs we have multiple configuration options. We can output logs to as many exporters as we want, and we can configure behavior such as batching for each configured exporter.
CSharp
```csharp
using OpenTelemetry.Exporter;
using OpenTelemetry.Logs;
using OpenTelemetry.Resources;

builder.Logging.AddOpenTelemetry(options =>
{
    options
        // The resource variable is the Resource we set up previously
        .SetResourceBuilder(resource)
        // Output all logs to the console
        .AddConsoleExporter()
        // Send all logs with the OTLP protocol
        .AddOtlpExporter(exporter =>
        {
            // Set the endpoint to send the logs to
            exporter.Endpoint = builder.Configuration.GetValue<Uri>("OTEL_EXPORTER_OTLP_ENDPOINT");
            // Use the gRPC protocol
            exporter.Protocol = OtlpExportProtocol.Grpc;
            // Configure batching
            exporter.BatchExportProcessorOptions.ScheduledDelayMilliseconds = 100;
            exporter.BatchExportProcessorOptions.MaxExportBatchSize = 20;
            // .. more options are available, including client factories, headers etc.
        });
});
```
Go
Logs support for Go is currently in development in OpenTelemetry.
Python
```python
# Logs in Python are currently experimental in OpenTelemetry
import os
import logging

from opentelemetry.sdk.resources import Resource
from opentelemetry._logs import set_logger_provider
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import (
    OTLPLogExporter
)
from opentelemetry.sdk._logs.export import (
    BatchLogRecordProcessor,
    SimpleLogRecordProcessor,
    ConsoleLogExporter
)

endpoint = os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT")

# resource previously configured
logger_provider = LoggerProvider(
    resource=resource
)
# Set the global logger provider
set_logger_provider(logger_provider)

# Create an OTLP log exporter pointing at the endpoint
exporter = OTLPLogExporter(endpoint=endpoint, insecure=True)

# Set up log processors; they are called in the same order they are added:
# batch OTLP logging first, then output logs to the console
logger_provider.add_log_record_processor(BatchLogRecordProcessor(exporter))
logger_provider.add_log_record_processor(
    SimpleLogRecordProcessor(ConsoleLogExporter()))

# Set the logging level to DEBUG (for demonstration purposes)
handler = LoggingHandler(level=logging.DEBUG, logger_provider=logger_provider)
# Attach the OTLP handler to the root logger
logging.getLogger().addHandler(handler)
```
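With the handler attached to the root logger, ordinary Python logging flows through OpenTelemetry. A minimal usage sketch (the logger name is illustrative):

```python
import logging

# The root logger defaults to WARNING; lower it so INFO records reach the handler
logging.getLogger().setLevel(logging.INFO)

logger = logging.getLogger("shop.order")  # hypothetical logger name
# This record is exported both over OTLP (batched) and to the console
logger.info("Order %s created", "1234")
```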
Typescript
Logs support for TypeScript is currently in development in OpenTelemetry.
Metrics
Metrics are numerical data points that offer insights into the performance and health of a system. They include measurements of resource utilization, response times, error rates, throughput, and custom domain-specific metrics. Metrics play a crucial role in monitoring system health, identifying trends, and establishing performance baselines. Examples include CPU usage, request rates, and memory utilization.
In my experience, implementing Metrics has generally been straightforward, although there are instances where "automatic metrics" are lacking. By automatic metrics, I mean that as a developer, I don't want to engage in the tedious task of implementing response-time measurement and counting for every single endpoint. There are some instrumentation libraries for different protocols, frameworks, etc. This challenge could stem from my limited experience with some of the languages I've worked with; feel free to reach out and teach me if you have insights.
Starting with .NET 8, Metrics integration is built directly into the runtime. It automatically measures resources and basic metrics of an application. When activated, this information is reported to the configured endpoint, or it can be published directly in the Prometheus format for the Prometheus instance to scrape.
CSharp
```csharp
using OpenTelemetry.Exporter;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;

builder.Services.AddOpenTelemetry()
    .ConfigureResource(resource => resource
        .AddService(serviceName)
        .AddAttributes(resourceAttributes))
    .WithMetrics(metrics => metrics
        // Adding automatic ASP.NET Core instrumentation (for incoming requests etc.)
        .AddAspNetCoreInstrumentation()
        // Adding automatic runtime instrumentation
        .AddRuntimeInstrumentation()
        // Adding automatic process instrumentation (CPU, memory etc.)
        .AddProcessInstrumentation()
        // Adding the built-in Meters provided in .NET 8
        .AddMeter("Microsoft.AspNetCore.Hosting")
        .AddMeter("Microsoft.AspNetCore.Server.Kestrel")
        // Send metrics through the OTLP protocol
        .AddOtlpExporter(exporter =>
        {
            exporter.Endpoint = builder.Configuration.GetValue<Uri>("OTEL_EXPORTER_OTLP_ENDPOINT");
            // Use gRPC
            exporter.Protocol = OtlpExportProtocol.Grpc;
            // Batch settings
            exporter.BatchExportProcessorOptions.ScheduledDelayMilliseconds = 100;
            exporter.BatchExportProcessorOptions.MaxExportBatchSize = 20;
            // .. more options are available, including client factories, headers etc.
        }));
```
Go
```go
import (
    "context"
    "os"
    "strings"
    "time"

    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc"
    "go.opentelemetry.io/otel/sdk/metric"
    "go.opentelemetry.io/otel/sdk/resource"
)

func newMeterProvider(res *resource.Resource, ctx context.Context) (*metric.MeterProvider, error) {
    ctx, cancel := context.WithTimeout(ctx, time.Second*5)
    defer cancel()

    endpoint := strings.Replace(os.Getenv("OTEL_EXPORTER_OTLP_ENDPOINT"), "http://", "", -1)
    conn, err := grpc.DialContext(ctx, endpoint,
        grpc.WithTransportCredentials(insecure.NewCredentials()),
        grpc.WithBlock(),
    )
    if err != nil {
        return nil, err
    }

    metricExporter, err := otlpmetricgrpc.New(ctx, otlpmetricgrpc.WithGRPCConn(conn))
    if err != nil {
        return nil, err
    }

    meterProvider := metric.NewMeterProvider(
        metric.WithResource(res),
        metric.WithReader(metric.NewPeriodicReader(metricExporter,
            // Default is 1m. Set to 3s for demonstrative purposes.
            metric.WithInterval(3*time.Second))),
    )
    return meterProvider, nil
}

// After creating the provider, register it globally:
otel.SetMeterProvider(meterProvider)
```
Python
```python
import os

from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter

endpoint = os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT")

# Periodically export metrics over OTLP
reader = PeriodicExportingMetricReader(
    OTLPMetricExporter(endpoint=endpoint, insecure=True)
)
meter_provider = MeterProvider(resource=resource, metric_readers=[reader])
metrics.set_meter_provider(meter_provider)
```
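Once the meter provider is registered, any part of the service can obtain a meter and record measurements. A minimal sketch (the meter and instrument names are illustrative):

```python
from opentelemetry import metrics

# Obtain a meter from the globally registered provider
meter = metrics.get_meter("shop.payment")  # hypothetical meter name

# Histogram: client-side aggregation, e.g. request latency
latency = meter.create_histogram(
    "payment.request.duration",
    unit="ms",
    description="Duration of payment requests",
)
latency.record(42.0, attributes={"http.route": "/pay"})
```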
Typescript
```typescript
import { NodeSDK } from "@opentelemetry/sdk-node";
import { getNodeAutoInstrumentations } from "@opentelemetry/auto-instrumentations-node";
import { PeriodicExportingMetricReader } from "@opentelemetry/sdk-metrics";
import { OTLPMetricExporter } from "@opentelemetry/exporter-metrics-otlp-grpc";

const sdk = new NodeSDK({
  // Use the Resource configured previously
  resource: resource,
  // Set up a periodic exporter to an OTLP endpoint
  metricReader: new PeriodicExportingMetricReader({
    exporter: new OTLPMetricExporter({
      url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT,
    }),
  }),
  // Additional automatic instrumentations
  instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();
```
Custom Metrics
There are four kinds of instruments that can be created manually:

- **Counter**: a value that accumulates over time; it only goes up.
- **UpDownCounter**: a value that can go both up and down.
- **Gauge**: measures a value at a point in time.
- **Histogram**: a client-side aggregation of values, for example request latency.

These can be created manually in code and used to read and understand a system. In our example system, the Order.Service uses a counter for the total order count, as sketched below.
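A minimal sketch of such a counter in Python, assuming the meter provider from earlier has been registered globally (the meter name, counter name, and helper function are illustrative, not necessarily what the repository uses):

```python
from opentelemetry import metrics

# Obtain a meter from the globally registered provider
meter = metrics.get_meter("shop.order")  # hypothetical meter name

# Counter: accumulates over time, only goes up
order_counter = meter.create_counter(
    "orders.created.total",
    description="Total number of orders created",
)

def create_order(order):
    # ... persist the order, publish events, etc. (placeholder) ...
    order_counter.add(1, attributes={"order.type": order.type})
```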
Traces
As we went through in part 1, traces consist of spans. A span represents a single operation that happens in a service, and the relationship is not one-to-one: receiving an HTTP request creates a span, but the receiving service can create nested child spans to capture the flow of its operations. A trace is the big picture of a user flow, an operation, or similar.
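To illustrate, here is a minimal sketch in Python of creating nested child spans inside the span of an incoming request (the tracer and span names, and the validate/save helpers, are illustrative):

```python
from opentelemetry import trace

tracer = trace.get_tracer("shop.order")  # hypothetical tracer name

def handle_create_order(request):
    # The HTTP instrumentation has already started a server span for the
    # incoming request; these spans become its children via the active context.
    with tracer.start_as_current_span("validate-order"):
        validate(request)  # placeholder
    with tracer.start_as_current_span("persist-order"):
        save(request)  # placeholder
```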
Configuring this part of OpenTelemetry is becoming a bit repetitive, because it's basically the same as for Metrics and Logs. But here is the trace configuration.
CSharp
```csharp
using OpenTelemetry.Exporter;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

builder.Services.AddOpenTelemetry()
    .ConfigureResource(resource => resource
        .AddService(serviceName)
        .AddAttributes(resourceAttributes))
    .WithTracing(traces => traces
        // Adding automatic ASP.NET Core instrumentation (for incoming requests etc.)
        .AddAspNetCoreInstrumentation()
        // Enable HttpClient and GrpcClient instrumentation
        .AddHttpClientInstrumentation()
        .AddGrpcClientInstrumentation()
        // Export the traces to the console
        .AddConsoleExporter()
        // Send traces through the OTLP protocol
        .AddOtlpExporter(exporter =>
        {
            exporter.Endpoint = builder.Configuration.GetValue<Uri>("OTEL_EXPORTER_OTLP_ENDPOINT");
            // Use gRPC
            exporter.Protocol = OtlpExportProtocol.Grpc;
            // Batch settings
            exporter.BatchExportProcessorOptions.ScheduledDelayMilliseconds = 100;
            exporter.BatchExportProcessorOptions.MaxExportBatchSize = 20;
            // .. more options are available, including client factories, headers etc.
        })
        // Adding a custom source for the custom RabbitMQ publisher implementation in .NET
        .AddSource(nameof(RabbitMQEventBus)));
```
Go
```go
import (
    "context"
    "os"
    "strings"
    "time"

    "google.golang.org/grpc"
    "google.golang.org/grpc/credentials/insecure"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
    "go.opentelemetry.io/otel/sdk/resource"
    "go.opentelemetry.io/otel/sdk/trace"
)

func newTraceProvider(res *resource.Resource, ctx context.Context) (*trace.TracerProvider, error) {
    ctx, cancel := context.WithTimeout(ctx, time.Second*5)
    defer cancel()

    endpoint := strings.Replace(os.Getenv("OTEL_EXPORTER_OTLP_ENDPOINT"), "http://", "", -1)
    conn, err := grpc.DialContext(ctx, endpoint,
        grpc.WithTransportCredentials(insecure.NewCredentials()),
        grpc.WithBlock(),
    )
    if err != nil {
        return nil, err
    }

    // Set up a trace exporter
    traceExporter, err := otlptracegrpc.New(ctx, otlptracegrpc.WithGRPCConn(conn))
    if err != nil {
        return nil, err
    }

    bsp := trace.NewBatchSpanProcessor(traceExporter)
    tracerProvider := trace.NewTracerProvider(
        trace.WithSampler(trace.AlwaysSample()),
        trace.WithResource(res),
        trace.WithSpanProcessor(bsp),
    )
    return tracerProvider, nil
}

// After creating the provider, register it globally:
otel.SetTracerProvider(tracerProvider)
```
Python
```python
import os

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import (
    BatchSpanProcessor,
    SimpleSpanProcessor,
    ConsoleSpanExporter,
)
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

endpoint = os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT")

trace_provider = TracerProvider(resource=resource)
# Batch spans before sending them over OTLP
trace_provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint=endpoint, insecure=True))
)
# Also print spans to the console
trace_provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
# Set the global tracer provider
trace.set_tracer_provider(trace_provider)
```
Typescript
```typescript
import { NodeSDK } from "@opentelemetry/sdk-node";
import { getNodeAutoInstrumentations } from "@opentelemetry/auto-instrumentations-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-grpc";

const sdk = new NodeSDK({
  // Use the Resource configured previously
  resource: resource,
  // Set up the trace exporter
  traceExporter: new OTLPTraceExporter({
    url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT,
  }),
  // Additional automatic instrumentations
  instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();
```
Additional automatic instrumentation is configured for the Python Product.Service, which uses ready-to-run instrumentation components for RabbitMQ and the Postgres database. This enables automatic trace recording and context extraction from RabbitMQ and Postgres with a very simple setup.
Postgres
```python
from opentelemetry.instrumentation.psycopg2 import Psycopg2Instrumentor

Psycopg2Instrumentor().instrument(enable_commenter=True, commenter_options={})
```
RabbitMQ
```python
from opentelemetry.instrumentation.pika import PikaInstrumentor

PikaInstrumentor().instrument()
```
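Once instrumented, regular pika usage is traced without further changes. A minimal sketch (the connection parameters and queue name are illustrative):

```python
import pika

# After PikaInstrumentor().instrument(), publish and consume calls on
# blocking channels automatically create spans and propagate trace context
connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq"))
channel = connection.channel()
channel.queue_declare(queue="notifications")  # hypothetical queue name
channel.basic_publish(exchange="", routing_key="notifications", body=b"order created")
```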
Running my solution
You can find my sample repository for this example and architecture in this GitHub link.
To run the solution we use Docker. First, run the PowerShell script build-images.ps1 in the root of the repository to build the images of all the services locally. After that, use the docker-compose.yml in the /docker folder to run the whole solution, with the commands shown below.
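Assuming PowerShell and Docker Compose v2 are available, the commands look roughly like this:

```powershell
# From the repository root: build all service images locally
./build-images.ps1

# Start the whole solution from the /docker folder
cd docker
docker compose up -d
```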
This docker compose will run and expose multiple services according to the table below:
| Service | Address | Description |
|---|---|---|
| otel_collector | - | Collects and processes telemetry |
| jaeger | http://localhost:16686/ | Jaeger UI for traces |
| seq | http://localhost:5341/ | Seq UI for logs |
| zipkin | http://localhost:9411/ | Zipkin traces and dependency map view |
| adminer | http://localhost:7070/ | Adminer instance to connect to the different databases, for debug purposes |
| prometheus | http://localhost:9090/ | Prometheus UI for metrics |
| Shop ApiGateway | http://localhost:8080/ | API gateway for the "web shop" |
| Product Service | http://localhost:50051/ | gRPC service for the Product domain |
| Product Database | localhost:5432 | Postgres database for products |
| Payment Service | http://localhost:8082/ | HTTP service that pretends to handle payments |
| Order Service | http://localhost:8081/ | HTTP service that pretends to handle orders |
| Order Database | localhost:3306 | MySQL database for orders |
| Email Service | - | The service that pretends to send emails |
| Update Receptionist | - | The service that pretends to update product prices |
| RabbitMQ | localhost:5672 | RabbitMQ instance for pub/sub |
| RabbitMQ Management | http://localhost:15672/ | RabbitMQ management UI |
Wrapping up
In this article, we have looked into the implementation details of OpenTelemetry. We explored its practical integration across programming languages, showcasing its adaptability in orchestrating observability. From setup to configuration and resource handling, we demonstrated OpenTelemetry's versatility and ease of implementation.
In my adventures investigating the OTEL Collector, I've noticed the lack of a SQL Server log exporter. I have an idea to start contributing such an exporter. Reach out on my LinkedIn if you are interested in helping me with this contribution.
Throughout the article, we covered OpenTelemetry setup across languages, the OTEL Collector's role in centralized telemetry, resource configuration, the wiring of Logs, Metrics, and Traces, and the manual setup of custom metrics.
As always, you can find all the source code on my GitHub.
In the upcoming Part 3, we will explore the tools for visualizing telemetry, enhancing visualization and monitoring within the OpenTelemetry ecosystem. Stay tuned for the next part!