Zero Trust OpenTelemetry with SPIFFE Workload Identity
SREs all around the world know that a proper telemetry setup is fundamental to achieving observability, which in turn enables proactive system management capabilities and responsive remediation of issues. Telemetry is the automated and continuous process of collecting and (re)transmitting metrics, logs and traces. It is especially important in modern, complex, distributed systems.
But there's a tension at the heart of most telemetry deployments. We invest heavily in standardizing how observability data is collected and represented, yet the question of how telemetry components authenticate to each other is often left as an afterthought — or solved with ad hoc, fragile mechanisms that don't scale.
OpenTelemetry (OTel) has brought remarkable progress on the data side. It's time for the authentication story to catch up. In this post, we'll look at the current state of OTel authentication, examine why the common approaches fall short, and make the case that cryptographically verifiable workload identity — specifically SPIFFE — is the foundation the ecosystem needs.
Many ways to telemetry and the rise of OpenTelemetry
Historically, monitoring solutions were highly fragmented, leading to a plethora of incompatible, proprietary implementations that struggled to scale in modern, highly distributed systems. Thankfully, some de facto standardization has taken place in recent years in the form of OpenTelemetry (OTel). It focuses on observability data collection and offers developers and SREs a coherent framework for generating and transferring that data. The key specification is the OpenTelemetry Protocol (OTLP), which standardizes how observability data is sent and received.
A typical OpenTelemetry architecture consists of the following:
- Observed software components, commonly instrumented with OTel SDKs or instrumentation libraries (handled by the Collector's receivers)
- The Collector: the central component, possibly replicated, that receives (or fetches) the data and pushes it to external data stores
- Observability data destinations, also referred to as backends or data stores (handled by the Collector's exporters)

How OTel authentication works today
The OpenTelemetry Collector is flexible in how it handles authentication. Several approaches are available, ranging from no authentication at all to mutual TLS. In practice, most deployments land somewhere in the middle — and each choice carries real security tradeoffs.
No authentication (plaintext OTLP)
Out of the box, the Collector accepts OTLP over gRPC and HTTP without any authentication or encryption. This is the default development experience, and it's not uncommon to find it in production. The OTLP exporter configuration makes it straightforward to set insecure: true and move on. The risk is obvious: anyone with network access can inject fabricated telemetry or exfiltrate the data in transit. In environments where observability data drives automated alerting and scaling decisions, injected data can directly cause production incidents.
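For illustration, a minimal exporter configuration of this kind might look as follows (the endpoint is a placeholder):

```yaml
# OTLP over gRPC with no encryption or authentication.
# Anyone who can reach the endpoint can read or inject telemetry.
exporters:
  otlp:
    endpoint: collector.example.com:4317
    tls:
      insecure: true
```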
Static API keys and bearer tokens
The most common first step toward securing telemetry channels is adding an API key or bearer token to the Authorization header. The Collector supports this via its headers configuration on exporters and through authentication extensions such as bearertokenauth.
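As a sketch, a bearer-token setup using the bearertokenauth extension could look like this (the endpoint and the TELEMETRY_TOKEN environment variable are placeholders):

```yaml
# The exporter attaches the static token to every outbound request.
extensions:
  bearertokenauth:
    token: "${env:TELEMETRY_TOKEN}"

exporters:
  otlp:
    endpoint: collector.example.com:4317
    auth:
      authenticator: bearertokenauth

service:
  extensions: [bearertokenauth]
```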
This is a meaningful improvement over plaintext, but it inherits all the limitations of static tokens:
- Secret sprawl: Every service sending telemetry needs a copy of the key, multiplying the attack surface.
- Rotation is painful: Changing a shared key requires coordinated redeployment across all producers and consumers.
- No identity granularity: A leaked key grants full access. There's no way to distinguish which workload is sending data or to revoke access for a single compromised service.
- Keys are long-lived by default: Without dedicated infrastructure for rotation, keys tend to persist far longer than they should.
Server-side TLS
Enabling TLS on the Collector encrypts the channel and lets clients verify the Collector's identity. This protects against passive eavesdropping and man-in-the-middle attacks on the transport layer. The Collector's TLS configuration supports specifying certificate and key files.
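A minimal sketch of server-side TLS on an OTLP receiver (the certificate paths are placeholders):

```yaml
# Clients can verify the Collector's certificate; the Collector
# still accepts data from any client that trusts the CA.
receivers:
  otlp:
    protocols:
      grpc:
        tls:
          cert_file: /etc/otel/certs/collector.crt
          key_file: /etc/otel/certs/collector.key
```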
However, server-side TLS alone is a half-measure. The Collector cannot verify who is sending data — any client that trusts the CA can connect. This means telemetry injection from unauthorized sources remains possible. Many teams stop here because the channel "feels encrypted," but the authentication gap is significant.
Mutual TLS with manually managed certificates
mTLS closes the authentication gap by requiring both sides to present certificates. The Collector supports this through the client_ca_file setting, enabling it to verify client certificates against a trusted CA.
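Building on the server-side TLS configuration, a sketch of an mTLS-enabled receiver looks like this (the paths are placeholders):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        tls:
          cert_file: /etc/otel/certs/collector.crt
          key_file: /etc/otel/certs/collector.key
          # Clients must present a certificate signed by this CA.
          client_ca_file: /etc/otel/certs/clients-ca.crt
```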
This is architecturally sound — both parties are cryptographically authenticated. The problem is operational. Certificate provisioning, distribution, rotation, and revocation must all be managed explicitly. In environments with dozens or hundreds of services producing telemetry, the manual overhead becomes unsustainable. Teams end up with long-lived certificates, inconsistent rotation schedules, and fragile automation scripts.
Summary of tradeoffs
Each step improves the security posture, but every approach short of automated, identity-backed mTLS leaves either an authentication gap or an operational scaling problem. This is exactly the space that SPIFFE was designed for.
Zero Trust telemetry
In the context of telemetry, Zero Trust means every connection between telemetry components must be cryptographically authenticated and verified, regardless of network position.
The security stakes are twofold:
- Reconnaissance: Observability data is rich with information about system architecture, traffic patterns, error rates, and internal service names. Unprotected telemetry channels are a gift to attackers performing reconnaissance.
- Injection: When automated decisions — scaling, alerting, incident response — are driven by observability data, injecting false telemetry can directly destabilize production systems.
As we saw in the authentication landscape above, the commonly deployed mitigations (shared API keys, server-only TLS, network-based trust) each leave gaps. As the Non-Human Identity (NHI) crisis has made clear, only cryptographically verifiable workload identity provides a foundation for secure channel authentication at scale. Shared secrets and out-of-band trust assumptions do not cut it.
Workload Identity for OTel Collector
In the typical OTel architecture, a single component, the Collector, sits at the center and handles both inbound and outbound telemetry traffic. This puts it in the role of both a client and a server in the most common view of communication patterns: it needs to ensure the authenticity of inbound data as well as authenticate itself to external systems. That makes it a perfect fit for SPIFFE-based identity.
The Collector has to receive its own identity (a SPIFFE ID) and a proof that it holds it: the SVID (SPIFFE Verifiable Identity Document) in SPIFFE parlance. Additionally, with the help of SPIFFE's Workload API, it can obtain the means to verify the identities of incoming connections. Analogously, the software sending the observability data requires the same to validate the Collector's authenticity and to authenticate itself when communicating with the Collector. This is most commonly done with mTLS, since mTLS is supported by both SPIFFE and the Collector.
There is one caveat: the OTel Collector is not SPIFFE-native, meaning it is not aware of SPIFFE identities out of the box. Furthermore, the applications sending the data might not be SPIFFE-native either. Fortunately, the SPIFFE community has built spiffe-helper. It bridges the gap by writing X.509 SVIDs to the filesystem, where the Collector (or any application that needs them) can read them as regular X.509 certificates. Moreover, spiffe-helper supports automatic rotation and can signal another process once it has finished rotating the on-filesystem SVIDs. Because applying spiffe-helper manually does not scale, our team at Cofide has built open-source tooling that improves the SRE experience.
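As an illustration, a spiffe-helper configuration that keeps the Collector's certificates fresh might look roughly like this (the socket path, certificate directory, file names and reload signal are all assumptions for the sketch):

```hcl
# helper.conf: fetch X.509 SVIDs from the SPIFFE Workload API
# and write them to disk for the Collector to consume.
agent_address = "/run/spire/sockets/agent.sock"
cert_dir = "/etc/otel/certs"
svid_file_name = "svid.pem"
svid_key_file_name = "svid_key.pem"
svid_bundle_file_name = "svid_bundle.pem"
# Signal the Collector process so it reloads the rotated certificates.
renew_signal = "SIGHUP"
```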
Cofide open source
Cofide maintains open source tooling that makes obtaining and using workload identity credentials straightforward for application teams.
- cofide-sdk-go lets you easily add first-class SPIFFE support to Go applications; see the SDK documentation to get started.
- spiffe-enable is a Kubernetes admission webhook that auto-injects the necessary sidecars and configuration into Kubernetes pods, allowing applications to securely receive and use SVIDs without requiring code changes. spiffe-enable is especially useful for onboarding workloads to Connect that are not natively SPIFFE-aware. Read more in the spiffe-enable documentation and the relevant blog post.
What about the LGTM stack with Alloy?
As OpenTelemetry's goals include vendor neutrality and a minimal surface (just the protocol and the Collector), a compatible open-source ecosystem has grown around it, offering solutions such as Grafana's LGTM stack. The LGTM letters stand for Loki, Grafana, Tempo and Mimir, whose names correspond directly to the scope of telemetry they cater for: Logs, Graphical visualization, Traces and Metrics. For longer-term retention, the stack is often paired with object storage (usually S3-compatible).
As the LGTM suite is comprehensive and caters to the needs of many telemetry deployments, it has been widely adopted. The Collector's role is typically filled by Alloy, which is OpenTelemetry-native with custom optimizations for the entire stack, bringing the best of both worlds. Sadly, Alloy is not SPIFFE-native either. However, the same tooling that applies to the Collector works here.
Cofide Connect platform for LGTM
Since the LGTM stack goes beyond just collecting and forwarding observability data, the question is whether more can be done for its workload identity posture. Indeed, the Cofide Connect platform offers Federated Services, which make it easy to seamlessly decide on the level of (de)centralization of the various LGTM components.
Getting a typical in-Kubernetes Loki deployment exposed in a secure way is as easy as applying the following manifest:
```yaml
apiVersion: registry.cofide.io/v1alpha1
kind: FederatedService
metadata:
  name: loki
  namespace: production
spec:
  name: loki
  namespace: production
  exportedTrustDomains:
    - loki-spoke.example.com
  workloadLabels:
    app.kubernetes.io/component: single-binary
  port: 3100
```

NOTE: The network path programming currently relies on Istio being set up.
A call to the observability ecosystem
OpenTelemetry has achieved something remarkable: a vendor-neutral standard for observability data that the entire industry has rallied around. OTLP is a shared language, and the Collector is a shared building block. This standardization has freed practitioners from lock-in and given them composable, interoperable tooling.
But there is a missing piece. While the data plane has been standardized, the authentication plane remains fragmented. Each deployment cobbles together its own approach — static tokens here, manually managed certificates there, network policies as a stand-in for identity elsewhere. This is the same kind of proprietary, ad hoc landscape that OTel was created to replace on the data side.
SPIFFE offers the authentication counterpart to what OTLP provides for data: a vendor-neutral, CNCF-backed standard that gives every workload a cryptographically verifiable identity. Both projects share a philosophy — open standards, no lock-in, interoperability by default.
Today, neither the OpenTelemetry Collector nor Grafana Alloy natively supports the SPIFFE Workload API. Our tooling (e.g. spiffe-enable), together with the broader Cofide Connect platform, proves the model works by bridging this gap. Applications and collectors can use SPIFFE-issued identities today without code changes.
But bridging should be a transitional state, not the permanent architecture. First-class SPIFFE support in the Collector itself — the ability to call the Workload API directly, receive SVIDs, and validate peer identities without sidecars or filesystem indirection — would make Zero Trust telemetry a native capability rather than a bolted-on layer.
We're committed to contributing to this effort — through our open-source tooling, through upstream contributions, and through the work we do with the SPIFFE community. If you're interested in collaborating on native SPIFFE support for OTel components, get in touch.
Get started with Zero Trust OpenTelemetry
The observability ecosystem has standardized how we collect and represent telemetry data. The next frontier is standardizing how telemetry components prove their identity to each other — moving from shared secrets and network assumptions to cryptographically verifiable workload identity. If you want to explore hands-on first, the documentation covers core concepts and the open-source projects show what we've built in the open.
If you deploy OpenTelemetry, with or without the LGTM stack, and want to move to modern, standard-based authentication on a foundation that scales, get in touch. We're happy to talk through how Connect fits your setup, and how we can push the ecosystem forward together.