Observability
Observability is the ability to understand a system’s current state from the data it generates: logs, metrics, and traces. Together, these are known as the “three pillars of observability.”
-
Logs: Logs record the details of an event.
-
Metrics: Metrics capture the numeric measurements used to quantify the performance and health of services.
-
Traces: Traces track how services connect from end to end in response to requests.
-
Events are sometimes considered a fourth pillar of observability. Events are discrete actions that occur in a system at a given point in time and are distinct from logs.
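To make the pillars concrete, here is a minimal sketch of a single request producing a log, a metric, and a span. It uses only the Python standard library; the in-memory stores LOGS, METRICS, and SPANS and the function handle_request are hypothetical names invented for illustration, not part of any real observability tool.

```python
import json
import time
import uuid

# Hypothetical in-memory stores standing in for real log, metric, and trace backends.
LOGS = []
METRICS = {}
SPANS = []


def handle_request(path):
    """Handle one request and emit a log line, a metric, and a span for it."""
    trace_id = uuid.uuid4().hex  # shared ID that ties the signals together
    start = time.perf_counter()

    # Log: the details of a single event, here as structured JSON.
    LOGS.append(json.dumps({"trace_id": trace_id, "msg": "request received", "path": path}))

    # Metric: a numeric measurement, here a running count of requests per path.
    METRICS[path] = METRICS.get(path, 0) + 1

    # Trace: one span recording which operation ran and how long it took.
    duration = time.perf_counter() - start
    SPANS.append({"trace_id": trace_id, "name": "handle_request", "duration_s": duration})
    return trace_id


if __name__ == "__main__":
    handle_request("/checkout")
    print(LOGS[-1])
    print(METRICS)
    print(SPANS[-1])
```

The shared trace_id is what lets you move from a log line to the span that produced it. In a real distributed system that ID would be passed along with the request to each downstream service.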
Monitoring vs observability
-
Monitoring and observability are related but different concepts.
-
Monitoring is the process of collecting data and generating reports on different metrics that define system health. Think alerting.
-
Observability brings a wider scope and visibility to traditional monitoring tools, incorporating extra situational and historical data and system interactions. Think investigation.
-
Monitoring is the when and what of a system error, and observability is the why and how.
-
When thinking about complex distributed systems, observability can help get to the root cause of a problem by providing a holistic view of the system, whereas monitoring can only tell you something is wrong on a particular system.
OpenMetrics
OpenMetrics is a project that emerged from the Prometheus project to create a standard for metrics exposition. The specification defines a consistent, human-readable format in which systems expose their metrics.
Systems typically expose metrics on a /metrics endpoint, which software such as Prometheus can then collect.
OpenMetrics deals only with metrics and not with any other observability data.
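As a rough illustration of what that exposition looks like, the sketch below renders a counter in the OpenMetrics text format using only the Python standard library. The function names and the http_requests metric are hypothetical, and the sketch covers only a small part of the format (one metric type, no timestamps, no escaping of help text). A real service would normally use a client library instead of formatting the text by hand.

```python
def escape_label_value(value):
    """Escape backslash, double quote, and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name, help_text, samples):
    """Render one counter family as a list of lines.

    samples is a list of (labels_dict, value) pairs.
    """
    lines = [f"# TYPE {name} counter", f"# HELP {name} {help_text}"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels.items())
        # Counter samples carry the _total suffix in OpenMetrics.
        sample_name = f"{name}_total"
        if label_str:
            sample_name += "{" + label_str + "}"
        lines.append(f"{sample_name} {value}")
    return lines


def render_exposition(families):
    """Join metric families and terminate the exposition with the EOF marker."""
    lines = [line for family in families for line in family]
    lines.append("# EOF")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    family = render_counter(
        "http_requests",
        "Total HTTP requests handled.",
        [({"method": "GET", "code": "200"}, 1027), ({"method": "POST", "code": "500"}, 3)],
    )
    print(render_exposition([family]), end="")
```

Each sample line is a metric name, an optional set of labels, and a value, which is what a collector reads when it requests the /metrics endpoint.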
OpenTelemetry
OpenTelemetry is a set of standards, APIs, SDKs, and libraries that aim to standardize the generation, collection, and management of telemetry data (logs, metrics, and traces).
OpenTelemetry provides its own set of client libraries, which make it easier to integrate into your application code.
The OpenTelemetry project calls logs, metrics, and traces signals. OpenTelemetry also defines a fourth signal called baggage, which is used to pass contextual key-value data between services.
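To show the idea behind baggage, the sketch below passes key-value pairs from one service to another in an HTTP header, following the general shape of the W3C baggage header (comma-separated key=value pairs with percent-encoded values). This is not the OpenTelemetry SDK: encode_baggage and decode_baggage are hypothetical helpers, and the sketch ignores size limits and the optional per-entry properties. In practice the OpenTelemetry client libraries handle this propagation for you.

```python
from urllib.parse import quote, unquote


def encode_baggage(entries):
    """Serialize key-value pairs into a baggage-style header value."""
    return ",".join(f"{key}={quote(str(value), safe='')}" for key, value in entries.items())


def decode_baggage(header):
    """Parse a baggage-style header value back into a dict."""
    entries = {}
    for member in header.split(","):
        member = member.strip()
        if not member:
            continue
        # Drop optional ";property" metadata, which this sketch ignores.
        pair = member.split(";", 1)[0]
        key, sep, value = pair.partition("=")
        if sep:
            entries[key.strip()] = unquote(value.strip())
    return entries


if __name__ == "__main__":
    # Service A attaches baggage to its outgoing request headers.
    outgoing_headers = {"baggage": encode_baggage({"user_id": "alice", "node": "DF 28"})}
    print(outgoing_headers["baggage"])

    # Service B reads the same values from the incoming request.
    print(decode_baggage(outgoing_headers["baggage"]))
```

Because the values travel with the request, a downstream service can attach them to its own logs, metrics, or spans without knowing where they were first set.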
AWS provides an OpenTelemetry distribution called AWS Distro for OpenTelemetry (ADOT). ADOT can make it easier to adopt OpenTelemetry in your AWS environment.
OpenTelemetry is a CNCF project.