In classic architectures, such as monolithic systems, a glance at the system's log file was often enough to recognise whether the system was behaving as expected. If not, it was also possible to determine why the system was not behaving as expected.
A tool that could interpret log files was therefore often sufficient. However, this no longer applies to the cloud-native architectures in use today. Modern systems, and therefore their log files, are too widely distributed to be put into context with traditional means.
The Observe stopover addresses exactly this problem and turns classic monitoring into an observability-first strategy.
What can I expect from this stopover?
The Observe stopover includes the following topics:
Visualisation & Alerting
Application Performance Monitoring
Infrastructure Monitoring
SIEM
Cost Monitoring
Of course, observability is much more than just looking at log files. Different signals such as traces, metrics or logs must be linked in order to obtain a holistic picture of the situation. The monitoring backend can then automatically recognise and react to exceptional situations.
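To make the idea of linking signals concrete, here is a minimal sketch using only the Python standard library. It stamps every log line with the ID of the trace it belongs to, so a backend can later join logs and traces on that ID. The names `current_trace_id`, `JsonFormatter`, `build_logger` and `logs_for_trace`, as well as the logger name `checkout`, are assumptions made for illustration. This is not how OpenTelemetry implements correlation internally.

```python
import json
import logging
from contextvars import ContextVar

# Hypothetical holder for the trace ID of the request currently being handled.
# "-" marks log lines written outside of any trace.
current_trace_id = ContextVar("current_trace_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each record as JSON and attach the active trace ID."""

    def format(self, record):
        return json.dumps(
            {
                "trace_id": current_trace_id.get(),
                "level": record.levelname,
                "message": record.getMessage(),
            }
        )


def build_logger(stream):
    """Create a logger that writes trace-aware JSON lines to `stream`."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("checkout")  # assumed service name
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def logs_for_trace(lines, trace_id):
    """Return all parsed log entries that belong to one trace."""
    return [entry for entry in map(json.loads, lines) if entry["trace_id"] == trace_id]
```

A request handler would set `current_trace_id` when a request arrives and reset it afterwards. Every log line written in between then carries the same ID as the spans of that request, which is the join key a monitoring backend needs.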
As a CNCF Silver Member, we naturally base our solution on standards from the CNCF Landscape. OpenTelemetry has established itself as the standard for observability in recent years and has matured into one of the largest projects within the CNCF.
OpenTelemetry brings the signals mentioned above together in one standard and allows them to be correlated with each other. To monitor distributed systems end-to-end, it uses the W3C Trace Context standard rather than reinventing the wheel.
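As an illustration of what W3C Trace Context puts on the wire, the following sketch builds and parses the `traceparent` HTTP header by hand. Version `00` of the header consists of four lowercase hex fields: version, trace ID, parent (span) ID and flags. The function names are assumptions made for this example. In practice the OpenTelemetry SDK handles this propagation, so the code only shows the format and is not a replacement for it.

```python
import re
import secrets

# traceparent, version 00: "00-<32 hex trace id>-<16 hex parent id>-<2 hex flags>"
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled=True):
    """Start a new trace with a random 16-byte trace ID and 8-byte span ID."""
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"


def parse_traceparent(header):
    """Split a version-00 traceparent header into its fields."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        raise ValueError(f"not a version-00 traceparent header: {header!r}")
    trace_id, parent_id, flags = match.groups()
    # All-zero IDs are explicitly invalid in the specification.
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        raise ValueError("trace ID and parent ID must not be all zeros")
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "flags": flags,
        "sampled": bool(int(flags, 16) & 0x01),
    }


def child_traceparent(incoming):
    """Header for an outgoing call: same trace ID, freshly generated span ID."""
    ctx = parse_traceparent(incoming)
    return f"00-{ctx['trace_id']}-{secrets.token_hex(8)}-{ctx['flags']}"
```

Because every service keeps the trace ID and only replaces the span ID, all spans of one request can be stitched together end-to-end, no matter how many subsystems the request passes through.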
Technical information can also be propagated in a standardised way across several subsystems, which makes it possible, for example, to build service level objectives on such information. For this purpose, OpenTelemetry builds on the W3C Baggage standard.
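The propagation described here can be sketched as follows. W3C Baggage transports key-value pairs in a `baggage` HTTP header, with list members separated by commas and values percent-encoded. The helper names `encode_baggage` and `decode_baggage` and the sample keys are assumptions for illustration. The sketch also assumes that keys are plain tokens that need no encoding, and it ignores the optional `;property` metadata the standard allows.

```python
from urllib.parse import quote, unquote


def encode_baggage(entries):
    """Serialise a dict into a baggage header value.

    Assumes keys are plain tokens; only values are percent-encoded.
    """
    return ",".join(f"{key}={quote(value, safe='')}" for key, value in entries.items())


def decode_baggage(header):
    """Parse a baggage header value back into a dict."""
    entries = {}
    for member in header.split(","):
        member = member.strip()
        if not member:
            continue
        # Drop optional ";property" metadata; this sketch does not use it.
        key_value = member.split(";", 1)[0]
        key, separator, value = key_value.partition("=")
        if not separator:
            continue  # skip malformed members without "="
        entries[key.strip()] = unquote(value.strip())
    return entries
```

A downstream service could, for instance, read a customer tier from the decoded baggage and evaluate a service level objective per tier, without the calling service passing that value explicitly in every request payload.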
Thanks to a standardised API and a vendor-neutral SDK, application code can work with the telemetry data directly. This adds a further level at which telemetry data can be aggregated and opens up new possibilities for analysing it.
Why should I stop at this stopover?
This stopover is essential for covering the following points in a cloud-native system in the long term:
Introduction of (near) real-time alerting and AIOps
Understanding the behaviour of the overall system
Monitoring of security-relevant events
Overview of operating costs
How does this stopover work?
We have designed a process model that makes the introduction of OpenTelemetry as frictionless as possible. Using our proven CRAWL-WALK-RUN method, we work iteratively through topics tailored to the customer's needs.
Splunk is currently the main contributor to the Collector, the centrepiece of an observability architecture based on OpenTelemetry. Many will know Splunk from logging or SIEM, but the Splunk Observability Cloud is now also very well positioned in application performance monitoring and infrastructure monitoring.
In the next article, we will therefore take a closer look at this product from our partner. It offers native support for OpenTelemetry and places a new emphasis on usability, which fits in well with the DevOps philosophy.