An instrumentation library’s primary aim is to construct a model that serves observability and controllability, including application monitoring and management, and to be exceptionally effective in doing so. The OpenSignals service level management grid model proposed here offers a radical simplification of the current, largely failing approach to dashboard design, which consists of arbitrarily placing tens to hundreds of metrics on a single screen.
The service level management model is easily supported by any backend that processes data collected from an OpenSignals implementation through a supported plugin interface. In the model, a service within a network of services is visually represented as a 5×3 grid, with one row for each operational status and one column for each service viewpoint within a flow of execution.
The three columns represent subjective perspectives within a network that can be taken of a single service. The first column represents those services that make use of the service – its dependents. The second column represents the service itself. The third column represents services the service itself relies on for its execution – its dependencies.
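The grid described above can be sketched as a small data structure. This is only an illustration, not part of any OpenSignals API: the `Grid` and `Viewpoint` names are hypothetical, and because this section does not list the five operational statuses, the `Status` names below are placeholders.

```python
from collections import Counter
from enum import Enum


class Status(Enum):
    # Placeholder names (assumption): the text says there are five
    # operational statuses but does not name them in this section.
    OK = 0
    DEVIATING = 1
    DEGRADED = 2
    DEFECTIVE = 3
    DOWN = 4


class Viewpoint(Enum):
    DEPENDENTS = 0    # column 1: services that make use of this service
    SERVICE = 1       # column 2: the service itself
    DEPENDENCIES = 2  # column 3: services this service relies on


class Grid:
    """A 5x3 grid of counts keyed by (status, viewpoint)."""

    def __init__(self):
        self.cells = Counter()

    def record(self, status, viewpoint, count=1):
        # Add `count` services (or instances) to one cell of the grid.
        self.cells[(status, viewpoint)] += count

    def rows(self):
        # One row per status, one column per viewpoint; empty cells read as 0.
        return [[self.cells[(s, v)] for v in Viewpoint] for s in Status]


grid = Grid()
grid.record(Status.OK, Viewpoint.DEPENDENTS, 3)
grid.record(Status.DEGRADED, Viewpoint.DEPENDENTS)
grid.record(Status.OK, Viewpoint.SERVICE)
grid.record(Status.OK, Viewpoint.DEPENDENCIES, 2)
grid.record(Status.DOWN, Viewpoint.DEPENDENCIES)

for status, row in zip(Status, grid.rows()):
    print(f"{status.name:<10}" + "".join(f"{n:>4}" for n in row))
```

Reading across a row shows how many dependents, the service itself, and how many dependencies sit at that status; reading down a column shows how one viewpoint is spread across statuses.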
Measures of Quality
Several options are available when deciding what to list in the grid cells; two possibilities are the count of services or the number of service instances. The important thing is that the values listed in the grid reflect the nature of execution flow and, more importantly, guide a systems engineer or developer in determining the quality of the service from multiple points of view within the network of communicating and cooperating services. The same grid needs to be able to scale up to clusters of services and the overall system.
The ability to group and aggregate an observation model into coarser monitoring and management boundaries is paramount to any effort to manage increasing complexity and rates of change. Unlike other observability approaches such as metrics, traces, and logs, the OpenSignals service level model makes it not only possible but practical to do so.
The ingress and egress columns are relational, so the values can reflect either end of the service-to-service interaction. The first column could list the number of calling services that have judged (inferred) the service in question to be operating at a particular status. Alternatively, the same column could hold the service's judgment of its callers, when such context is transferred and requests are rejected. Another possibility is to list the number of entry points (nested services) that the service exposes to clients.
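The first option, counting callers by the status they have inferred for the service, might look like the following. The caller names, the `ingress_column` helper, and the status names are assumptions made for illustration only.

```python
from collections import Counter

# Placeholder status names (assumption); the section does not list them.
STATUSES = ("OK", "DEVIATING", "DEGRADED", "DEFECTIVE", "DOWN")


def ingress_column(judgments):
    """Count callers per status they have inferred for the target service.

    `judgments` maps a caller name to the status that caller has inferred.
    """
    counts = Counter(judgments.values())
    return [counts[s] for s in STATUSES]


# Hypothetical callers and the status each has inferred for the service.
judgments = {"web": "OK", "mobile-api": "OK", "batch": "DEGRADED"}
column = ingress_column(judgments)

for name, n in zip(STATUSES, column):
    print(f"{name:<10}{n:>4}")
```

The same function would serve the alternative reading if the mapping instead held the service's judgment of each caller.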
It is crucial to keep in mind that in any service-to-service interaction, each side is continually inferring the status of the other. This is where the simplicity of OpenSignals, as opposed to the unnecessary complexity of distributed tracing, shines a light on what is significant at the heart of service communication and control. The detail of deep traces does not make the job better; it just adds to a growing data fog.
For OpenSignals, the focus is on the immediate connections and the sensitivity (to errors) that can be exhibited or defined in execution. Many of the signals included in OpenSignals reflect the collection and cognition requirements in engineering systems of resilience.