Service Level Management Grid

The primary aim of an instrumentation library is to construct a model that serves observability and controllability, including application monitoring and management, and that does so effectively. The OpenSignals service level management grid proposed above radically simplifies the current, largely failing approach to dashboard design, in which tens to hundreds of metrics are placed arbitrarily on a single pane of screen real estate.

The service level management model can be supported by any backend that processes data collected from an OpenSignals implementation through a supported plugin interface. In the model, a service within a network of services is visually represented as a 5×3 grid: five rows, one for each operational status, and three columns, one for each viewpoint of the service within a flow of execution.

The three columns represent each of the subjective perspectives within a network that can be taken of a single service. The first column represents those services that make use of the service – its dependents. The second column represents the service itself. The third column represents services that the service itself relies on for its own execution – its dependencies.
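The grid described above can be sketched as a small data structure. This is only an illustration under stated assumptions: the names Status, Viewpoint, and ServiceGrid, the five status labels, and the "checkout" service are hypothetical and are not taken from the OpenSignals API.

```python
from collections import Counter
from enum import Enum


class Status(Enum):
    # Hypothetical labels for the five rows; not taken from the OpenSignals API.
    OK = 0
    DEVIATING = 1
    DEGRADED = 2
    DEFECTIVE = 3
    DOWN = 4


class Viewpoint(Enum):
    DEPENDENTS = 0    # first column: services that make use of this service
    SERVICE = 1       # second column: the service itself
    DEPENDENCIES = 2  # third column: services this service relies on


class ServiceGrid:
    """A 5x3 grid of counts, one cell per (status, viewpoint) pair."""

    def __init__(self, name):
        self.name = name
        self.cells = Counter()  # missing cells read as 0

    def record(self, viewpoint, status, count=1):
        self.cells[(status, viewpoint)] += count

    def rows(self):
        # One list of three counts per status, in row order.
        return [[self.cells[(s, v)] for v in Viewpoint] for s in Status]

    def render(self):
        lines = [f"{'':<10}" + "".join(f"{v.name:>13}" for v in Viewpoint)]
        for status, row in zip(Status, self.rows()):
            lines.append(f"{status.name:<10}" + "".join(f"{n:>13}" for n in row))
        return "\n".join(lines)


grid = ServiceGrid("checkout")
grid.record(Viewpoint.DEPENDENTS, Status.OK, 3)
grid.record(Viewpoint.DEPENDENTS, Status.DEGRADED)
grid.record(Viewpoint.SERVICE, Status.OK)
grid.record(Viewpoint.DEPENDENCIES, Status.DOWN, 2)
print(grid.render())
```

Keying the cells by (status, viewpoint) keeps the structure agnostic about what is being counted, so the same shape could hold service counts or instance counts.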

Several options exist for what to list in the grid cells; two possibilities are the count of services and the number of service instances. What matters is that the values reflect the nature of the execution flow and, more importantly, guide a systems engineer or developer in determining the quality of the service from multiple points of view within the network of communicating and cooperating services. The same grid needs to scale up to clusters of services and to the overall system. The ability to group and aggregate an observation model into coarser monitoring and management boundaries is paramount to any effort to manage increasing complexity and rates of change. Unlike other observability approaches such as metrics, traces, and logs, the OpenSignals service level model makes this not only possible but practical.
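One way to picture scaling the grid up to a cluster is cell-by-cell summation of per-service grids. The sketch below assumes each cell holds a count of service instances; the status labels, the function names aggregate and as_rows, and the services "cart" and "payments" are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical row and column labels, chosen for illustration only.
STATUSES = ("OK", "DEVIATING", "DEGRADED", "DEFECTIVE", "DOWN")
VIEWPOINTS = ("dependents", "service", "dependencies")


def aggregate(grids):
    """Sum per-service grids cell by cell into one coarser grid."""
    total = Counter()
    for grid in grids:
        total.update(grid)  # Counter.update adds counts rather than replacing them
    return total


def as_rows(grid):
    # Five rows of three counts; cells that were never set read as 0.
    return [[grid[(s, v)] for v in VIEWPOINTS] for s in STATUSES]


# Each grid maps (status, viewpoint) -> number of service instances.
cart = Counter({
    ("OK", "dependents"): 4,
    ("OK", "service"): 2,
    ("DEGRADED", "dependencies"): 1,
})
payments = Counter({
    ("OK", "dependents"): 1,
    ("DOWN", "service"): 1,
    ("DEGRADED", "dependencies"): 2,
})

cluster = aggregate([cart, payments])
for status, row in zip(STATUSES, as_rows(cluster)):
    print(f"{status:<10}{row}")
```

Summation suits instance counts. If the cells instead counted distinct services, a dependent shared by two services in the cluster would be counted twice, so a union of service identifiers per cell would be the more faithful aggregation.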

Both the ingress (first) and egress (third) columns are relational, so their values can reflect either end of a service-to-service interaction. The first column could list the number of calling services that have judged (inferred) the service in question to be operating at a particular status. Alternatively, the same column could hold the service's judgment of its callers, when such context is transferred and requests are rejected. Another possibility is to list the number of entry points (nested services) that the service exposes to clients.
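The first of these options, counting callers by the status they have inferred for the service, might be computed roughly as follows. The tuple layout (caller, callee, status), the status labels, the service names, and the function dependents_column are assumptions made for this sketch; how judgments are actually collected from an OpenSignals implementation is not shown.

```python
from collections import Counter

# Hypothetical status labels, listed in row order.
STATUSES = ("OK", "DEVIATING", "DEGRADED", "DEFECTIVE", "DOWN")


def dependents_column(judgments, target):
    """Count distinct callers per status they have inferred for `target`.

    `judgments` is an iterable of (caller, callee, status) tuples; a later
    tuple for the same caller and callee replaces the earlier one.
    """
    latest = {}
    for caller, callee, status in judgments:
        if callee == target:
            latest[caller] = status
    counts = Counter(latest.values())
    return [counts[s] for s in STATUSES]


judgments = [
    ("web", "orders", "OK"),
    ("mobile", "orders", "OK"),
    ("batch", "orders", "DEGRADED"),
    ("web", "billing", "DOWN"),         # concerns another service, ignored
    ("mobile", "orders", "DEVIATING"),  # supersedes mobile's earlier OK
]

print(dependents_column(judgments, "orders"))
```

Swapping the roles of caller and callee in the filter would give the alternative reading, in which the column holds the service's judgment of its callers.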

It is important to keep in mind that in any service-to-service interaction each side is continually inferring the status of the other. This is where the simplicity of OpenSignals, as opposed to the unnecessary complexity of distributed tracing, shines a light on what is significant at the heart of service communication and control. The detail of deep traces does not make the job better; it just adds to a growing data fog. For OpenSignals, the focus is on the immediate connections and the sensitivity (to errors) that can be exhibited or defined in the course of execution. Many of the signals included in OpenSignals reflect the engineering of resilience.