Simplicity and Significance in Observability

Over the last few years, complexity has been on the rise within computing infrastructure, especially with the move to finer-grained deployment units. We’ve seen some companies adopt microservices so enthusiastically that what was once largely considered a monolith is now broken up into hundreds, even thousands, of execution units that are still, by and large, connected.

One might naively expect that, as complexity increases in the world of computing, the tools and approaches employed should do likewise. This, I believe, was, and still is, a grave misconception. I would argue that the opposite should have happened. As computing and its complexity scaled up, the models and methods should have reduced and simplified the communication and control surface area between man and the machines. Instead, monitoring (passive) and management (reactive) solutions have lazily reflected the nature of the complexity at a level that is devoid of simplicity and significance and polluted with noise. Engineering teams today are far too busy wrestling with and wandering around an ever-expanding data fog of metrics, logs, and distributed traces. Fearful of the complexity, many engineering teams worry over their ability to collect, store, and analyze more and more data and details, but never question or reflect on the effectiveness of doing so. One could very well argue that complexity has been replaced with complication. We are not understanding or solving the problem of complexity; we are just attending to and acting on another problem because it feels much more familiar than the changing world of today. There is seeing but no perceiving. There is doing but no direction. There is collection but no cognition.

Application monitoring and management solutions are far more complicated than they need to be. That single pane of glass that many vendors talk up consists of literally hundreds of layers, tabs, views, charts, and navigation aids. It has become such a sorry tale that some vendors have created an onboarding experience that consists of a game leading users along a path to some golden nugget of information. The problem here is that the data is so detached from the service domain and system dynamics that only a machine could make sense of it. Such onboarding is a temporary band-aid for a far more troubling problem, in which data is valued over information and useful models.

With OpenSignals, the aim is to bring simplicity and significance back into the world of monitoring, observability, controllability, and management. The basic idea is pretty simple, as many useful innovations are: see, perceive, model, and reason about the computing world of microservices much as humans do within societies and cultures made up of multiple agents offering services.

At the heart of all human (and animal) communication and cooperation we find signals and inferred states. Signals are emitted or received. Signals are indicative of operations or outcomes: signs and traces of the past and the sliver of the present. Signals are used to influence others and, over time, to infer the state of others as well as of ourselves on reflection. A signal is a direct and meaningful unit of information within a (social) context, much like an emoji. It is not a message that needs to be introspected in part and then interpreted. In any interaction, humans are emitting and receiving signals via body language and vocalization, as well as through what is physically passed and contextually communicated. This processing and transmission of signals is paramount to effective cooperation and coordination. But the signals are just a means to an end, and that end is the assessment of ourselves and others: state inference.

When it comes to monitoring environments, the focus and frame of reference should always be the operational status of a service from the perspective of every other service that interacts with it. An assessment of service quality should not be based on what a service itself tells us by way of published metrics; that misses the point that no service exists in isolation anymore within a highly interconnected network. Instead, an assessment should reflect how other services perceive a service by way of signals and the state inferred from them, which can differ depending on how sensitive each service is to those signals. Sensitivity manifests in the different weighting of signals and in the rate at which past memories decay, both of which are configured per service.
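To make the idea of sensitivity concrete, here is a minimal sketch of two observers inferring a status for the same service from the same signals. It is not the OpenSignals API: the `Sensitivity` class, the `infer_status` function, the signal names, the status names, and the single-threshold rule are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sensitivity:
    """Hypothetical per-observer configuration (not an OpenSignals type)."""
    weights: dict      # signal name -> weight this observer gives the signal
    decay: float       # fraction of the running score kept per new signal (0..1)
    threshold: float   # score at or above which the observed service is DEGRADED


def infer_status(signals, sensitivity):
    """Fold a sequence of signals into one status, as seen by one observer."""
    score = 0.0
    for signal in signals:
        # Older evidence fades by the decay factor; the new signal adds its weight.
        score = score * sensitivity.decay + sensitivity.weights.get(signal, 0.0)
    return "DEGRADED" if score >= sensitivity.threshold else "OK"


# The same signals, observed by two services with different sensitivity.
signals = ["succeed", "fail", "fail", "succeed", "succeed", "succeed"]

tolerant = Sensitivity(weights={"fail": 1.0}, decay=0.5, threshold=1.0)
strict = Sensitivity(weights={"fail": 2.0}, decay=0.9, threshold=1.0)

print(infer_status(signals, tolerant))  # OK
print(infer_status(signals, strict))    # DEGRADED
```

The tolerant observer forgets the two failures quickly and settles back to OK, while the strict observer weights failures more heavily, remembers them longer, and still regards the service as degraded.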

OpenSignals brings simplicity and sensibility by focusing on what is effective and significant for the vast majority of service management attention: what is the status of this service, this cluster (of services), or this system (of services)?

The conceptual model and language (terms) are small, and the sequence of processing is straightforward. A service creates a context that is a representation of the world. Within this context, the service itself is represented alongside the other services it interacts with. In the course of interaction, the service owning the context, acting like a mind or model, records signals against the representations of itself and the other services. The recorded signals are then given a score, based on the configuration used to create the context, and mapped to a status bucket for each of the possible status values per service represented. The scorecard tallies each bucket and makes a generalized assessment of the service, with some decay mechanism in play, much as human memory works. The context can then, via plugin functionality, transmit the status changes within the context to other interested observers, where collective intelligence can manifest in additional aggregation, ranking, weighting, and so on.
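The processing sequence described above can be sketched roughly as follows. This is a toy model under stated assumptions, not the OpenSignals implementation: the `Context` class, its `record` and `subscribe` methods, the `SCORING` table, the `DECAY` value, and the rule that the highest tally wins are all hypothetical, and a plain callback stands in for the plugin mechanism.

```python
from collections import defaultdict

# Hypothetical configuration: the status bucket each signal feeds, and its score.
SCORING = {
    "succeed": ("OK", 1.0),
    "fail": ("DEGRADED", 2.0),
    "disconnect": ("DOWN", 4.0),
}
DECAY = 0.5  # fraction of every bucket's tally retained when a new signal arrives


class Context:
    """One service's model of itself and of the services it interacts with."""

    def __init__(self, scoring=SCORING, decay=DECAY):
        self.scoring = scoring
        self.decay = decay
        # service name -> status bucket -> decayed tally
        self.buckets = defaultdict(lambda: defaultdict(float))
        self.status = {}     # service name -> last inferred status
        self.observers = []  # callbacks standing in for the plugin mechanism

    def subscribe(self, callback):
        """Register an observer to be told about status changes."""
        self.observers.append(callback)

    def record(self, service, signal):
        """Record a signal against a service and re-assess its status."""
        bucket, score = self.scoring[signal]
        tallies = self.buckets[service]
        for status in tallies:
            tallies[status] *= self.decay  # past evidence fades
        tallies[bucket] += score           # new evidence is scored and bucketed
        inferred = max(tallies, key=tallies.get)  # highest tally wins
        if self.status.get(service) != inferred:
            self.status[service] = inferred
            for notify in self.observers:
                notify(service, inferred)  # transmit only status changes
        return inferred


changes = []
ctx = Context()
ctx.subscribe(lambda service, status: changes.append((service, status)))

for signal in ["succeed", "succeed", "fail", "fail", "succeed", "succeed", "succeed"]:
    ctx.record("payments", signal)

print(changes)
# [('payments', 'OK'), ('payments', 'DEGRADED'), ('payments', 'OK')]
```

Seven signals produce only three status changes, which is the point of the model: observers downstream receive a small number of meaningful transitions rather than every raw event.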