In my previous post I called out several issues with the standard presentation of the OODA loop. One in particular was the lack of detail surrounding the Decide phase of the model. While OODA does explore some of the factors involved in the Orient phase that feeds into the Decide phase, it offers minimal elaboration on what information is actually relayed and how it might be reasoned about. Here is where we can augment the OODA loop with another model of decision making, referred to as the Recognition-Primed Decision (RPD) model of rapid decision making. The model asserts that individuals assess the situation and generate a plausible course of action (CoA), which is then evaluated using mental simulation. In the original paper, the authors stress that decision making is primed by how the situation is recognized, not entirely determined by recognition. The model runs counter to the common assumption that individuals employ an analytical method in complex, time-critical operational contexts, carefully evaluating, weighing, and comparing multiple options before choosing a response or action. The analytical approach works best with inexperienced individuals, whereas experts employ more of a naturalistic decision-making method that is heuristic, holistic, and intuitive.
In the RPD model, an expert's understanding of a situation depends mainly on the goals, cues, expectations, and typical actions within such situations (prototypical patterns). In the above diagram, I've replaced cues with signals and actions with scripts (courses of action). The RPD model has three components: matching, diagnosis, and simulation. The matching component attempts to identify the current situation against a memory of prototypical situations. If the situation is not recognized, further diagnostics are gathered and (online) learning is engaged. Pattern recognition is central to the decision making: a pattern consists of cues, spatial and temporal relationships, and cause-and-effect chains, and it reflects the operational goals and expectations. The intelligence, or expertise, we commonly hear about in mission-critical situations is an extensive knowledge of patterns that makes it easy to identify the small but critical set of states a system is in or is about to enter. Once a decision, or choice of action script, is made, a mental simulation of the anticipated consequences is performed, and the expected outcome is compared with the goals. Here an effective and efficient mental model of the situation is paramount for a fast and accurate assessment. If the outcome is favorable, the course of action is taken; otherwise, alternative scripts are evaluated. If none of the mapped scripts is found acceptable, further diagnosis is initiated.
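The flow just described — match against prototypes, diagnose when recognition fails, and evaluate candidate scripts one at a time through mental simulation — can be sketched in code. This is only an illustrative rendering of the loop under my own assumptions; the names (`Pattern`, `match`, `simulate`, `decide`) are hypothetical and not drawn from the RPD literature.

```python
from dataclasses import dataclass


@dataclass
class Pattern:
    """A prototypical situation: the cues (signals) that identify it,
    the goals it implies, and its typical action scripts."""
    name: str
    cues: frozenset       # signals that identify this situation
    goals: frozenset      # outcomes a good script should achieve
    scripts: list         # candidate courses of action, most typical first


def match(observed_cues, memory):
    """Matching component: recognize the situation from prototypes."""
    for pattern in memory:
        if pattern.cues <= observed_cues:  # all of the pattern's cues present
            return pattern
    return None  # not recognized


def simulate(script, observed_cues):
    """Simulation component: mentally play the script forward and
    return the anticipated outcome (stubbed as calling the script)."""
    return script(observed_cues)


def decide(observed_cues, memory, satisfies_goals, diagnose):
    """RPD loop: match, then evaluate scripts serially via mental
    simulation; fall back to diagnosis when nothing is recognized
    or no script's anticipated outcome meets the goals."""
    pattern = match(observed_cues, memory)
    if pattern is None:
        return diagnose(observed_cues)  # gather more data, learn online
    for script in pattern.scripts:
        outcome = simulate(script, observed_cues)
        if satisfies_goals(outcome, pattern.goals):
            return script  # first acceptable script wins
    return diagnose(observed_cues)  # no acceptable script: re-diagnose
```

Note that the scripts are evaluated one at a time and the first acceptable one is taken: the expert satisfices rather than exhaustively comparing all options, which is precisely where RPD departs from the analytical model.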
Most of us can readily recognize the essential aspects of this model in everyday life. Yet it is hard to point to where our current approach to Observability tooling supports it for complex distributed computing systems. The site reliability engineering (SRE) community's current emphasis is on data collection, which is far too quickly and irresponsibly relabeled as information or, worse, knowledge. Acquiring information about a system means being able to project the system's future states with a much lower degree of uncertainty. How does a distributed trace, log, or event even come close to providing such predictive capability? The lens (or model) through which we view the computing world has many an engineer staring down at data and details, seeing trees, or even just roots, while remaining utterly oblivious to the forest, the ecosystem, and the nature at play (the dynamics of action). We're failing at first base.