On Large-Scale Architecture and Communication
There is a fairly commonly used diagram that plots software architectures against two axes: distribution and cohesion. In the corners, you have:
Monoliths – highly cohesive, non-distributed systems.
Distributed monoliths – highly cohesive, distributed systems.
Moduliths – highly modular, non-distributed systems.
Microservices – highly modular, distributed systems.
In my experience, the distribution axis is a red herring and is completely irrelevant. By the same token, the cohesion axis is not a real axis. There is only an utterly wrong (cohesive) and an obviously right (modular) choice. Monoliths only work when they are so small in scope that in a larger system, they would be reasonably considered a single microservice. Distributed monoliths do not work. Stop creating distributed monoliths.
Having covered that, we will never mention monoliths again. Our basic assumption for a successful architecture going forward is high modularity. Whether you deploy it as a single container (modulith) or multiple (microservices) is only relevant to your DevOps guys, not to you. For that reason, in the following text and other posts, when a module or a microservice is mentioned, you can freely substitute one word for the other. It doesn’t matter. Our key concern is that we have modules and, more importantly, that those modules are highly expendable and replaceable as the requirements for the system change. We do not want to edit our code; that is difficult. We want to either add new modules or delete old ones wholesale; doing that is easy.
Individual modules must communicate to cooperate and produce the results that our business needs. For the purposes of this discussion, we will make a distinction between internal modules that we control and external modules (or external systems, if you prefer) that some third party controls and we are, most likely, integrating with. We aim to keep our internal modules as independent from each other as possible. With the external modules, we almost never have this choice. We will work under the assumption that when externals change, the internals interfacing with them will also have to change or, as mentioned before, be completely replaced.
The basic unit of communication is a message. A message in which the sender anticipates a response from the receiver is, for our purposes, a synchronous message. On the other hand, if the sender sends a message with no assumptions about the receiver (or if there even is one), we will consider it an asynchronous message.
Every module has an internal state that can be very simple or very complex but is invariably mutable. A module without an internal state is almost always a proxy for other modules that have one. This is a smell; consider removing such modules and incorporating their logic into the modules they are proxying.
There are only three valid types of messages. You may find more in the literature; I will briefly cover why they are wrong.
A query is by far the simplest type of message because it is very restricted – a query may not, under any circumstances, change the state of the module except for trivial cross-cutting things like logging. It is synchronous – the whole point of a query is for the responder to provide some information about its internal state. A query introduces some minor cohesion between the modules; typically, this cohesion is forced by business logic and, therefore, an essential part of the problem our system is solving.
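To make the restriction concrete, here is a minimal sketch of a query in Python. The InventoryModule class, its _stock field, and the query_stock_level method are hypothetical names invented for illustration; the only point is that the call returns information about internal state and leaves that state exactly as it was.

```python
from dataclasses import dataclass, field


@dataclass
class InventoryModule:
    """Hypothetical module; the name and fields are illustrative only."""

    _stock: dict[str, int] = field(default_factory=dict)

    def query_stock_level(self, sku: str) -> int:
        """Query: reports internal state and never modifies it."""
        # dict.get() reads without inserting a missing key,
        # so even an unknown SKU leaves the state untouched.
        return self._stock.get(sku, 0)


inventory = InventoryModule({"widget": 3})
before = dict(inventory._stock)  # snapshot, only to demonstrate the point

level = inventory.query_stock_level("widget")
missing = inventory.query_stock_level("gadget")
```

After both calls, inventory._stock still equals the snapshot taken before them, including for the SKU that does not exist.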
A command is very similar to a query but much more complicated because it may cause a change of state in the receiver. It is also synchronous because the sender is specifically instructing the receiver to do something and, therefore, expects a specific result. A command introduces major cohesion between the modules, as the business logic of one is strongly coupled to the business logic of another. We will aim to never have any commands between our internal components. We will refactor any such occurrences as a top priority. We are, however, forced to consider commands a necessary evil when communicating between internal and external components.
Some literature is very strict about separating commands and queries, stating that commands should never return any data. I personally find this restriction unnecessary. It is perfectly acceptable for a command to also include some kind of response describing the change in the state. The sequence of issuing a command and then querying the changed state anyway is a very common and unneeded antipattern.
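As a rough sketch of a command that answers with a description of the state change, consider the following. Warehouse, reserve, and ReservationResult are hypothetical names, and the reservation rule (reserve as much as is available) is an assumption made only to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ReservationResult:
    """Response to the command: describes the resulting state change."""

    sku: str
    reserved: int
    remaining: int


@dataclass
class Warehouse:
    """Hypothetical module that accepts a single command."""

    stock: dict[str, int] = field(default_factory=dict)

    def reserve(self, sku: str, quantity: int) -> ReservationResult:
        """Command: may change state, and reports what actually changed."""
        available = self.stock.get(sku, 0)
        reserved = min(quantity, available)  # assumed rule: partial fills allowed
        self.stock[sku] = available - reserved
        return ReservationResult(sku, reserved, self.stock[sku])


warehouse = Warehouse({"widget": 5})
result = warehouse.reserve("widget", 2)
```

The caller learns the new remaining quantity from the result itself, so there is no need for a follow-up query.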
An event is quite distinct from the other two messages because it is, by definition, asynchronous. Events are almost always tied to state changes in modules; in other words, they are a way for a module to announce to its environment that something has changed. This may be caused by a command, or it may happen spontaneously (consider, for example, scheduled tasks being triggered by the clock service). However, this announcement cannot make any assumptions whatsoever that any other module will react to it, much less in any specific way. For that reason, events introduce no cohesion in the system. Just as with commands, the receiver of the event, if it chooses to, may change its state to react to the event, which in turn may cause more events to be emitted – now by the receiver. Event avalanches are the only real risk of the architectures we are considering here; careful and thoughtful analysis is needed to ensure that the local behaviours of modules make sense and produce an eventually consistent result.
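The event contract described above can be sketched with a tiny in-process bus. Everything here (EventBus, Event, the event names, and the max_events cap) is a hypothetical illustration; a real system would most likely sit on a message broker. The sender only emits, receivers decide for themselves what to subscribe to, and a handler may emit further events of its own.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    name: str
    payload: dict


class EventBus:
    """Hypothetical in-process bus, standing in for a real broker."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[Event], None]]] = defaultdict(list)
        self._pending: deque[Event] = deque()

    def subscribe(self, name: str, handler: Callable[[Event], None]) -> None:
        # Receivers choose what they handle; senders never name a receiver.
        self._handlers[name].append(handler)

    def emit(self, event: Event) -> None:
        # Fire and forget: nothing is returned to the sender.
        self._pending.append(event)

    def drain(self, max_events: int = 100) -> int:
        """Deliver queued events; the cap is a crude guard against avalanches."""
        delivered = 0
        while self._pending and delivered < max_events:
            event = self._pending.popleft()
            for handler in self._handlers.get(event.name, []):
                handler(event)
            delivered += 1
        return delivered


bus = EventBus()
shipped: list[str] = []


def on_order_placed(event: Event) -> None:
    # The receiver changes its own state, then announces that change.
    shipped.append(event.payload["order_id"])
    bus.emit(Event("shipment_scheduled", event.payload))


bus.subscribe("order_placed", on_order_placed)
bus.emit(Event("order_placed", {"order_id": "A-1"}))
delivered = bus.drain()
```

Nobody subscribes to shipment_scheduled in this sketch, and the emitter neither knows nor cares: the event is delivered to zero handlers and the system carries on.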
A common problem with events is that the external modules simply refuse to handle them. This has nothing to do with technology and everything to do with humans hating to integrate with other people’s systems. It is literally the only reason for commands to exist beyond user inputs in their terminals.
There is no such thing as an asynchronous query. Many people have tried to simulate them using events, but they essentially end up polling for the results and reintroducing synchronization anyway. Stop lying to yourself; stop trying to make asynchronous queries happen.
There is no such thing as an asynchronous command for all the same reasons, but also because you should be using event semantics for whatever you are trying to do anyway. This allows other receivers to also react to the event in their own way.
There is no such thing as a synchronous event. Any attempt to synchronize between the sender and the receiver, any attempt to restrict the event to be sent to a specific receiver (instead of receivers choosing what they handle) breaks the contract and turns it into a command. Commands are bad. Do not introduce commands into internal communications.
Here is a recap of what was discussed here.
A module is a stateful black box that may receive and send messages.
A query is a request for data. It may not cause a change of state in the module. It may cause the module to send queries to other modules (query aggregation). It may not cause commands or events to be sent.
A command is a request for a module to change its state. It may cause the module to send queries, issue commands (to external modules only), and emit events if the command actually results in a state change. We will aim to never use commands internally and only expose commands to external modules.
An event is an announcement by the sender that its state has changed either because of a command or spontaneously. Other modules may choose to react to the event or ignore it. If a module chooses to handle the event, it may send queries, issue commands (to external modules only) and emit more events if the handling of the event causes the state of the receiver to change as well.
Two final notes.
If you replace “module” with “object,” this article will become the basic definition of object-oriented programming. This is intentional and reflective of what the concept was when Alan Kay introduced it. It was never about inheritance hierarchies; it was always about small interchangeable objects with defined APIs working together. Keep that in mind and read some old books about OOP/OOSE.
While the text focuses on modules being black boxes with states, this simplification is actually somewhat misleading. It is much more productive in the long run to disregard the state completely and define the queries and commands a module accepts and the events that it emits as the module. No matter the language, no matter the framework, no matter the underlying hardware, operating system, database, or any other thing, if your particular piece of software is capable of doing these things, it is that module.
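One way to picture this is the sketch below, which assumes a hypothetical Counter contract with one query, one command, and one event. The two implementations keep entirely different internal state (a running total versus a history of increments), yet from the outside they are the same module. All names are invented for illustration.

```python
from typing import Callable, Protocol

Emit = Callable[[str, int], None]  # hypothetical event sink: (event name, new value)


class Counter(Protocol):
    """The module is this contract, not any particular state layout."""

    def query_value(self) -> int: ...

    def command_increment(self, by: int) -> int: ...


class TotalCounter:
    """Keeps a running total."""

    def __init__(self, emit: Emit) -> None:
        self._total = 0
        self._emit = emit

    def query_value(self) -> int:
        return self._total

    def command_increment(self, by: int) -> int:
        self._total += by
        if by != 0:  # emit only when the state actually changed
            self._emit("counter_changed", self._total)
        return self._total


class HistoryCounter:
    """Keeps every increment and sums them on demand."""

    def __init__(self, emit: Emit) -> None:
        self._history: list[int] = []
        self._emit = emit

    def query_value(self) -> int:
        return sum(self._history)

    def command_increment(self, by: int) -> int:
        if by != 0:
            self._history.append(by)
            self._emit("counter_changed", self.query_value())
        return self.query_value()


def exercise(counter: Counter) -> int:
    """Client code that depends only on the contract."""
    counter.command_increment(2)
    counter.command_increment(3)
    return counter.query_value()


events_a: list[tuple[str, int]] = []
events_b: list[tuple[str, int]] = []
result_a = exercise(TotalCounter(lambda name, value: events_a.append((name, value))))
result_b = exercise(HistoryCounter(lambda name, value: events_b.append((name, value))))
```

Both runs return the same value and emit the same sequence of events, so either class can be deleted and replaced by the other without any caller noticing.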