Resilient Software – Resilient Software and PL/SQL
Resilient Software
Applied to software, the dictionary definition of resilience describes the ability of a system to operate under stress or to absorb the impact of a problem. Maintaining stability and failing gracefully are further attributes of resilience: the system continues to offer an acceptable service level to the business even when problems occur.
It should also be possible to know the state of a resilient system. For example, if a long batch job fails halfway through, is the data left in a consistent state or not? Reliable information about consistency lets us decide whether to restart the batch job from the beginning or to pick it up where it left off. An important part of resilience is therefore knowing the state of play, i.e., the level of progress of jobs and workflows. Additionally, if failures occur, you should be able to find out how to get back up and running, or better yet, the software should restart itself automatically.
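To make the idea of picking up where we left off more concrete, here is a minimal PL/SQL sketch of a restartable batch job. It is an illustration under stated assumptions rather than a definitive implementation: the tables work_items and batch_checkpoint, their columns, and the job name DEMO_BATCH are all hypothetical, and item IDs are assumed to be positive numbers.

```sql
-- Hypothetical tables, invented purely for this illustration.
CREATE TABLE work_items (
  item_id   NUMBER PRIMARY KEY,
  processed CHAR(1) DEFAULT 'N' NOT NULL
);

CREATE TABLE batch_checkpoint (
  job_name     VARCHAR2(30) PRIMARY KEY,
  last_item_id NUMBER NOT NULL
);

-- A few sample rows so that the block below has work to do.
INSERT INTO work_items (item_id) VALUES (1);
INSERT INTO work_items (item_id) VALUES (2);
INSERT INTO work_items (item_id) VALUES (3);
COMMIT;

DECLARE
  c_job_name CONSTANT VARCHAR2(30) := 'DEMO_BATCH';  -- hypothetical job name
  l_last_id  NUMBER;
BEGIN
  -- Find out how far a previous run got. If there is no checkpoint,
  -- this is a first run, so start from the beginning.
  BEGIN
    SELECT last_item_id
      INTO l_last_id
      FROM batch_checkpoint
     WHERE job_name = c_job_name;
  EXCEPTION
    WHEN NO_DATA_FOUND THEN
      l_last_id := 0;  -- assumes all item IDs are greater than zero
      INSERT INTO batch_checkpoint (job_name, last_item_id)
      VALUES (c_job_name, l_last_id);
  END;

  -- Process only the items beyond the checkpoint, in a fixed order.
  FOR r IN (SELECT item_id
              FROM work_items
             WHERE item_id > l_last_id
             ORDER BY item_id)
  LOOP
    UPDATE work_items
       SET processed = 'Y'
     WHERE item_id = r.item_id;

    UPDATE batch_checkpoint
       SET last_item_id = r.item_id
     WHERE job_name = c_job_name;

    -- The work and the checkpoint are committed in the same transaction,
    -- so the recorded progress never disagrees with the data.
    COMMIT;
  END LOOP;
END;
/
```

If the block fails partway through, the uncommitted work for the current item is rolled back and batch_checkpoint still records the last item that was fully processed, so running the block again resumes from that point. Committing after every item keeps the sketch simple; a real job would more likely commit once per group of items to reduce overhead.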
In a nutshell, resilient software is composed of rock-solid code that runs for a very long time, potentially decades. Such code is very difficult to write because it has to cope with all sorts of business requirement changes, unforeseen error conditions, and potentially unexpected input data. Add to this the need for ongoing updates in the form of security fixes and new features. Finally, don’t forget that the runtime platform itself may come under strain as the workflows evolve. It takes a lot of effort to produce resilient code.
Another aspect of resilience is the need for new and legacy code to coexist in harmony. This is one of the great challenges of integration: it is essential that new code does not undermine systems and workflows that have run smoothly for many years.
Examples of Resilient Systems
Operating systems such as Linux, Windows, and macOS are examples of systems that must be resilient. Much research and development effort has gone into making these operating systems more resilient than they used to be. Resilience improvements in operating systems include the following:
- Carefully managing the transition from operating system user mode into kernel mode and vice versa
- Memory protection
- Process isolation
- Adding programming abstractions that facilitate these improvements
As you’ll see later, resilient code goes to great lengths to protect both itself and the runtime environment. This is typically achieved by making judicious use of language abstractions and other constructs.