Resilience needs to be present in every software component. The premise is that errors might happen, and the application needs to make deliberate decisions on handling potential failures. Keep this in mind for local calls, parameter passing, managed services, system calls, etc.
Key points about resilience:
- Inspiration from Design by Contract ideas.
Every function, method, endpoint, or listener should favor checking preconditions for execution, post-conditions, and, if possible, invariants.
- Local calls can fail, and the code must be prepared for it.
Errors can be handled by global, contextual, or local handlers; what matters is that each choice is made deliberately.
- Remote calls are susceptible to failures and inconveniences.
Each remote call has multiple error points: the connection may fail, the response time may be high, and the return may be unexpected. Therefore, the system needs to be prepared to handle this.
- Define well-designed solutions to deal with unexpected events in remote calls.
Be clear about how failures are handled: define timeouts, retry policies (e.g., exponential backoff with jitter), circuit breakers, etc. Consider designing idempotent APIs so that retries are safe.
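
The Design by Contract point can be illustrated with a minimal sketch. The `Account` class, its `withdraw` method, and the choice of integer cents are hypothetical, picked only to show where a precondition, a post-condition, and an invariant might be checked.

```python
class Account:
    """Hypothetical account whose invariant is a non-negative balance (in cents)."""

    def __init__(self, balance_cents: int = 0) -> None:
        # Precondition on construction: never start in an invalid state.
        if balance_cents < 0:
            raise ValueError("initial balance must not be negative")
        self.balance_cents = balance_cents

    def _check_invariant(self) -> None:
        # Invariant: the balance never goes below zero.
        if self.balance_cents < 0:
            raise AssertionError("invariant violated: negative balance")

    def withdraw(self, amount_cents: int) -> int:
        # Preconditions: reject bad input before touching any state.
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        if amount_cents > self.balance_cents:
            raise ValueError("insufficient funds")

        before = self.balance_cents
        self.balance_cents -= amount_cents

        # Post-condition: exactly the requested amount was deducted.
        if self.balance_cents != before - amount_cents:
            raise AssertionError("post-condition violated")
        self._check_invariant()
        return self.balance_cents
```

Because the preconditions run before any state changes, a rejected call leaves the object untouched, and the caller can decide deliberately how to react to the `ValueError`.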
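
For remote calls, one possible shape of a retry policy with exponential backoff and jitter is sketched below. The helper name `call_with_retry`, its parameters, and the default values are assumptions made for illustration, not a prescribed implementation. The sketch also assumes that `operation` enforces its own timeout and is idempotent, so repeating it is safe.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying transient failures with backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # Retry budget exhausted: surface the failure to the caller.
            # Exponential backoff: the ceiling doubles each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: wait a random time in [0, ceiling] so that many
            # clients do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

Only the exception types listed in `retry_on` are retried; anything else propagates immediately, which keeps programming errors from being masked as transient failures. A circuit breaker would sit one level above this helper and stop calling the remote service altogether after repeated failures.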