Lessons from a DR activity that didn’t go as planned
What is a Disaster Recovery (DR) activity?
A DR activity is a planned exercise in which you intentionally switch your workloads over to a secondary setup. That setup may be another region, another cluster, or a separate infrastructure environment.
The goal is to ensure that everything works the way it should when your primary environment is unavailable.
Lessons from our DR activity
This week, we ran a DR activity. Everything looked perfect on paper.
We had our scripts ready.
We had our configuration files ready.
Our binaries, software, databases — all ready.
And yet, when it was time to flip the switch… things didn’t go the way we thought they would.
It was a reminder that in DevOps, systems don’t always behave the way you expect.
Here are the three big lessons this DR taught me.
1. Something Unexpected Will Break
We genuinely believed everything was in place for a smooth DR switch. But once we started validating the environment, we hit issues immediately.
- First, an unexpected S3 access failure.
- Then, an SSL configuration mismatch.
Systems don’t fail where you expect them to. They fail in places you never thought about.
That’s why DR tests matter.
They expose what you didn’t know could go wrong.
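A mismatch like the SSL one can often be caught before the switch with a small pre-flight check. The sketch below is only an illustration using Python's standard library, not the check we actually ran: `check_tls` and `days_until_expiry` are hypothetical helper names, and the host you pass in would be your own secondary endpoint. It performs a fully verified TLS handshake, so an untrusted chain or a hostname that does not match the certificate is reported as a failure.

```python
import socket
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Days left before a certificate's notAfter timestamp (negative if expired)."""
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return (expires - now) / 86400


def check_tls(host, port=443, timeout=5.0):
    """Do a verified TLS handshake with host:port and return (ok, detail)."""
    # The default context verifies the chain and checks the hostname.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
    except ssl.SSLCertVerificationError as exc:
        # Raised for an untrusted chain, an expired cert, or a hostname mismatch.
        return False, f"certificate verification failed: {exc.verify_message}"
    except ssl.SSLError as exc:
        return False, f"TLS handshake failed: {exc}"
    except OSError as exc:
        return False, f"connection failed: {exc}"
    days = days_until_expiry(cert["notAfter"])
    return True, f"certificate valid, expires in {days:.0f} days"
```

Calling `check_tls("app.dr.example.internal")` (a made-up hostname) against each secondary endpoint ahead of the drill gives a quick yes/no plus a reason, which is usually enough to spot a wrong or missing certificate early.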
2. Debugging Skills Matter More Than You Think
When something breaks during DR, you don’t have the luxury of time.
And this is where mastering basic debugging tools becomes a superpower.
- dig for DNS
- curl for connectivity
- ping for latency tests
- telnet for TCP reachability
- netstat for listing connections
- logs, traceroutes, health endpoints
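When the same checks have to be repeated across many hosts, it can help to script the first two steps. The following is a minimal sketch in Python's standard library that mirrors a DNS lookup and a TCP reachability test; it is not a replacement for the tools above, and the function names (`resolve`, `tcp_check`, `triage`) are hypothetical.

```python
import socket
import time


def resolve(hostname):
    """Return the sorted unique IPs for hostname, or [] if resolution fails."""
    try:
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def tcp_check(host, port, timeout=3.0):
    """Attempt a TCP connect; return (True, seconds) or (False, error text)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start
    except OSError as exc:
        return False, str(exc)


def triage(targets):
    """Run both checks for each (host, port) pair and collect the results."""
    report = []
    for host, port in targets:
        ok, detail = tcp_check(host, port)
        report.append({
            "host": host,
            "port": port,
            "ips": resolve(host),
            "reachable": ok,
            "detail": detail,  # connect time in seconds, or the error message
        })
    return report
```

For example, `triage([("db.dr.example.internal", 5432)])` (an invented host) returns one entry showing whether the name resolves and whether the port accepts connections. An empty `ips` list points at DNS, while resolved IPs with `reachable` set to `False` point at routing, firewalls, or the service itself.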
These commands look simple, but they become your compass to find out why something happened.
3. Know Your Application Inside Out
One of the biggest advantages during DR is knowing your application topology:
- Which service runs on which port?
- What’s the context path?
- What external dependencies does it have?
- What does the application flow look like?
- Where does it log?
- How does it connect to storage?
- Which microservice is responsible for a feature?
When you know these details, debugging will be less troublesome.
Knowing the architecture and the application flow isn’t optional in DevOps.
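One way to keep those answers at hand is to write them down as data rather than leaving them in people's heads. The sketch below is an assumption about how such an inventory could look, not a description of our setup: the `Service` fields, the helper functions, and every service name, host, port, and path are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    host: str
    port: int
    context_path: str
    log_path: str
    depends_on: tuple = ()  # external dependencies (storage, databases, ...)
    features: tuple = ()    # user-facing features this service owns

    def base_url(self, scheme="https"):
        """URL where the service is expected to answer."""
        return f"{scheme}://{self.host}:{self.port}{self.context_path}"


def owners_of(feature, services):
    """Names of the services responsible for a feature."""
    return [s.name for s in services if feature in s.features]


def impacted_by(dependency, services):
    """Names of the services that directly depend on a given dependency."""
    return [s.name for s in services if dependency in s.depends_on]


# Entirely hypothetical inventory, for illustration only.
INVENTORY = [
    Service("orders", "orders.dr.example.internal", 8443, "/orders",
            "/var/log/orders/app.log",
            depends_on=("postgres", "s3"), features=("checkout",)),
    Service("reports", "reports.dr.example.internal", 8444, "/reports",
            "/var/log/reports/app.log",
            depends_on=("s3",), features=("invoice-export",)),
]
```

With something like this, a storage failure turns into a quick lookup: `impacted_by("s3", INVENTORY)` lists the services to check first, and each entry already tells you the port, context path, and log file to look at.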
So… Are Systems Predictable?
No. And this week proved it again.
Systems aren’t predictable, but their patterns are. DevOps is about making those patterns visible before they become incidents.
DR drills like these open your eyes.
And that’s the real goal.
Final Thoughts
Preparation, not certainty, is what keeps systems reliable.
And this week, that lesson was loud and clear.
