So what was the answer, part IV
The topic of conversation was – data guard and/or remotely mirrored disk. “We need to provide for continuity of operations and want to understand the options”.
What I find unnerving sometimes in these conversations is the theoretical desire some people have to operate a heterogeneous disaster recovery (DR) site. That is, one group really really wanted me to tell them how to use data guard with different operating systems (the answer is: you do not and no matter how hard you pressure me, I will not say otherwise). I had a question about using data guard between 9i and 10g (the answer is: you do not). DR is supposed to be something that is relatively bulletproof – something that just works when you need it to.
Trying to do DR for your 9i database to 10g (or vice versa) would be less than useful in my opinion. When the day comes for you to fail-over, you really want things to go smoothly. The fact that you are failing over indicates you are already having a really bad day. The data center has burnt down, exploded, flooded, whatever. Maybe people are injured or worse. Maybe the lead person who knows everything about everything isn’t around to wave their magic hands and fix things. You just want it to work. Period. You don’t want to be faced with not only activating the standby database – but upgrading the database your application runs on (or downgrading) at the same time. When I state it that way “you really want to activate the standby and upgrade your database at the same time?” (usually I throw in – when is the last time you upgraded anything with zero errors the first time) – they usually get it.
The same is true for cross operating system – you need the standby/fail-over site to basically be the same as production. Maybe standby is not as large (fewer CPUs, less expensive disk setup, whatever) but it is “the same”. Running production on Solaris and trying to have a Windows machine as a fail-over is a recipe for disaster itself (or to be fair, the converse is true as well).
The problem, I think, is that most people have never actually had to fail over (that is a good thing I suppose). It is something they’ve heard has happened to a friend – but they have never experienced it themselves. This leads me to the next point people seem to forget with DR sites.
Probably the best way to ensure your DR plan won’t work is to not test it. Data guard is pretty good at allowing you to test – it lets you do graceful switchovers and switchbacks so you can verify that if you actually ever need to run on your standby site – YOU CAN.
Oh, and you need to do this on a recurring basis. Because, as we all know, software has a shelf life – it goes stale over time. Just because you tested the failover (via a switchover) 4 months ago doesn’t mean it’ll work today. Things change. This is one of the things you do want to do on a scheduled basis, and again after major changes (like an application upgrade – you need to make sure the standby has the upgraded application and can function as well!).
Some of the people I was talking to had questions about data guard versus remote disk mirroring. I myself would prefer to use database methods to protect database data. The problem with remote disk mirroring and databases is that databases tend to write a ton of stuff. In reality however, all we need is the redo to be mirrored. Consider an insert into a table with three indexes on it, using an 8k blocksize. Oracle will modify at least 4 blocks (one table, 3 index), generate at least one block of UNDO, write the redo to at least 2 redo log members, and eventually archive that redo. Remote disk mirroring will be forced to ship all of those writes over the network (it just sees 8k block writes all over the place). Data guard however will just transmit the redo stream. The reduction in data transferred over the network can be huge when you compare data guard to remote disk mirroring. Not only that, but the DR site using data guard can be used for some things – as a reporting database, as the database to be backed up (offloading that from production) and so on.
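To put rough numbers on that insert example, here is a back-of-the-envelope sketch. It is not a measurement: the function names are made up for illustration, the 1 KB of redo per insert is purely an assumed figure, and it pessimistically assumes every modified block is written (and therefore mirrored) once for this single insert, when in practice the database batches many changes into each block write.

```python
# Rough comparison of network traffic for ONE insert:
# block-level remote mirroring vs. shipping only the redo.
# All names and the redo size are hypothetical, for illustration only.

BLOCK_SIZE = 8 * 1024  # 8k block size, as in the example in the text


def mirrored_bytes(redo_bytes, table_blocks=1, index_blocks=3,
                   undo_blocks=1, redo_members=2, archived=True):
    """Bytes a block-level mirror would ship, assuming each modified
    block is written once and every copy of the redo is mirrored."""
    data_blocks = table_blocks + index_blocks + undo_blocks
    # Redo lands in each online log member, plus the archived copy.
    redo_copies = redo_members + (1 if archived else 0)
    return data_blocks * BLOCK_SIZE + redo_copies * redo_bytes


def redo_shipping_bytes(redo_bytes):
    """Bytes sent when only the redo stream goes over the network."""
    return redo_bytes


if __name__ == "__main__":
    redo = 1024  # ASSUMPTION: 1 KB of redo for the insert
    mirror = mirrored_bytes(redo)
    shipped = redo_shipping_bytes(redo)
    ratio = mirror / shipped
    print(f"mirror: {mirror} bytes, redo only: {shipped} bytes, "
          f"ratio: {ratio:.1f}x")
```

Change the assumed redo size or the block counts and the ratio moves, but the shape of the result stays the same: the mirror pays for whole blocks, the redo stream pays only for the change itself.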
Does that mean “no remote disk mirroring needed?” No, not really – you still have software, configuration files, setup information – other data that needs to be over at the standby, but that isn’t in the database. Remote disk mirroring and a standby database are complementary; it generally takes a bit of both to get it done.
This’ll be the topic of conversation (well, one of them) for me tomorrow in fact. I’ll be speaking at this conference in Orlando for 2 hours about Availability, Manageability, and Security.