So what was the answer, Part II
Main Entry: con·sol·i·da·tion
1 : the act or process of consolidating : the state of being consolidated
2 : the process of uniting : the quality or state of being united; specifically : the unification of two or more corporations by dissolution of existing ones and creation of a single new corporation
3 : pathological alteration of lung tissue from an aerated condition to one of solid consistency
I will definitely not be discussing the 3rd definition; that does not sound very good at all. However, the 2nd definition, “the process of uniting” – that sounds good.
And to me – it is in general a good thing. Many of the people I talked with had a similar problem: lots of distributed sites running basically the same application (well, in fact the same application), and a perceived need to share/replicate data. They wanted to discuss data sharing techniques.
I did not. I’m not a big fan of replication – especially bi-directional, update anywhere replication (which is what they all thought they were interested in). The complexity this adds in terms of design, development, testing, and administration/maintenance is huge. I don’t care whose replication software/process/method/magic you use, it is complex. Almost all of them thought they wanted all data centrally aggregated – but each remote site would have the ability to work autonomously, queuing changes while doing so and synchronizing later. When they were not working autonomously, updates would either happen at the central site and propagate out – or use a distributed two-phase commit transaction updating both locations.
I don’t really like either of those approaches. The update anywhere concept – in an application of any appreciable size (and these were non-trivial, in-place/legacy applications) – involves a rather complex design (or redesign). It is not something you can just “turn on” and expect to work. An application has to be designed to replicate in this fashion – and if it must replicate across three or more databases, the design becomes even more complicated. Two is hard enough; three or more is harder. The problem is, many people try to approach this without the design/redesign phase. This is doomed to failure. Update conflicts will happen (the same data gets modified in two places). The developers of the application have to have thought of this fact and have to have designed “what happens then”. The problem is – many developers don’t understand “lost update” issues in a single database, let alone “update conflicts” in a distributed, replicated, update anywhere database.
Take a simple inventory application. There is a tire, a single tire, in stock right now. I, having access to site 1, order that tire. You, having access to site 2, do the same. Eventually, our updates cross each other on the network. Now the inventory has “negative one” tires in it. What happens here? The update conflict detection was easy enough – the database does that for us. The conflict resolution was trivial (just keep taking things out of inventory). However, we have just violated some sort of business rule. In a single system, one of us would have been told “Sorry, you lose”. In the distributed system, we both think we won. How do you pick who loses? How do you notify them? What is the maximum time someone might be deluded into thinking they have the tire on order? And what else can go wrong in this application (remember, there are literally hundreds of tables – maybe hundreds or thousands of transactions – any of which could “go wrong”)?
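To make that concrete, here is a minimal sketch (plain Python, with made-up site and queue names – this is not any real replication product, just the pattern) of two autonomous sites each selling the last tire and then synchronizing their queued changes:

```python
# Minimal sketch of an "update anywhere" conflict: two sites start from the
# same replicated inventory, work autonomously, queue their changes, and
# synchronize later.  All names here are hypothetical.

def take_from_stock(site_inventory, site_queue, part, qty):
    """The local check passes at each site because each copy still shows 1 tire."""
    if site_inventory[part] >= qty:
        site_inventory[part] -= qty
        site_queue.append((part, -qty))   # change queued for later synchronization
        return "order accepted"
    return "sorry, you lose"              # in a single system, one of us would see this

def synchronize(central_inventory, *site_queues):
    """Naive conflict 'resolution': just keep applying the queued deltas."""
    for queue in site_queues:
        for part, delta in queue:
            central_inventory[part] += delta
    return central_inventory

central = {"tire": 1}
site1, queue1 = dict(central), []
site2, queue2 = dict(central), []

print(take_from_stock(site1, queue1, "tire", 1))   # order accepted
print(take_from_stock(site2, queue2, "tire", 1))   # order accepted -- we both think we won
print(synchronize(central, queue1, queue2))        # {'tire': -1}  negative one tires in stock
```

Both local checks succeed, the merge happily applies both deltas, and the business rule is violated even though no low-level conflict was ever reported. That “what happens then” logic is exactly the design work people skip.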
Now, some 15 years ago (the early to mid-1990s), we might have considered this a necessary evil. Networking was a shaky proposition. Wide area networking was really shaky and unreliable. But today, in 2006?
I was at the Western Wall in Jerusalem the week I was talking with all of these people about their four questions. As far as I was concerned, I was about as far away as I’ve ever been from “my systems” – the computer systems I use and rely on (email is one of Oracle’s mission-critical applications). At the Western Wall, I was in line for a tour – nothing to do for a couple of minutes. So what did I do to amuse myself? I checked my email, naturally; I could have browsed the web, instant messaged with someone, whatever. I remember when I would go to Europe from the US in 1995 – it was like going to another planet, connectivity-wise. No mobile phone. I couldn’t even use the phone jacks to dial into a network there without a converter – and even then, I had no phone numbers to dial. I was effectively off the network unless I was in an Oracle office. Now it seems that no matter where in the world I am, I have access to a network and to “my systems”.
Whether it be my phone with GPRS/EDGE, my AirCard with EVDO or 1xRTT, a line-of-sight wireless network, a hotspot (they seem to be popping up everywhere), a wired network, a satellite connection – whatever. In Tel Aviv, my 24-hour wireless connection expired as I was writing an email in the hotel. I was leaving in a couple of hours and didn’t want to pay again. No big deal, I simply failed over to my phone (plug in the HotSync cable, fire up PdaNet, I’m on).
My thought on replication, therefore: don’t spend the money on the design, development, testing, maintenance, and administration (which will be quite huge, but only if you want the application to actually work) – rather, invest in a redundant networking infrastructure, a failover solution. That is something that will be useful for everything – not just this one little application, everything.
In some cases, however, the problem wasn’t necessarily technical – it was political. People don’t like to give up “their” data (and here I thought the data belonged to the company…). Consolidation would mean centralization, coordination, a perceived loss of control. Then all I can do is spell out exactly what it entails to build a distributed, replicated application. It isn’t easy.
Our mission-critical email application at Oracle is a single centralized system (with failover, of course – RAC in one room to keep the main server going, Data Guard to a remote site so that a catastrophe doesn’t wipe it out). It was not always that way. It was many little distributed systems all over the world. We had the same arguments internally – the network is the problem, loss of ‘control’ is the problem, you cannot take ‘our’ data away from us is the problem. Funny thing, years later – none of these are problems anymore. It runs, it runs well, and it runs with a lot less overhead than before. It is easier.
Replication technologies – the unidirectional, read-only type – have a place, perhaps, in warehousing. But to build an application with? Not unless there is a really compelling technical reason (a submarine has a really compelling technical reason, for example; there, data sharing technologies and update anywhere just might be appropriate).
Anytime I’m asked about synchronous replication (“this table must be the same in both databases all of the time”), my answer is “you’ve done this really wrong”. Even if asynchronous replication is permitted but the data is modifiable in more than one place, I would answer basically the same way. I know how the replication technology works, I’ve used it, and I can describe to you what it does – but I personally don’t like to promote it for most applications. It is the path of last resort, not my first choice.
So, back to consolidation – I believe “like systems” should be consolidated. If you were going to replicate between two or more systems – you probably really meant to build a single system. Distributed complexity is just that – complex.
I believe that the maximum number of instances on a single server is one (in my world that is also the minimum, but that is another story…). If you are running 10 instances with 10 applications, you really meant to run a single instance with 10 applications inside of it. It is the only way you’ll really be able to tune, to control, to manage, to keep them all on the same release (it sort of forces the issue). If you can run 10 instances on that server with 10 applications, you could really run 11 or 12 or more applications on that same server with a single instance. You don’t have multiple SGAs with their redundancies and “oversizing”, you don’t have multiple pmons, smons, lgwrs, dbwrs, and so on (and the contention caused by having multiple lgwrs and multiple archs all thinking they are operating in isolation). You don’t have one of the instances consuming all of the CPU at the expense of the others (you can use profiles and Resource Manager in a single instance to control resource utilization).
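As a sketch of that last point – controlling resource utilization inside one consolidated instance – something along these lines is roughly what I have in mind. It uses the real DBMS_RESOURCE_MANAGER package to give each application its own consumer group and a CPU share; the plan name, group names, percentages, connection details, and the use of the python-oracledb driver are all my own illustrative assumptions, not a recipe from this article.

```python
# Hedged sketch: cap CPU per application inside ONE instance instead of
# running one instance per application.  DBMS_RESOURCE_MANAGER is a real
# Oracle package; the names and numbers below are hypothetical.
import oracledb

plsql = """
begin
    dbms_resource_manager.create_pending_area;

    -- one consumer group per consolidated application (names made up)
    dbms_resource_manager.create_consumer_group(
        consumer_group => 'APP1_GROUP', comment => 'application 1');
    dbms_resource_manager.create_consumer_group(
        consumer_group => 'APP2_GROUP', comment => 'application 2');

    dbms_resource_manager.create_plan(
        plan => 'CONSOLIDATED_PLAN', comment => 'share one instance fairly');

    -- CPU shares: no single application can starve the others
    dbms_resource_manager.create_plan_directive(
        plan => 'CONSOLIDATED_PLAN', group_or_subplan => 'APP1_GROUP',
        comment => 'app1 share', mgmt_p1 => 50);
    dbms_resource_manager.create_plan_directive(
        plan => 'CONSOLIDATED_PLAN', group_or_subplan => 'APP2_GROUP',
        comment => 'app2 share', mgmt_p1 => 40);
    dbms_resource_manager.create_plan_directive(
        plan => 'CONSOLIDATED_PLAN', group_or_subplan => 'OTHER_GROUPS',
        comment => 'everything else', mgmt_p1 => 10);

    dbms_resource_manager.validate_pending_area;
    dbms_resource_manager.submit_pending_area;
end;
"""

# connection details are placeholders
conn = oracledb.connect(user="system", password="***", dsn="dbhost/orcl")
conn.cursor().execute(plsql)
```

You would still need to enable the plan (resource_manager_plan initialization parameter) and map sessions or users to the groups, but the point stands: one instance, many applications, and the resource limits enforced in one place instead of hoping ten isolated instances play nicely.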
So, that is my take on consolidation. It does not mean “you will run a single database for everything”. It means you will run as few databases as you can – one instance per machine – and try to avoid distributed complexity. Data sharing has its place, in warehousing, but update anywhere replication is hard. It complicates the design (or at least it should; many times it does not, leading to applications that don’t work correctly in the field).