I was thinking about cluster node counts the other day and wondering about their distribution: what percentage of clusters have two nodes, three nodes, four nodes, and so on. There are a couple of different kinds of clusters out there. In this context, I mean tightly coupled clusters, either for availability (failover) or for concurrent transactions (like Oracle RAC, Sybase ASE Cluster Edition, or cluster file systems), not large computational clusters (like BlueGene). I ran the numbers from SORT uploads and got the distribution. Then I wondered whether Linux clusters might have more nodes than AIX, HP-UX, or Solaris clusters. Linux fans might say, "Naturally, because it's cheaper," while detractors might think, "Of course, because SMP scalability is still better on <my platform/>." One or both groups appear to be correct:
You can see that the white series has fewer 2- and 3-node clusters and more clusters at each larger size (except at 9 nodes, where the tail thins).
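For anyone who wants to reproduce this kind of breakdown on their own inventory, here is a minimal sketch of turning per-cluster node counts into a percentage distribution per platform group. It assumes one `(platform, node_count)` record per cluster; the records and the `node_count_distribution` helper are hypothetical and made up for illustration, not the actual SORT data or tooling.

```python
from collections import Counter

# Hypothetical input: one (platform, node_count) record per cluster.
# These values are invented for illustration; they are NOT SORT data.
clusters = [
    ("Linux", 2), ("Linux", 4), ("Linux", 4), ("Linux", 8),
    ("Solaris", 2), ("Solaris", 2), ("AIX", 3), ("HP-UX", 1),
]

def node_count_distribution(node_counts):
    """Return {node_count: percent of clusters}, ordered by node count."""
    counts = Counter(node_counts)
    total = sum(counts.values())
    return {n: 100.0 * c / total for n, c in sorted(counts.items())}

# Split the records into the two groups being compared.
linux = [n for platform, n in clusters if platform == "Linux"]
unix = [n for platform, n in clusters if platform in ("AIX", "HP-UX", "Solaris")]

for label, sample in (("Linux", linux), ("AIX/HP-UX/Solaris", unix)):
    dist = node_count_distribution(sample)
    print(label, {n: f"{pct:.1f}%" for n, pct in dist.items()})
```

Single-node clusters simply show up as the `1` bucket, so the same tally gives the share of single-node clusters as well.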
Is the difference statistically significant? It's clearly not a normal distribution, but I went with a t-test to keep things simple, invoking the Central Limit Theorem under my breath, and the p-value was 0.0005, which is pretty good. If you'd recommend a different test, drop me a line.
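The post doesn't say which t-test variant was used. As a minimal sketch, assuming Welch's unequal-variance t-test and, in the spirit of the Central Limit Theorem remark, a normal approximation to the t distribution for the two-sided p-value (reasonable only with large samples), the comparison could look like this using only Python's standard library. The samples and the `welch_t_test` helper are hypothetical, not the SORT data or the original analysis.

```python
import math
from statistics import NormalDist, mean, variance

def welch_t_test(a, b):
    """Welch's two-sample t statistic, with a two-sided p-value taken from
    the standard normal distribution (a large-sample approximation)."""
    # Standard error of the difference in means, without pooling variances.
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p

# Hypothetical node counts per cluster, one entry per cluster.
linux_sizes = [2] * 50 + [3] * 20 + [4] * 30
unix_sizes = [2] * 70 + [3] * 20 + [4] * 10

t, p = welch_t_test(linux_sizes, unix_sizes)
print(f"t = {t:.2f}, two-sided p ~ {p:.5f}")
```

With small samples, the p-value should instead come from the t distribution with Welch-Satterthwaite degrees of freedom (for example `scipy.stats.ttest_ind(a, b, equal_var=False)`). For skewed count data like node counts, a rank-based test such as Mann-Whitney U (`scipy.stats.mannwhitneyu`) is a common alternative that avoids the normality question altogether.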
I've talked to customers who use single-node VCS for application start/stop, and certainly you can start with one node as an install plan; I even met one customer who purchased one node in one quarter and the second in the next, due to their internal budget cycle. Still, I was a bit surprised to see that around 10% of the clusters were single-node.