Exchange 2000 Cluster not so scalable (updated SP2 info)

Update: Exchange SP3 does not add more users per active node!

Update: Microsoft posted cluster whitepaper with new limitations. Link is at the bottom of page.

A design flaw in Exchange makes Exchange Enterprise Edition less scalable than intended. Microsoft is going to advise against having more than 1000 users connected to a node in an active/active cluster.

This was first mentioned in Win2k magazine's Exchange newsletter by Jerry Cochran (Compaq Exchange 2000 expert and Win2k magazine author).
When I read that, I had just finished designing an active/active Compaq Exchange cluster with 3000 users per node, based on MS and Compaq docs, for a project I work on. I never saw anything about this limitation in the initial docs. I tried to contact Jerry Cochran, but he didn't reply. So I posted questions in forums, and someone contacted me and said it was true; he was under a Microsoft non-disclosure agreement, so he couldn't say too much. It seems there were concerns about failover with more than 1000 users per node. He also said that when SP1 is released the docs will be updated with this new advised limit.
I was disappointed in the scalability and "Enterprise"-ness of Exchange 2000 and was still hoping it was somehow a misunderstanding. I contacted Microsoft engineers in Holland; they didn't know anything about this, but when they contacted US engineers I got the truth about this issue, which is a sensitive one for Microsoft.

The Problem:

The problem arises on an Exchange 2000 cluster (Exchange 2000 Enterprise Edition) and concerns the virtual memory (VM) of the Exchange process on each cluster node.
When MAPI clients (Outlook) connect to an Exchange 2000 cluster node, they fragment that virtual memory. When a lot of MAPI clients (more than 1000) do this, all of it may be claimed. So in an active/active cluster, when a node fails over to the other active node, the second instance of Exchange (STORE.EXE) may fail to start because there is no memory available.
In other words, there is a possibility that not enough VM will be available to tolerate a failover onto an already active cluster node.
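The failure mode described above can be illustrated with a toy model. This is only a sketch of fragmentation in general, not of how STORE.EXE really allocates memory: the address space size, the allocation pattern and the size of the region a second instance needs are all made-up numbers, and the function names are hypothetical. It shows that plenty of memory can be free in total while no single free region is large enough.

```python
# Toy model of address-space fragmentation.
# Illustrative only: this is NOT how STORE.EXE allocates memory.

def largest_free_run(space):
    """Return the length of the longest run of free (False) slots."""
    best = run = 0
    for used in space:
        run = 0 if used else run + 1
        best = max(best, run)
    return best

def fragmented_space(size, block, gap):
    """Repeat `block` used slots followed by `gap` free slots across the space."""
    space = [False] * size
    for start in range(0, size, block + gap):
        for i in range(start, min(start + block, size)):
            space[i] = True
    return space

SIZE = 1000    # hypothetical address space, in arbitrary units
NEEDED = 200   # hypothetical contiguous region a second instance needs

# Many small allocations scattered over the whole space.
space = fragmented_space(SIZE, block=1, gap=3)
free_total = space.count(False)
free_largest = largest_free_run(space)

print(f"free in total: {free_total}, largest contiguous free run: {free_largest}")
print("second instance can start:", free_largest >= NEEDED)
```

In this made-up case three quarters of the space is free, yet the largest free run is only 3 units, so a request for 200 contiguous units fails.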

It seems this problem isn't simple to fix: it sits deep in the architectural design of Exchange, or rather of STORE.EXE. Therefore it won't be fixed by a hotfix or any Service Pack. I guess MS needs to reprogram STORE.EXE from scratch to overcome this.

You can understand this is a sensitive issue for Microsoft. They can't supply a fix, so they must lower the specifications of their first AD cluster killer app.
That must have been a tough decision.

The workaround according to Microsoft:

Don't use more than 1000 (1500 with SP1, 1900 with SP2!) users per node in an active/active Exchange cluster:

With this limit fragmentation still occurs, but there is an assurance that enough VM will be available for an Exchange instance to fail over.
This is still a good solution for High Availability sites with no more than 2000 users, with good ROI because both machines are used.

Use active/passive cluster(s)

With this setup fragmentation still occurs, but when failover occurs the Exchange instance will start with fresh, clean memory on the passive node.
So with this method you're not limited to 1000 users per node, but rather by the sizing of your hardware.

Manual failover

Manually restart STORE.EXE on the still working node, so its memory is freed up and starts unfragmented, then fail over the Exchange instance from the failing node.

Yeah right, so much for unattended High Availability. This means that the still working node is also interrupted!

Use Windows 2000 Datacenter

Use a Win2k Datacenter 4-node cluster with 3 active nodes and one passive.
Exchange 2000 on Datacenter is only supported with Exchange 2000 SP1.
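To get a feel for what the per-node limits mean for sizing, here is a rough sketch. The limits are the ones quoted in this article; everything else is an assumption for illustration: the "RTM" label for the release without a service pack, the idea that active/active nodes come in two-node pairs, the 6000-user site, and the 3000 users per node hardware figure for active/passive. The function names are hypothetical as well.

```python
import math

# Advised maximum users per node in an active/active cluster, as quoted in this article.
# "RTM" is just a label used here for Exchange 2000 without a service pack.
ACTIVE_ACTIVE_LIMIT = {"RTM": 1000, "SP1": 1500, "SP2": 1900}

def active_active_nodes(total_users, service_pack):
    """Nodes needed when every node is active and capped by the advised limit."""
    limit = ACTIVE_ACTIVE_LIMIT[service_pack]
    nodes = math.ceil(total_users / limit)
    # Assumption: active/active clusters are two-node pairs, so round up to an even count.
    return nodes + (nodes % 2)

def active_passive_nodes(total_users, users_per_node):
    """Nodes needed when each active node gets its own passive partner.

    users_per_node is a hypothetical hardware sizing figure, not a Microsoft number.
    """
    active = math.ceil(total_users / users_per_node)
    return active * 2

# Hypothetical site with 6000 mailbox users.
for sp in ("RTM", "SP1", "SP2"):
    print(sp, "active/active nodes:", active_active_nodes(6000, sp))
print("active/passive nodes:", active_passive_nodes(6000, users_per_node=3000))
```

Under these assumptions a 6000-user site needs 6 active/active nodes without a service pack and 4 with SP1 or SP2, the same count as two active/passive pairs sized at 3000 users per active node.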

The Fix:

Like I said: NO FIX!!!
It won't be fixed until Exchange 2003 (codename "Kodiak") is released in 2002/2003. Kodiak will be based on the SQL database engine instead of the current ESE.

When SP1 comes out the Exchange Whitepapers will be updated with the new limitations. Let me guess the title:



Source: Microsoft and exchange forums

See Compaq's Deploying Exchange 2000 Clusters within Design Limitations

UPDATE: Microsoft SP1 cluster Whitepaper

UPDATE: Microsoft SP2 cluster Whitepaper

Amsterdam, June 16 2001

Steven Bink
