Components of High Availability
The first and probably the most important is VPXA. This is not an HA agent, but it is the vCenter agent and it allows your vCenter Server to interact with your ESX host. It also takes care of stopping and starting virtual machines if and when needed.
HA is loosely coupled with vCenter Server. Although HA is configured by vCenter Server, it does not need vCenter to manage an HA failover. It is comforting to know that if a host running a virtualized vCenter Server fails, HA takes care of the failure and restarts the vCenter Server on another host, along with all other configured virtual machines from that failed host.
When a virtual vCenter is used, we do, however, recommend setting the correct restart priorities within HA to avoid any dependency problems.
It’s highly recommended to register ESX hosts with their FQDN in vCenter. VMware vCenter supplies the name resolution information that HA needs to function. HA stores this locally in a file called “FT_HOSTS”. In other words, from an HA perspective there is no need to create local host files and it is our recommendation to avoid using local host files. They are too static and will make troubleshooting more difficult.
To stress the point even more: as of vSphere 4.0 Update 1, host files (i.e. /etc/hosts) are corrected automatically by HA. In other words, if you have made a typo or, for example, forgot to add the short name, HA will correct the host file to make sure nothing interferes with HA.
Basic design principle:
Avoid using static host files as it leads to inconsistency, which makes troubleshooting difficult.
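To make the idea of an automatically corrected host file concrete, here is a minimal sketch in Python. It is not HA's actual logic, which is not documented here. The function name correct_hosts, the "IP FQDN short-name" line layout, and the sample addresses are all assumptions made for illustration.

```python
def correct_hosts(lines, expected):
    """Return hosts-file lines in which every expected IP maps to its FQDN and short name.

    `expected` maps an IP address to the FQDN known for that host
    (hypothetical input; in the text this information comes from vCenter).
    """
    corrected = []
    seen = set()
    for line in lines:
        stripped = line.strip()
        # Keep blank lines and comments untouched.
        if not stripped or stripped.startswith("#"):
            corrected.append(line)
            continue
        ip = stripped.split()[0]
        if ip in expected:
            fqdn = expected[ip]
            short = fqdn.split(".")[0]
            # Rewrite the entry so typos are fixed and the short name is present.
            corrected.append(f"{ip} {fqdn} {short}")
            seen.add(ip)
        else:
            corrected.append(line)
    # Add entries for expected hosts that were missing altogether.
    for ip, fqdn in expected.items():
        if ip not in seen:
            corrected.append(f"{ip} {fqdn} {fqdn.split('.')[0]}")
    return corrected


hosts = ["127.0.0.1 localhost", "10.0.0.11 esx01.lab.locl"]  # typo, no short name
known = {"10.0.0.11": "esx01.lab.local"}
fixed = correct_hosts(hosts, known)
```

In this sketch the mistyped entry is replaced by a line carrying both the FQDN and the short name, while unrelated entries such as localhost are left alone.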
Next on the list is VMAP. Where vpxa is the process through which vCenter communicates with the host, VMAP is the translator between the HA agent (AAM) and vpxa. When vpxa wants to communicate with the AAM agent, VMAP translates this into instructions the AAM agent understands. A good example of what VMAP would translate is the state of a virtual machine: is it powered on or powered off? Prior to vSphere 4.0, VMAP was a separate process instead of a plugin linked into vpxa. VMAP is loaded into vpxa at runtime when a host is added to an HA cluster.
vpxa communicates with VMAP, and VMAP communicates with AAM. When AAM has received and flushed the info, it will tell VMAP, and VMAP in turn will acknowledge to vpxa that the info has been processed. The VMAP plug-in acts as a proxy for communication to AAM.
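The chain described above (vpxa to VMAP to AAM, with an acknowledgement travelling back) can be pictured as a simple proxy pattern. The Python sketch below is purely illustrative. The class names, method names, record layout and the "poweredOn" string are assumptions, not the real interfaces of these components.

```python
class AamAgent:
    """Stand-in for the HA agent: stores state and confirms once it is flushed."""

    def __init__(self):
        self.vm_states = {}

    def store(self, record):
        self.vm_states[record["vm"]] = record["powered_on"]
        return True  # info received and flushed


class VmapPlugin:
    """Stand-in for VMAP: translates vpxa updates and proxies them to AAM."""

    def __init__(self, aam):
        self.aam = aam

    def forward(self, vm_name, power_state):
        # Translate the vpxa-style update into a record AAM understands.
        record = {"vm": vm_name, "powered_on": power_state == "poweredOn"}
        flushed = self.aam.store(record)
        return flushed  # acknowledgement handed back to vpxa


class Vpxa:
    """Stand-in for the vCenter agent: only ever talks to VMAP, never to AAM."""

    def __init__(self, vmap):
        self.vmap = vmap

    def report_power_state(self, vm_name, power_state):
        return self.vmap.forward(vm_name, power_state)


aam = AamAgent()
vpxa = Vpxa(VmapPlugin(aam))
acknowledged = vpxa.report_power_state("vm01", "poweredOn")
```

The point of the sketch is the shape of the conversation: vpxa never addresses AAM directly, and it only learns that the info was processed through the acknowledgement VMAP returns.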
One thing you are probably wondering is why we need VMAP in the first place. Wouldn’t this be something vpxa or AAM should be able to do? The answer is yes, either vpxa or AAM should be able to carry this functionality. However, when HA was first introduced, it was architecturally more prudent to create a separate process for dealing with this, which has now been turned into a plugin.
That brings us to our next and final component, the AAM agent. The AAM agent is the core of HA and actually stands for “Automated Availability Manager”. As stated above, AAM was originally developed by Legato. It is responsible for many tasks, such as communicating host resource information, virtual machine states and HA properties to other hosts in the cluster. AAM stores all this info in a database and ensures consistency by replicating this database amongst all primary nodes. (Primary nodes are discussed in more detail in chapter 4.) It is often claimed that HA uses an in-memory database only. This is not the case! The data is stored in a database on local storage, or in flash memory on diskless ESXi hosts.
One of the other tasks AAM is responsible for is the mechanism with which HA detects isolations/failures: heartbeats.
All this makes the AAM agent one of the most important processes on an ESX host when HA is enabled, which we assume here. The engineers recognized this importance and added an extra level of resiliency to HA. The agent is multi-process, and each process acts as a watchdog for the other. If one of the processes dies, the watchdog functionality will pick up on this and restart the process, so HA functionality remains intact without anyone ever noticing the failure. The agent is also resilient to network interruptions and component failures. In the case of a network failure, inter-host communication automatically uses another communication path, provided the host is configured with redundant management networks. The underlying message framework guarantees exactly-once message delivery.
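The mutual watchdog idea can be illustrated with a small simulation in Python. This is a sketch of the general pattern only, not of how the AAM processes are actually implemented. The Worker class, its fields and the process names are hypothetical, and no real operating system processes are started.

```python
class Worker:
    """Simulated agent process that also watches one peer process."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.restarts = 0
        self.peer = None

    def check_peer(self):
        """Restart the peer if it has died; return True when a restart happened."""
        if self.peer is not None and not self.peer.alive:
            self.peer.alive = True
            self.peer.restarts += 1
            return True
        return False


def make_pair():
    """Create two workers that act as watchdogs for each other."""
    a, b = Worker("agent-a"), Worker("agent-b")
    a.peer, b.peer = b, a
    return a, b


first, second = make_pair()
second.alive = False            # simulate one process dying
restarted = first.check_peer()  # the surviving process notices and restarts it
```

Because each worker watches the other, either one can fail and be brought back by its peer. Only a simultaneous failure of both would go unnoticed in this simplified model.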