When TSIM is in an HA setup, how does it determine when to fail over?
Which processes are monitored, and how long should a failover take?
TSIM HA monitors the following processes through a health check:
Request for Enhancement DRTSO-41993 is open to add other processes, such as httpd, to the health check.
The health check scans for the above processes at the frequency set by the attribute:
pronet.ha.availability.scan.frequency.in.secs
(default is 60 seconds)
which is in the file:
pronet.conf
If any of these processes are down, it repeats the scan for the number of times set by the following attribute in the same pronet.conf file:
pronet.ha.availability.max.retry.count
(default is 4 before 10.7FP3 and 6 after 10.7FP3)
This gives the process the chance to recover and avoids triggering a failover too frequently.
If the process is still down after pronet.ha.availability.scan.frequency.in.secs multiplied by pronet.ha.availability.max.retry.count seconds (up to 6 minutes with the defaults of 60 seconds and 6 retries), then failover will occur.
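The worst-case delay above is simple arithmetic, shown here as a minimal sketch. The function name and constant are hypothetical and are not part of TSIM; only the default values (60 seconds, 4 or 6 retries) come from this article.

```python
# Illustrative arithmetic only; this is not TSIM code.
# 60 is the documented default for pronet.ha.availability.scan.frequency.in.secs.
SCAN_FREQUENCY_SECS = 60


def max_failover_delay_secs(scan_frequency_secs: int, max_retry_count: int) -> int:
    """Upper bound on how long a process can stay down before failover."""
    return scan_frequency_secs * max_retry_count


# Default retry counts: 4 before 10.7FP3, 6 after 10.7FP3.
before_fp3 = max_failover_delay_secs(SCAN_FREQUENCY_SECS, 4)
after_fp3 = max_failover_delay_secs(SCAN_FREQUENCY_SECS, 6)
print(before_fp3 / 60, after_fp3 / 60)  # prints: 4.0 6.0 (minutes)
```

The same calculation can be used to estimate the effect of a proposed change to either attribute before it is applied.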
However, when the health check finds ALL of the above processes down, failover occurs immediately, regardless of the retry count.
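The two failover conditions can be summarised in a short sketch. This models the behaviour described in this article only; it is not the TSIM implementation. The function name, the process names proc_a and proc_b, and the exact boundary (failing over once the failed-scan count reaches the retry count rather than exceeds it) are assumptions.

```python
from typing import Dict


def should_fail_over(
    process_up: Dict[str, bool],
    consecutive_failed_scans: int,
    max_retry_count: int,
) -> bool:
    """Hypothetical decision rule based on the description in this article.

    process_up maps each monitored process to whether the latest scan found it
    running. consecutive_failed_scans counts scans in a row, including the
    latest, in which at least one process was down.
    """
    if not process_up:
        return False  # nothing is monitored, so there is nothing to decide
    if not any(process_up.values()):
        return True  # every monitored process is down: fail over immediately
    if all(process_up.values()):
        return False  # everything is healthy
    # Some, but not all, processes are down: wait out the retries first.
    return consecutive_failed_scans >= max_retry_count


# proc_a and proc_b are placeholder names, not real TSIM process names.
print(should_fail_over({"proc_a": False, "proc_b": True}, 2, 6))   # False
print(should_fail_over({"proc_a": False, "proc_b": True}, 6, 6))   # True
print(should_fail_over({"proc_a": False, "proc_b": False}, 1, 6))  # True
```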
For large TSIM environments we do not recommend reducing these default values, because the process health check can sometimes be delayed and that delay should not trigger a failover.
These default values should also not be increased without BMC analysing the logs and checking the process behaviour that triggered the undue failover.
It is important to note that TSIM HA has the notion of a primary TSIM and a secondary TSIM. When the primary TSIM is up and running, it becomes the active node of the HA pair, even if the secondary TSIM has been running as the active node.
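The primary-preference rule can be expressed as a small function. This is a sketch of the behaviour described above, not TSIM code; the function name and the returned labels are hypothetical.

```python
from typing import Optional


def active_node(primary_up: bool, secondary_up: bool) -> Optional[str]:
    """Return which node should be active, preferring the primary."""
    if primary_up:
        return "primary"  # the primary wins whenever it is available
    if secondary_up:
        return "secondary"  # the secondary is active only while the primary is down
    return None  # neither node is available


print(active_node(primary_up=True, secondary_up=True))   # primary
print(active_node(primary_up=False, secondary_up=True))  # secondary
```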