This knowledge article may contain information that does not apply to version 21.05 or later, which runs in a container environment. Please refer to Article Number 000385088 for more information about troubleshooting BMC products in containers.

Ranking a Server Group

Overview:
A typical server group has a combination of user-facing servers (one or more) and back-end servers (one or more). To provide the best possible response to end users, it is preferable to direct all ranked operations to back-end servers and leave the user-facing servers free to process user requests. Back-end servers must belong to the same Server Group and use the same Server Name alias as the user-facing servers.
Depending on the environment, each operation may have a different impact on resources. For example, a customer with heavy Email Engine utilization, such as sending tens of thousands of notifications or more per day, will see a heavier impact than a customer who uses the Email Engine lightly. The same can be said for every operation. Therefore, each customer needs to look at their own requirements in order to properly rank each operation.
This document provides guidance for creating a strategy for ranking all operations across your server group.

Types of Ranking:
There are two basic types of ranking.
Operations are ranked via one of the following:
1- The AR System Server Group Operation Ranking form (1).
2- The AR System Service Failover Ranking form (2).
As you will see, even though they behave differently, operations of both types need to be considered when forming a strategy.

Types of Failover:
Just as there are two types of ranking, there are generally two types of failover, with one exception.
1- Operations that are ranked with the Server Group Operation Ranking form fail over when the AR Server they are running on is no longer responsive or is down.
2- Operations that are ranked using the Service Failover Ranking form fail over when the individual component (or Companion Service) is no longer responsive or is down.
3- The one exception is FTS. All servers that are ranked (rank > 0) for FTS are always active indexers. However, the currently active server with the lowest rank number is the FTS Searcher for the group unless its associated Plugin Server is unresponsive, the AR Server is down, or the collocated indexer is in the middle of a re-index operation. In those cases, the next ranked server in line becomes the Searcher.

Distributing Operations:
It is possible to have all operations run on a single back-end server. This is not uncommon, since it achieves the goal of off-loading work from the user-facing servers. If the server has enough resources, it can perform all operations.
When more than one server is needed to perform all the operations, there are two general methods for doing this.
1- You can set up one or more additional back-end servers and distribute the operations.
2- If you have enough resources on some of the user-facing servers, you can move some of the less utilized operations to those servers.
Keeping the goal of end-user performance in mind, it is preferable to keep user-facing servers out of the ranking whenever possible. Some operations must be combined. Whichever server owns the Administration operation must also own the BMC Atrium CMDB and Business Rules Engine operations (5).
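The combined-operations rule can be checked mechanically before rankings are saved. The following is a minimal sketch, assuming the rankings have been exported into a plain dictionary; the data layout, the server names, and the function `combined_rank_mismatches` are hypothetical and are not part of any BMC API. It also assumes that giving the three operations identical ranks on each server is how they are kept together, which is one simple way to satisfy the rule.

```python
# Hypothetical in-memory model: server name -> {operation name: rank}.
# An operation missing from a server's dict means it is not ranked there.
COMBINED_OPERATIONS = ("Administration", "CMDB", "Business Rules Engine")


def combined_rank_mismatches(rankings):
    """Return the servers whose ranks for the combined operations differ."""
    mismatches = {}
    for server, ops in rankings.items():
        ranks = {op: ops.get(op) for op in COMBINED_OPERATIONS}
        # All three operations should carry the same rank (or all be unranked).
        if len(set(ranks.values())) > 1:
            mismatches[server] = ranks
    return mismatches


rankings = {
    "backend1": {"Administration": 1, "CMDB": 1, "Business Rules Engine": 1},
    "backend2": {"Administration": 2, "CMDB": 2, "Business Rules Engine": 3},
}
print(combined_rank_mismatches(rankings))
```

In this sample, `backend2` is reported because Business Rules Engine is ranked 3 there while Administration and CMDB are ranked 2.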
Note that Approval Server can perform Admin operations (Create/Modify/Delete Filters), and when it does, it must be running on the Admin server. The activities that cause Admin operations are performed from the AP:Administration form. They are not regularly occurring activities and can be scheduled for specific times. Approval Server can therefore be ranked without regard to where the Administration operation is ranked, except during those scheduled activities. At that time, Approval Server can be temporarily and manually ranked (8) to the Administration server and then re-ranked afterwards. However, it is generally recommended, and documented, that Approval Server be paired with the Administration operation.
Other than combining the Admin operations, there are no hard and fast rules for distributing operations.

Operation Behaviors:
Below is a list of all 'rankable' operations and some behavioral considerations that might help you decide where to rank each of them.
Operation | Behavior Considerations
Administration | This operation is single-threaded, so it typically does not consume a large percentage of CPU. Administration work is usually recommended during non-peak hours. Must be combined with CMDB and Business Rules Engine. Users performing Admin or CMDB work must log directly into this server. The main downside of this operation failing is that Admin work cannot be performed. This is not usually as important as end-user activities, so this operation is often not ranked onto user-facing servers.
Approval Server | The range of resource utilization varies greatly among customers. Typically, this operation uses 4 or fewer threads, but they can be quite active in some environments. Approval Server runs as a Java plugin, which can be offloaded into a standalone plugin server. Approval processing is usually considered a high priority, so this operation is often ranked across multiple servers. Except for special cases (see note above), Approval Server should be ranked with the Administration operation.
Archive | Archive operations are low-resource and usually not critical, but they are expected to be continually running in order to keep up with the queue. There is relatively low risk if this operation is not running for a short period of time. Note: Archive is a single-threaded operation up to v.18.08. From version 19.02 onward, Archive is a multi-threaded operation.
Assignment Engine | This is a standalone Java process that varies greatly in resource utilization from customer to customer. It is a high-value operation, since work does not get performed if it cannot be assigned. For that reason, the operation is usually ranked across multiple servers. Depending on the environment, it can easily cohabitate with other operations.
Atrium Integrator / Atrium Integration Engine | This is an onboarding/import utility that can utilize multiple threads and varies greatly in use depending on the environment. Some customers are continually loading and updating data and require that the data synchronization be near real-time. For that reason, some customers rank this operation to a second back-end server with enough resources. Other customers use this very little. Note: Atrium Integration Engine is an older, deprecated version of Atrium Integrator. For current versions, Atrium Integration Engine should not be ranked.
Business Rules Engine | This operation must be combined with the Administration server. It is single-threaded and does not utilize a high amount of resources.
CMDB | This operation must be combined with the Administration server. Depending on the customer environment, this can be a high-volume operation. Some CMDB jobs are very complex and use ample CPU and memory.
DSO | The Distributed Server Option is not used by every customer. It is used by Hub and Spoke environments and by customers that have customized their environment to use this feature. The amount of resources that it uses varies greatly. It can be configured with a single thread performing very little work, or it can be configured with many threads, all moving large amounts of data to many different servers at different locations. For that reason, you must understand how you use DSO in order to rank it properly.
E-Mail Engine (7) | E-Mail Engine should NOT be ranked from the AR System Server Group Operation Ranking form. It uses the Service Failover feature instead. E-Mail Engine is a unique operation because it can be broken up into multiple pieces that are each ranked differently across multiple servers. Individual mailboxes can be ranked. So, one might rank all incoming mailboxes as #1 on one server and all outgoing mailboxes as #1 on a 2nd server, while ranking each set as #2 on the other server. This allows distribution of work while maintaining failover. There are many ways this can be done, since each mailbox can be ranked individually on multiple servers. This is a highly customizable operation and can be tuned for both performance and availability. Alternatively, this operation can be treated very simply by ranking all mailboxes together.
Escalation | Escalations are typically a low to medium resource operation. Escalations can use very few threads performing very linear tasks, or they can be configured to use many threads to perform multiple parallel tasks. Escalations enforce many business processes and send notifications, so they are considered to be a high-priority operation. For this reason, this operation is often ranked across multiple servers.
Flashboards | Flashboards is a standalone Java process that uses a single thread and relatively low resources to collect historical data for use with Flashboards charts and graphs. This is typically not a high-risk operation, depending on the environment.
Full Text Index | The Full Text Index operation sets up an FTS indexer. The indexer is a part of the AR Server JVM and is enabled if the server is ranked. Multiple FTS indexers can be used simply by ranking more than one. Each indexer has its own local copy of the Full Text Search database (a flat-file database). Any work that performs an FTS or MFS (Global Search) query is directed first to the #1 ranked server. If that server is unavailable or busy performing a re-index, those queries are redirected to the next FTS server in line. FTS indexing can use a lot of resources, depending on how it is used and how much FT-indexed data is changing. SmartIT and MyIT have a tendency to use FTS heavily. If your environment uses those application interfaces, you might consider putting FTS on separate servers to get better performance.
Normalization Engine | Normalization Engine is a Service Failover operation that performs scheduled Normalization jobs. It can be ranked across multiple servers in order to ensure that all jobs get performed. Normalization Engine is a Java plugin that can be configured to use few or many threads. Depending on configuration, this operation can consume a high percentage of CPU and memory.
Reconciliation Engine | Reconciliation Engine, like Normalization Engine, is a Java plugin that can be configured to use few or many threads. Depending on configuration, this operation can consume a high percentage of CPU and memory. Some customers place these two operations by themselves on a server that has ample resources to run them with good performance. They are configured to use a high percentage of the resources, but they are not competing with other operations.
Service Failover | The Service Failover operation manages the Service Failover feature by monitoring the health of the ranked operations and redistributing them as necessary. For any Service Failover operation to run, the Service Failover operation must be active, so it is important to rank it wisely. It does not use a large amount of resources, so it can easily be combined with other operations. Since it is a high-priority operation, some customers extend the ranking onto user-facing servers.
SLM Collector | The SLM Collector is used to collect metrics from various sources to create performance or SIM-based service targets. The utilization of this feature varies by environment, but often this is not a high-resource operation and can be combined with other operations.
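The way FTS picks its Searcher can be illustrated with a short sketch. It is only a model of the behavior described in this article, not BMC code: the `FtsServer` class, its field names, and `pick_searcher` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class FtsServer:
    name: str
    rank: int                       # 0 (or less) means not ranked for FTS
    server_up: bool = True          # AR Server is running
    plugin_responsive: bool = True  # associated Plugin Server responds
    reindexing: bool = False        # collocated indexer is mid re-index


def pick_searcher(servers):
    """Return the name of the eligible server with the lowest rank number, or None."""
    eligible = [
        s for s in servers
        if s.rank > 0 and s.server_up and s.plugin_responsive and not s.reindexing
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda s: s.rank).name


servers = [
    FtsServer("backend1", rank=1, reindexing=True),
    FtsServer("backend2", rank=2),
    FtsServer("userfacing1", rank=0),
]
print(pick_searcher(servers))  # backend2, because backend1 is re-indexing
```

Every server with a rank greater than 0 still acts as an indexer in this model; only the choice of Searcher changes when the #1 server is busy or unavailable.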
Availability:
As mentioned earlier, a failover occurs when a server or Companion Service (Service Failover) becomes unavailable. 'Availability' is determined by using a heartbeat (3). For the regular operations (in the Server Group Operation Ranking form), the AR Server sends a heartbeat every 60 seconds. This interval is configurable from the Administration Console.
Every operation in the ranking form has a 'Delinquency Threshold'. If an AR Server does not send a heartbeat within the number of intervals specified by the 'Delinquency Threshold', it is considered to be unavailable. These heartbeats can be observed as the 'Intervalcount' column in the servgrp_board table (4) in the AR System database.
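The delinquency check reduces to simple arithmetic. The sketch below is illustrative only: the function name and parameters are hypothetical, the threshold value of 5 is an arbitrary example rather than a documented default, and treating a heartbeat that arrives exactly at the limit as still on time is an assumption.

```python
def is_delinquent(seconds_since_last_heartbeat, delinquency_threshold,
                  check_interval_seconds=60):
    """Return True when the server has been silent for more than
    delinquency_threshold heartbeat intervals."""
    allowed_silence = delinquency_threshold * check_interval_seconds
    return seconds_since_last_heartbeat > allowed_silence


# With a 60-second interval and an example threshold of 5 intervals,
# a server may be silent for up to 300 seconds.
print(is_delinquent(240, delinquency_threshold=5))  # False
print(is_delinquent(301, delinquency_threshold=5))  # True
```

Lowering either the interval or the threshold makes failover faster but also makes it more sensitive to short pauses on an otherwise healthy server.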
Service Failover operations each send their own heartbeat every minute. The 'Service Failover' operation checks whether the heartbeats are received. These heartbeats can be observed in the AR System Service Failover Whiteboard form. If two heartbeats are missed, Service Failover redistributes the operations. It communicates directly with the running process to tell it what its new status is. Because of this feature, Service Failover operations run independently of an AR Server. In fact, Email Engine can run on a system that does not have a local AR Server.

Example Ranking:
1- The simplest way to rank a server group is to set up one back-end server and rank all operations on that server.
Operation | User Facing Servers | Back-End Server 1
Administration | <none> | 1
Approval Server | <none> | 1
Archive | <none> | 1
Assignment Engine | <none> | 1
Atrium Integrator | <none> | 1
Business Rules Engine | <none> | 1
CMDB | <none> | 1
DSO | <none> | 1
E-Mail Engine | <none> | 1
Escalation | <none> | 1
Flashboards | <none> | 1
Full Text Index | <none> | 1
Normalization Engine | <none> | 1
Reconciliation Engine | <none> | 1
Service Failover | <none> | 1
SLM Collector | <none> | 1
2- To provide some high availability and to distribute some operations for better performance, a 2nd back-end server can be set up.
Operation | User Facing Servers | Back-End Server 1 | Back-End Server 2
Administration | <none> | 1 | 2
Approval Server | <none> | 1 | 2
Archive | <none> | 1 | 2
Assignment Engine | <none> | 1 | 2
Atrium Integrator | <none> | 2 | 1
Business Rules Engine | <none> | 1 | 2
CMDB | <none> | 1 | 2
DSO | <none> | 1 | 2
E-Mail Engine | <none> | 1 | 2
Escalation | <none> | 1 | 2
Flashboards | <none> | 1 | 2
Full Text Index | <none> | 1 | 2
Normalization Engine | <none> | 2 | 1
Reconciliation Engine | <none> | 2 | 1
Service Failover | <none> | 1 | 2
SLM Collector | <none> | 1 | 2
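To see how a two-server ranking like this behaves during an outage, the ownership rule can be modeled in a few lines. This is a simplified sketch, not BMC code: it assumes that the available server with the lowest rank number owns an operation, it ignores the FTS exception (every ranked server indexes), and it ignores the per-component failover of Service Failover operations. The dictionary layout, the server names, and `active_owner` are hypothetical, and only three operations from the example are included.

```python
def active_owner(operation, rankings, available):
    """Return the available server with the lowest rank number for the
    operation, or None if no available server is ranked for it."""
    ranked = [
        (ops[operation], server)
        for server, ops in rankings.items()
        if server in available and ops.get(operation)
    ]
    return min(ranked)[1] if ranked else None


# Subset of the two back-end example: server -> {operation: rank}.
rankings = {
    "backend1": {"Administration": 1, "Atrium Integrator": 2, "Normalization Engine": 2},
    "backend2": {"Administration": 2, "Atrium Integrator": 1, "Normalization Engine": 1},
}

both_up = {"backend1", "backend2"}
print(active_owner("Administration", rankings, both_up))        # backend1
print(active_owner("Atrium Integrator", rankings, both_up))     # backend2
print(active_owner("Atrium Integrator", rankings, {"backend1"}))  # backend1
```

With both servers up, the work is split between them; when `backend2` is unavailable, `backend1` picks up the operations it holds at rank 2.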
3- If high availability is required but a 2nd back-end server is not available, operations can be ranked onto user-facing servers, at the risk of end-user performance loss during a failover. The rankings can be spread across multiple user-facing servers to distribute the load if the back-end server fails. Notice that in this example, FTS is not ranked on the user-facing servers, because a server ranked for FTS is always an FTS indexer and that extra load is not wanted in the user-facing environment.
Operation | User Facing Servers | Back-End Server 1
Administration | 2 | 1
Approval Server | 2 | 1
Archive | 2 | 1
Assignment Engine | 2 | 1
Atrium Integrator | 2 | 1
Business Rules Engine | 2 | 1
CMDB | 2 | 1
DSO | 2 | 1
E-Mail Engine | 2 | 1
Escalation | 2 | 1
Flashboards | 2 | 1
Full Text Index | <none> | 1
Normalization Engine | 2 | 1
Reconciliation Engine | 2 | 1
Service Failover | 2 | 1
SLM Collector | 2 | 1
4- If high availability is required, a single back-end server does not provide enough resources, and a 2nd back-end server is not available, some operations can be distributed to the user-facing servers, at the risk of end-user performance. Some of the less utilized operations, such as Archive and Flashboards, are good candidates. Notice that in this example, some operations were not ranked on a 3rd server.
Operation | User Facing Server 1 | User Facing Server 2 | Back-End Server 1
Administration | <none> | 2 | 1
Approval Server | 2 | 3 | 1
Archive | 1 | <none> | 2
Assignment Engine | 2 | 3 | 1
Atrium Integrator | 2 | <none> | 1
Business Rules Engine | 3 | 2 | 1
CMDB | 3 | 2 | 1
DSO | 1 | 2 | 3
E-Mail Engine | 2 | 3 | 1
Escalation | 2 | 3 | 1
Flashboards | 1 | 3 | 2
Full Text Index | 2 | <none> | 1
Normalization Engine | 2 | 3 | 1
Reconciliation Engine | 2 | 3 | 1
Service Failover | 2 | 3 | 1
SLM Collector | 2 | 3 | 1
Note: The above tables are simplified examples. In every case, you can still rank Email Engine mailboxes individually or make other changes to what is shown in the tables.

On-line Doc References:
(1) https://docs.bmc.com/docs/ars91/setting-failover-rankings-for-servers-and-operations-609073388.html
(2) https://docs.bmc.com/docs/ars91/service-failover-609072593.html
(3) https://docs.bmc.com/docs/ars91/signaling-mechanism-in-a-server-group-609073423.html#Signalingmechanisminaservergroup-Heartbeatmechanism
(4) https://docs.bmc.com/docs/ars91/configuring-the-server-group-check-interval-609073429.html
(5) https://docs.bmc.com/docs/ars91/setting-failover-rankings-for-servers-and-operations-609073388.html
(6) https://docs.bmc.com/docs/ars91/configuring-full-text-search-for-a-server-group-609073408.html
(7) https://docs.bmc.com/docs/ars91/email-engine-service-failover-in-a-server-group-609072754.html
(8) https://docs.bmc.com/docs/ars91/setting-failover-rankings-for-servers-and-operations-609073388.html (see "Refresh Ranking")