The TSPS Server, when in HA, can sometimes shut itself down

Knowledge Article

Article Number

000222270

Old Article Number

000152766

Article Type

Product/Service Description

Product

TrueSight Presentation Server

Component

TSPS Server

Applies to

TrueSight Presentation Server 11.0

Details

Different errors can appear in different logs, but each of them can lead to the TSPS Server shutting itself down. Here are some examples:

TrueSight.log

ERROR 11/07 15:11:32.192 [Timer-2] c.b.t.p.c.p.e.ESClientConnection BMC_TS-PL000134E             Exception while checking the Indexserver status
java.util.concurrent.ExecutionException: RemoteTransportException[[4lVQt-D][127.0.0.1:9300][cluster:monitor/health]]; nested: MasterNotDiscoveredException;
                at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.getValue(BaseFuture.java:262)
                at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:249)
                at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:91)
                at com.bmc.truesight.platform.components.persistence.es.ESClientConnection.checkIfESRunning(ESClientConnection.java:135)
                at com.bmc.truesight.platform.components.persistence.es.ESClientConnection.isESReachableWithRetry(ESClientConnection.java:116)

indexserver.log

[2018-03-17T04:44:12,795][INFO ][o.e.d.z.ZenDiscovery     ] [4lVQt-D] master_left [{Niv1l2z}{Niv1l2znSxeebOqV6piIag}{PaNyRE54Ria_Bjxiff5Ksw}{eng-bmpre03.in.lab}{172.20.243.33:9300}], reason [transport disconnected]
[2018-03-17T04:44:12,796][WARN ][o.e.d.z.ZenDiscovery     ] [4lVQt-D] master left (reason = transport disconnected), current nodes: nodes:
   {4lVQt-D}{4lVQt-DZSaiqUDcptLGWnw}{ONkNQsukQfOTRH5rK6-qVg}{eng-bmpre04.in.lab}{172.20.247.34:9300}, local
   {Niv1l2z}{Niv1l2znSxeebOqV6piIag}{PaNyRE54Ria_Bjxiff5Ksw}{eng-bmpre03.in.lab}{172.20.243.33:9300}, master

NOTE: This error can also occur while the TSPS nodes are restarting. If it occurs only during a restart, it can be ignored.

indexserver.log

delaying allocation for [0] unassigned shards, next check in [0s]
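
To check whether a node shows the signatures from the excerpts above, the log files can be searched with grep. This is a minimal sketch: count_symptoms is a hypothetical helper name, and the log file paths are passed in as arguments because their location depends on the installation.

```shell
#!/bin/sh
# count_symptoms LOGFILE...
# Hypothetical helper: for each log file, print the number of lines that
# contain one of the error signatures quoted in this article.
count_symptoms() {
    for log in "$@"; do
        # grep -c exits non-zero when nothing matches, so "|| true" keeps
        # the script going and leaves the count at 0.
        n=$(grep -c -E 'MasterNotDiscoveredException|master_left' "$log" || true)
        echo "$log: $n"
    done
}

# Example call (paths are placeholders, adjust to your installation):
# count_symptoms /path/to/TrueSight.log /path/to/indexserver.log
```

As the note above says, matches that coincide only with a restart of the TSPS nodes can be ignored.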

Solution

1) Stop the secondary TSPS Server

2) Stop the primary TSPS Server

3) On both the primary and secondary TSPS nodes, edit the file /etc/sysctl.conf and add the following lines at the end of the file:

net.ipv4.tcp_keepalive_time=600
net.ipv4.tcp_keepalive_intvl=60
net.ipv4.tcp_keepalive_probes=20
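
As an illustration of what these three values mean together, the sketch below works out how long an idle connection to a dead peer can linger before the kernel drops it, and then prints the values the running kernel currently uses. It assumes a Linux host, where the settings are exposed under /proc/sys/net/ipv4, and the variable names are illustrative only. Keepalive applies only to sockets that have enabled it.

```shell
#!/bin/sh
# Values from step 3 of this article.
keepalive_time=600     # seconds a connection must be idle before the first probe
keepalive_intvl=60     # seconds between probes
keepalive_probes=20    # unanswered probes before the connection is dropped

# Idle time before the first probe, plus one interval per unanswered probe.
detect_seconds=$((keepalive_time + keepalive_intvl * keepalive_probes))
echo "dead peer detected after ${detect_seconds}s"

# Print the values currently active in the kernel, if readable.
for key in tcp_keepalive_time tcp_keepalive_intvl tcp_keepalive_probes; do
    f="/proc/sys/net/ipv4/$key"
    if [ -r "$f" ]; then
        echo "net.ipv4.$key=$(cat "$f")"
    fi
done
```

With the values above this comes to 600 + 60 x 20 = 1800 seconds (30 minutes). After the reboot in the later steps, the three printed values should match the lines added to /etc/sysctl.conf.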

4) On both the primary and secondary TSPS nodes, back up the file:

$TRUESIGHTPSERVER_HOME/truesightpserver/modules/elasticsearch/config/elasticsearch.yml

to a location outside $TRUESIGHTPSERVER_HOME.
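
One way to take this backup is sketched below. backup_config is a hypothetical helper, and the destination directory in the example call is a placeholder, not a location defined by the product. Any directory outside $TRUESIGHTPSERVER_HOME works.

```shell
#!/bin/sh
# backup_config SRC DEST_DIR
# Hypothetical helper: copy SRC into DEST_DIR under a timestamped name and
# print the path of the copy. cp -p preserves mode and timestamps.
backup_config() {
    src=$1
    dest_dir=$2
    mkdir -p "$dest_dir" || return 1
    dest="$dest_dir/$(basename "$src").$(date +%Y%m%d%H%M%S).bak"
    cp -p "$src" "$dest" && echo "$dest"
}

# Example call (destination is a placeholder):
# backup_config "$TRUESIGHTPSERVER_HOME/truesightpserver/modules/elasticsearch/config/elasticsearch.yml" /var/backups/tsps
```

The timestamp in the file name keeps an earlier backup from being overwritten if the step is repeated.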

5) Edit the file elasticsearch.yml and append the following entries to the end of the file, or edit the values if the lines already exist:

# How often a node gets pinged. Defaults to 1s.
discovery.zen.fd.ping_interval: 30s

# How long to wait for a ping response, defaults to 3s.
discovery.zen.fd.ping_timeout: 50s

# How many ping failures / timeouts cause a node to be considered failed. Defaults to 3.
discovery.zen.fd.ping_retries: 5

# If the master does not receive acknowledgement from at least discovery.zen.minimum_master_nodes nodes within this time,
# the cluster state change is rejected. Defaults to 30s.
discovery.zen.commit_timeout: 180s

NOTE: Paste the content exactly as it appears in this article; otherwise, a parsing error can occur.
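
Before restarting, the edited file can be checked for the four entries. The sketch below is an assumption-laden helper, not a product tool: check_zen_settings is a hypothetical name, and it only verifies that each key appears exactly once with the value from this article, written as "key: value" on a line of its own. A key that appears twice is reported, since step 5 asks you to edit an existing line rather than add a second copy.

```shell
#!/bin/sh
# check_zen_settings FILE
# Hypothetical helper: returns 0 if every setting from step 5 is present
# exactly once with the expected value, 1 otherwise.
check_zen_settings() {
    file=$1
    status=0
    while read -r key value; do
        # Count lines that start with this key; "|| true" because grep -c
        # exits non-zero when the count is 0.
        count=$(grep -c "^${key}:" "$file" || true)
        if [ "$count" -ne 1 ]; then
            echo "$key: expected 1 line, found $count"
            status=1
        elif ! grep -qxF "${key}: ${value}" "$file"; then
            # -x matches the whole line, -F treats the pattern as plain text.
            echo "$key: line is not exactly '${key}: ${value}'"
            status=1
        fi
    done <<EOF
discovery.zen.fd.ping_interval 30s
discovery.zen.fd.ping_timeout 50s
discovery.zen.fd.ping_retries 5
discovery.zen.commit_timeout 180s
EOF
    return "$status"
}

# Example call:
# check_zen_settings "$TRUESIGHTPSERVER_HOME/truesightpserver/modules/elasticsearch/config/elasticsearch.yml"
```

The whole-line match is deliberately strict, so trailing spaces or a missing space after the colon are reported as well.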

6) Reboot the primary TSPS machine

7) Start the TSPS Server on the primary machine

8) Make sure the UI comes up

9) Reboot the secondary TSPS machine

10) Start the TSPS Server on the secondary machine
