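For the TrueSight Capacity Optimization (TSCO) Gateway Server/Agent there are two common types of Service Daemon connection problem: the Service Daemon port isn't listening, or the port is listening but the request fails later in the communication process.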
To begin debugging the Service Daemon, run the following command from the console:

> /usr/adm/best1_default/bgs/scripts/best1collect -n [TSCO Agent] -q
where [TSCO Agent] is the network-accessible hostname or IP address of the remote node. If the Service Daemon isn't listening on port 10128, the following messages should be generated:

best1collect on topgun: requesting Update Status on [remote node] ...
Mon Jun 12 10:01:21 2006 Error: Unable to establish connection with service daemon
Unable to communicate to server. Connection timed out.
!Query: Request had no response from Service Daemon on node: [remote node]

If the Service Daemon port is listening but the query is failing further along in the communication process, the following messages should be generated:

best1collect on [managing node]: requesting Update Status on [remote node] ...
Mon Jun 12 10:03:51 2006 !Query: Request failed.

Next, run the same command directly on the remote node itself. Are the error messages the same? If the command works on the remote node itself but not from the console, the source of the problem is likely the network or some sort of network security package. If the messages are the same, the source of the problem is likely [x]inetd or the Service Daemon itself.

This document covers the following cases:

Section I: The Service Daemon isn't listening on port 10128
Section II: The Service Daemon port is listening, but is failing somewhere internally and is exiting prematurely
Section III: TCP Wrappers causing connection requests to be rejected
Section IV: The Service Daemon is getting the request from [x]inetd quickly but is still failing
Section V: Other [x]inetd related connection problems

This document was originally published as Solution SLN000000125142.
For the TrueSight Capacity Optimization Agent, the log files are located in the /usr/adm/best1_default/bgs/log directory. By default, the Perform Service Daemon port is 10128. Although it is uncommon to change this port, it may be different in your environment. Check the port number for the 'bgssd' entry in the /etc/services file to see whether a non-default Service Daemon port is being used in your environment.
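Because the port may differ from 10128, a small helper can read it from /etc/services before you run the netstat and telnet tests below. This is a minimal sketch assuming a POSIX shell and a standard "name port/protocol" services file; the function name bgssd_port is hypothetical, and it reads only the local file, so it will not see an entry that exists only in NIS.

```shell
#!/bin/sh
# Hypothetical helper: print the TCP port defined for the 'bgssd' service,
# falling back to the documented default of 10128 when no entry is found.
# Usage: bgssd_port [services_file]   (defaults to /etc/services)
bgssd_port() {
    svc_file=${1:-/etc/services}
    # A services entry looks like: "bgssd   10128/tcp   # comment"
    svc_port=$(awk '$1 == "bgssd" && $2 ~ /\/tcp$/ { split($2, p, "/"); print p[1]; exit }' "$svc_file" 2>/dev/null)
    if [ -n "$svc_port" ]; then
        echo "$svc_port"
    else
        echo 10128
    fi
}
```

For example, netstat -an | grep "$(bgssd_port)" runs the Section I listening check against whatever port is configured locally.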
Section I: The Service Daemon isn't listening on port 10128

The most common causes of the first problem are:
You can test the first cause by logging onto the remote node and running the following command: > netstat -an | grep 10128
The output should look something like this: *.10128 *.* 0 0 0 0 LISTEN
If you don't get a LISTEN line back, nothing is listening on port 10128.

There are two ways to configure the Service Daemon: it can listen directly on port 10128, or xinetd can listen on port 10128 and pass the connections it receives on to the Service Daemon. The BMC_PERFORM_UNIX_SERVICESETUP configuration property in the /[TSCO Agent Installation Directory]/.b1configVVVV.sav file indicates how the Service Daemon has been configured. A value of 'y' indicates the Service Daemon is running in standalone mode, so it runs all the time and listens directly on port 10128. A value of 'n' indicates the Service Daemon is running through xinetd.

If the Service Daemon is configured in standalone mode, check that it is running on the machine:

ps -ef | grep bgssd

If the Service Daemon isn't running, it will need to be restarted:

# service bgssd start

If there is a LISTEN on port 10128, a good test is to query the Service Daemon:

$BEST1_HOME/bgs/scripts/best1collect -n localhost -Q

If that works, try:

$BEST1_HOME/bgs/scripts/best1collect -n [hostname] -Q

The test using 'localhost' is a good starting test for local firewall rules blocking the connection to port 10128. Often, if a local firewall is blocking the connection, the query will work when using 'localhost' but fail when using the hostname of the machine. The commands to check whether a local firewall is active differ across operating systems and OS versions, so engage your system administration team to determine whether a local firewall is enabled on the machine and is blocking port 10128.

To check for the second and third causes, run the following command from the managing node:

> telnet [remote node] 10128
This should produce output like this:

Trying ###.###.###.###...
Connected to topgun.boston.bmc.com.
Escape character is '^]'.
SDPACKConnection closed by foreign host.

Also, on the remote node the /usr/adm/best1_default/bgs/log/[hostname]-bgsSD.log file should be updated when you do this. The following output indicates a problem connecting to the remote node:

Trying ###.###.###.###...
telnet: Unable to connect to remote host: Connection refused
If the netstat test doesn't return a LISTEN line, nothing is listening on port 10128 and you need to fix inetd so that it listens on that port. For AIX, HP-UX, and Solaris 9 and earlier, check the contents of the /etc/inetd.conf file and make sure that the 'bgssd' line is in that file. The line should look like this:

bgssd stream tcp nowait perform /etc/bgs/SD/bgssd bgssd.exe -d /etc/bgs/SD
If that line doesn't exist in the inetd.conf file, try re-running the /[Installation Directory]/b1configVVVV.sh script as root (where VVVV is the product version number, for example, 7300). The b1configVVVV.sh script configures the machine for Perform. It is located in the installation directory at the same level as the OS architecture directory created by the install. For example, if BMC Performance Assurance for Servers version 7.3.00 is installed in /opt/bmc, then the b1configVVVV.sh script will be in the /opt/bmc/Patrol3 directory.

Solaris 10 uses SMF configuration for inetd, so the /etc/inetd.conf file will not be updated. Instead, if inetd is not listening, check whether the /var/svc/manifest/network/bgssd-tcp.xml file exists. More information on SMF can be found here: http://sysunconfig.net/unixtips/solaris.html#smf.
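On AIX, HP-UX, and Solaris 9 and earlier, the inetd.conf check above can be scripted. The sketch below assumes a POSIX shell; check_bgssd_inetd is a hypothetical name. It prints the active bgssd entry and also confirms that the account in the entry's fifth field (the Perform Installation Owner, 'perform' in the example line) exists, since an invalid owner stops [x]inetd from starting the Service Daemon (see Section V). It does not apply to Solaris 10, where SMF and the bgssd-tcp.xml manifest are used instead.

```shell
#!/bin/sh
# Hypothetical helper: check an inetd.conf-style file for an active 'bgssd'
# entry and verify that the user named in its 5th field exists.
# Returns 0 if both are fine, 1 if the entry is missing, 2 if the user is invalid.
check_bgssd_inetd() {
    conf_file=${1:-/etc/inetd.conf}
    # Commented-out lines ("#bgssd ...") do not match, because $1 is "#bgssd".
    entry=$(awk '$1 == "bgssd" { print; exit }' "$conf_file" 2>/dev/null)
    if [ -z "$entry" ]; then
        echo "MISSING: no active bgssd entry in $conf_file"
        return 1
    fi
    echo "FOUND: $entry"
    sd_user=$(echo "$entry" | awk '{ print $5 }')
    # Some inetd variants accept "user.group" or "user:group"; keep the user part.
    sd_user=${sd_user%%[.:]*}
    if id "$sd_user" >/dev/null 2>&1; then
        echo "USER OK: $sd_user"
    else
        echo "USER MISSING: $sd_user is not a valid account"
        return 2
    fi
}
```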
Another possibility is that the 'bgssd' service entry is not defined in 'yp' services (if yp is enabled); yp is now called NIS (Network Information Service). On the machine, run the command 'ypwhich'. If it returns the hostname of the NIS master (rather than an error saying that no yp services are bound, or something similar), run the command 'ypcat services | grep bgssd'. If that returns an empty result, the problem could be that yp services are enabled on this machine and the bgssd entry isn't listed in the yp master files. By default, HP-UX uses the NIS master tables to obtain the list of services; to fix the issue you must configure the service in /etc/services on the NIS master and push the change to the NIS database. The NIS documentation for your operating system gives more details. However, there is a way to override this: create a file in /etc called 'nsswitch.conf' that contains the line:

services: files nis

This tells the machine to look in the local services file first and then go to the NIS master. Note that inetd may consult only one source for services: without this line, if the entry isn't in NIS, inetd won't fall back to the local files; conversely, with this line, if the entry isn't in the local files, inetd may not look for it in NIS.

Alternately, the machine might be running NIS+. To see if NIS+ is being used, run the following command:

> niscat services.org_dir

If that returns an error message, then NIS+ is probably not enabled. If it returns data, then NIS+ is enabled and the above steps for 'yp' are applicable.

If the 'nsswitch.conf' file is changed, it is necessary to completely stop and restart inetd, because the nsswitch.conf file is not re-read when inetd is sent a hangup signal (SIGHUP). Typically, stopping and restarting inetd is best done with a machine reboot. It is possible to stop and restart the inetd service while the machine is running, but on some machines (especially older OS versions) inetd may fail to restart properly.

A common indication that the service name isn't defined in yp is the following error message in the system log file (/var/adm/messages on Solaris):

inetd[9329]: [ID 965992 daemon.error] bgssd/tcp: unknown service
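The NIS checks above can be collected into two small helpers. This is a sketch assuming a POSIX shell; services_lookup_order and bgssd_in_nis are hypothetical names. The first only reads the 'services:' line from nsswitch.conf; the second runs the same ypwhich and ypcat commands described above.

```shell
#!/bin/sh
# Hypothetical helper: print the lookup order for the "services" database,
# e.g. "services: files nis" -> "files nis", or "not set" if there is no line.
services_lookup_order() {
    ns_file=${1:-/etc/nsswitch.conf}
    order=$(sed -n 's/^[[:space:]]*services:[[:space:]]*//p' "$ns_file" 2>/dev/null | head -n 1)
    if [ -n "$order" ]; then
        echo "$order"
    else
        echo "not set"
    fi
}

# Hypothetical helper: if the machine is bound to NIS (ypwhich succeeds), show
# any bgssd entry in the NIS services map. Empty output while bound means the
# NIS master does not define bgssd.
bgssd_in_nis() {
    if ypwhich >/dev/null 2>&1; then
        ypcat services | grep bgssd
    else
        echo "not bound to NIS"
        return 1
    fi
}
```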
NOTE: HP-UX doesn't support the normal set of netstat flags available on Linux. Check the system logs to see whether inetd is passing the connection on to the bgssd.exe process. If it isn't, inetd should be generating an error message that says why it isn't starting the bgssd.exe process; for example, the TSCO Agent Installation Owner might not have a valid $HOME directory. In most cases on AIX and/or HP-UX machines, we expect this to be an inetd problem, and there will be a clear message in one of the system logs that tells you why inetd isn't passing the connection on to the Service Daemon.
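To narrow the system log down to today's inetd messages, the date filter described in Section II can be generated automatically. The hypothetical inetd_messages_today function below assumes a POSIX shell and the traditional syslog timestamp format ("Mar 16 10:00:00 host ..."), in which single-digit days are space-padded ("Mar  6"); the %e format of date produces the same padding. Logs written with ISO-8601 timestamps will not match.

```shell
#!/bin/sh
# Hypothetical helper: print today's inetd/xinetd lines from a syslog file.
# Usage: inetd_messages_today LOG_FILE [DAY]   e.g. DAY="Mar 16"
inetd_messages_today() {
    log_file=$1
    # LC_ALL=C keeps the month abbreviation in English ("Mar"), as syslog writes it.
    day=${2:-$(LC_ALL=C date '+%b %e')}
    # Match the date at the start of the line, then keep inetd and xinetd lines.
    grep "^$day " "$log_file" | grep inetd
}
```

For example, inetd_messages_today /var/adm/messages on Solaris, or inetd_messages_today /var/log/messages on Linux.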
Section II: The Service Daemon port is listening, but is failing somewhere internally and is exiting prematurely

The first thing to determine when the Service Daemon port is listening via [x]inetd but the Service Daemon still isn't working is whether the problem is that [x]inetd isn't passing the connection on to the Service Daemon (or isn't passing the request on quickly enough), or whether the problem is that the Service Daemon is failing to handle the request properly.
Step 1: Service Daemon Telnet Test

date;echo "" | telnet localhost 10128;date

The output from that command should look like this:

$ date;echo "" | telnet localhost 10128;date
Mon Nov 21 17:33:05 EST 2005
Trying...
Connected to localhost.bmc.com.
Escape character is '^]'.
SDPACKConnection closed by foreign host.
Mon Nov 21 17:33:06 EST 2005

The /usr/adm/best1_default/bgs/log/[hostname]-bgsSD.log will be updated with the following messages from that test:

Mon Jun 12 10:48:04 2006 Service Daemon (1292) Service Daemon -- Version: 7.3.00 for SunOS 5.9
Mon Jun 12 10:48:04 2006 Service Daemon (1292) Solaris 9
Mon Jun 12 10:48:04 2006 Service Daemon (1292)
Mon Jun 12 10:48:04 2006 Service Daemon (1292) sparc, 64 Bit
Mon Jun 12 10:48:04 2006 Service Daemon (1292) Built on Oct 12 2005 01:19:28 Code: 7.3.00.0170
Mon Jun 12 10:48:04 2006 Service Daemon (1292) LANG =
Mon Jun 12 10:48:04 2006 Service Daemon (1292) GC_LANG =
Mon Jun 12 10:48:04 2006 Service Daemon (1292) Service Daemon Starting
Mon Jun 12 10:48:04 2006 Service Daemon (1292) -d /etc/bgs/SD
Mon Jun 12 10:48:04 2006 Service Daemon (1292) Error : Invalid Magic Number Encountered in Message Header
Mon Jun 12 10:48:04 2006 Service Daemon (1292) Error : Invalid Message Format Encountered
Mon Jun 12 10:48:04 2006 Service Daemon (1292) Finished.

Since that test is just a connection to port 10128, and the empty message it sends causes the Service Daemon to hang up and exit immediately, only a few things happen between the two dates:
If the 'hostname' lookup were picking the wrong IP address or were taking a long time, then it would take a long time for the 'Trying...' message to appear. If inetd wasn't listening, then the 'Connected to localhost' message would be replaced with a 'Connection refused' message. This would also occur if a firewall were preventing the connection (although that is unlikely when connecting to 'localhost'). So, by the time we get the "Escape character is '^]'." message, we're looking at either a problem with inetd passing control over to the bgssd.exe process or a problem with the bgssd.exe binary itself. If the 'Connected to ...' message comes up quickly but the SDPACK doesn't, then we know we're debugging problem #3 or #4 in the above list.

Step 2: Date command output script test

To determine whether we are debugging problem #3 or #4, the next test is to remove the Service Daemon from the mix and replace it with a script that just prints out the date. To get the script called by inetd, rename the /etc/bgs/SD/bgssd.exe binary to /etc/bgs/SD/bgssd.exe.orig and then name the script /etc/bgs/SD/bgssd.exe (make sure the script is executable: chmod 755 bgssd.exe). Now, when a network connection is made on port 10128, inetd should pass control to our script, which prints the date.

Step 1: Back up your existing /etc/bgs/SD/bgssd.exe binary:

mv /etc/bgs/SD/bgssd.exe /etc/bgs/SD/bgssd.exe.orig
Step 2: Create an /etc/bgs/SD/bgssd.exe script that contains the following:

#!/bin/sh
/bin/date
/bin/date >> /tmp/bgssd_date.out

Step 3: Ensure the /etc/bgs/SD/bgssd.exe script is executable:

chmod 755 /etc/bgs/SD/bgssd.exe
Step 4: Test the script to make sure that it outputs the date when run from the command line:

> /etc/bgs/SD/bgssd.exe
Tue Aug 31 11:30:47 CDT 2010

Then run the following command:

date;echo "" | telnet localhost 10128;date
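Steps 1 to 4 can also be scripted so that the original binary is always put back afterwards; while the date script is in place, the real Service Daemon cannot run. This is a sketch assuming a POSIX shell and root access; install_date_script, restore_bgssd, and date_entries are hypothetical helper names, and SD_DIR and DATE_LOG default to the paths used in this document. Counting the lines in /tmp/bgssd_date.out before and after a telnet test shows directly whether inetd ran the script.

```shell
#!/bin/sh
# Hypothetical helpers for the date-script test (not part of the TSCO product).
SD_DIR=${SD_DIR:-/etc/bgs/SD}
DATE_LOG=${DATE_LOG:-/tmp/bgssd_date.out}

# Steps 1-3: back up the real binary and install the date-printing script.
install_date_script() {
    # Refuse to run twice, so an existing backup is never overwritten.
    if [ -e "$SD_DIR/bgssd.exe.orig" ]; then
        echo "refusing: $SD_DIR/bgssd.exe.orig already exists" >&2
        return 1
    fi
    mv "$SD_DIR/bgssd.exe" "$SD_DIR/bgssd.exe.orig" || return 1
    printf '#!/bin/sh\n/bin/date\n/bin/date >> %s\n' "$DATE_LOG" > "$SD_DIR/bgssd.exe"
    chmod 755 "$SD_DIR/bgssd.exe"
}

# Put the real Service Daemon binary back once testing is finished.
restore_bgssd() {
    if [ ! -e "$SD_DIR/bgssd.exe.orig" ]; then
        echo "nothing to restore" >&2
        return 1
    fi
    mv "$SD_DIR/bgssd.exe.orig" "$SD_DIR/bgssd.exe"
}

# Number of times the date script has run (it appends one line per run).
date_entries() {
    if [ -f "$DATE_LOG" ]; then
        wc -l < "$DATE_LOG" | tr -d ' '
    else
        echo 0
    fi
}
```

A typical sequence is: run install_date_script, record n=$(date_entries), run the telnet test, check whether date_entries is now greater than n, and finally run restore_bgssd.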
On a working machine with the script in place, the output will look something like this:

> date ; echo "" | telnet localhost 10128 ; date
Tue Aug 31 12:32:25 EDT 2010
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Tue Aug 31 12:32:25 EDT 2010
Connection to localhost closed by foreign host.
Tue Aug 31 12:32:25 EDT 2010

If that command didn't output the three dates, run the following command:

> date;telnet localhost 10128;date

On a working machine with the script in place, the output will look something like this:

> date;telnet localhost 10128;date
Tue Aug 31 12:32:38 EDT 2010
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Tue Aug 31 12:32:39 EDT 2010
Connection to localhost closed by foreign host.
Tue Aug 31 12:32:39 EDT 2010

This second command is useful because on some machines the 'echo "" | telnet localhost 10128' pipeline suppresses the date output from the test script. Since we are using that output as an indication of a problem, it is good to test with both commands.

On a working machine, all three dates will be the same (although anything within 3 seconds would be fine). If the second date is long after the first date (or is never printed), then we know that we're debugging issue #3, where inetd isn't passing the request on to the bgssd.exe binary in a timely manner. That means that we're looking at a problem outside of any of the Perform binaries: something wrong with the operating system itself.

If you still haven't seen a date printed, check the /tmp/bgssd_date.out file that should have been created by the test script. Is there a date in that file that matches the time you ran one of the telnet tests? Note that there should be at least one date in the file from the test execution of the script in Step 4. If there is no date in the /tmp/bgssd_date.out file matching the time of a telnet test, that is solid evidence that the /etc/bgs/SD/bgssd.exe script was not being executed by inetd on the machine.

The next place to check is the OS system log on the machine for errors related to inetd:

Solaris: grep inetd /var/adm/messages
Linux: grep inetd /var/log/messages
AIX: Check /etc/syslog.conf, which generally identifies the log file to which inetd writes its messages.

If the system log contains many messages from [x]inetd for other days, this command might work better:

grep inetd [messages files] | grep "Mar 16"

where 'Mar 16' is the current date. That restricts the output to messages in the system log generated today by the inetd process.

Section III: TCP Wrappers causing connection requests to be rejected

One common issue causing Service Daemon connection problems is the use of TCP Wrappers to restrict access to services controlled by [x]inetd. By default, TCP Wrappers are enabled for [x]inetd on Linux and on recent versions of Solaris. TCP Wrappers may also be enabled on other Unix variants such as AIX and HP-UX.
The problem symptoms when TCP Wrappers are enabled are:
System message logs may contain messages similar to the following:

Solaris /var/adm/messages:
Jul 17 14:12:27 [hostname] bgssd.exe[2283]: [ID 808958 daemon.warning] refused connect from [remote host] (access denied)

TCP Wrappers access is controlled by the /etc/hosts.allow and /etc/hosts.deny files.
If the /etc/hosts.deny file has an entry that prevents the managing node or localhost from accessing services (for example, if the file contains just an "ALL:ALL" entry), then for collection, query, and transfer requests to be initiated successfully from the managing node, the /etc/hosts.allow file must have an entry allowing both the managing node and localhost to access the bgssd service. For example:

bgssd.exe : [managing node] localhost
bgssd : [managing node] localhost

where [managing node] is the network name or IP address of the managing node. If there are questions regarding changes to these files, contact the system administrator.

Alternately, the following entries would allow any host to make requests to the Service Daemon:

bgssd.exe : ALL
bgssd : ALL
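A quick text check of the two files can show whether the situation above applies. This is a sketch assuming a POSIX shell; check_tcpwrappers is a hypothetical name. It only looks for an ALL:ALL line in hosts.deny and for hosts.allow lines that begin with bgssd or bgssd.exe; it does not implement the full TCP Wrappers matching rules described in hosts_access(5), so a rule that lists several daemons on one line is not detected.

```shell
#!/bin/sh
# Hypothetical helper: report an ALL:ALL deny rule and any bgssd allow rules.
# Returns 1 if hosts.allow has no line starting with bgssd or bgssd.exe.
check_tcpwrappers() {
    allow_file=${1:-/etc/hosts.allow}
    deny_file=${2:-/etc/hosts.deny}
    # Strip blanks so "ALL : ALL" and "ALL:ALL" compare the same way.
    if [ -f "$deny_file" ] && tr -d ' \t' < "$deny_file" | grep -qi '^ALL:ALL'; then
        echo "hosts.deny: ALL:ALL present"
    fi
    if [ -f "$allow_file" ] && grep -Eq '^[[:space:]]*bgssd(\.exe)?[[:space:]]*:' "$allow_file"; then
        grep -E '^[[:space:]]*bgssd(\.exe)?[[:space:]]*:' "$allow_file"
    else
        echo "hosts.allow: no bgssd rule"
        return 1
    fi
}
```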
To check whether TCP Wrappers is enabled on a Solaris server, run the following commands:

> svcprop -p defaults inetd | grep wrappers
defaults/tcp_wrappers boolean true

or

> inetadm -l telnet | grep tcp_wrappers
default  tcp_wrappers=TRUE

> inetadm | grep -i tcp

Output should be similar to the following:

enabled   online   svc:/network/rpc/cde-ttdbserver:tcp

Then run "inetadm -l" for each of the services, for example:

> inetadm -l /network/bgssd/tcp
SCOPE    NAME=VALUE

If tcp_wrappers=TRUE is set, then TCP Wrappers is enabled on the server, and you will need to either disable it or modify the /etc/hosts.allow file for the BPA product to work properly.
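On Solaris 10 and later, the inetadm checks above can be wrapped in two filters. This is a sketch assuming a POSIX shell; bgssd_fmri and wrappers_enabled are hypothetical names, and both read command output on stdin so they can be tried against saved output. The exact bgssd FMRI depends on how the agent was installed, so take it from the inetadm listing rather than hard-coding it.

```shell
#!/bin/sh
# Hypothetical helpers for checking TCP Wrappers on SMF-managed inetd.

# From `inetadm` output (columns: ENABLED STATE FMRI), print bgssd FMRI(s).
bgssd_fmri() {
    awk '/bgssd/ { print $NF }'
}

# From `inetadm -l FMRI` output, succeed if tcp_wrappers is TRUE, whether it
# is inherited ("default  tcp_wrappers=TRUE") or set on the service itself.
wrappers_enabled() {
    grep -q 'tcp_wrappers=TRUE'
}

# Typical use on a live host (as root):
#   for f in $(inetadm | bgssd_fmri); do
#       inetadm -l "$f" | wrappers_enabled && echo "$f: TCP Wrappers enabled"
#   done
# To turn wrappers off for that service only, instead of editing hosts.allow:
#   inetadm -m "$f" tcp_wrappers=FALSE
```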
Section IV: The Service Daemon is getting the request from [x]inetd quickly but is still failing

There are many causes of the "!Query: Request failed." message, but if the [hostname]-bgsSD.log is being updated, the messages in that log should give a good idea of how to debug this issue further. So, the next step is to provide Technical Support with copies of the /usr/adm/best1_default/bgs/log/[hostname]-bgsSD.log and [hostname]-bgsSD.log.bak files from a time when the problem occurs.
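Before sending the logs, a quick scan for Error lines can point to the failing step, and the two files can be bundled into one archive. This is a sketch assuming a POSIX shell; sd_log_errors and bundle_sd_logs are hypothetical helpers, SD_LOG_DIR defaults to the log directory named above, and the file names assume that [hostname] in the log name matches the output of the hostname command. Not every Error line is fatal: the "Error = Connection refused" line in the bgsagent start-up example below is logged normally when the Service Daemon has to start bgsagent.

```shell
#!/bin/sh
# Hypothetical helpers for collecting Service Daemon logs for Support.
SD_LOG_DIR=${SD_LOG_DIR:-/usr/adm/best1_default/bgs/log}

# Print the "Error" lines from a Service Daemon log file.
sd_log_errors() {
    grep 'Error' "$1"
}

# Bundle [hostname]-bgsSD.log and [hostname]-bgsSD.log.bak into one tar file
# and print its path. Pass an absolute path for the archive; the default
# is /tmp/[hostname]-bgsSD-logs.tar.
bundle_sd_logs() {
    out=${1:-/tmp/$(hostname)-bgsSD-logs.tar}
    (cd "$SD_LOG_DIR" && tar cf "$out" "$(hostname)"-bgsSD.log*) && echo "$out"
}
```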
A normal Service Daemon log file will begin with the following messages:

Mon Jun 12 10:19:08 2006 Service Daemon (26436) Service Daemon -- Version: 7.3.00 for SunOS 5.9
Mon Jun 12 10:19:08 2006 Service Daemon (26436) Solaris 9
Mon Jun 12 10:19:08 2006 Service Daemon (26436)
Mon Jun 12 10:19:08 2006 Service Daemon (26436) sparc, 64 Bit
Mon Jun 12 10:19:08 2006 Service Daemon (26436) Built on Oct 12 2005 01:19:28 Code: 7.3.00.0170
Mon Jun 12 10:19:08 2006 Service Daemon (26436) LANG =
Mon Jun 12 10:19:08 2006 Service Daemon (26436) GC_LANG =
Mon Jun 12 10:19:08 2006 Service Daemon (26436) Service Daemon Starting
Mon Jun 12 10:19:08 2006 Service Daemon (26436) -d /etc/bgs/SD
Mon Jun 12 10:19:08 2006 Service Daemon (26436) Info : IP Addr topgun
Mon Jun 12 10:19:08 2006 Service Daemon (26436) Info : User Name paska
Mon Jun 12 10:19:08 2006 Service Daemon (26436) Info: Message Type (M,m) =(6,2)
Mon Jun 12 10:19:08 2006 Service Daemon (26436) Info: Sending Agent Version :7.3.00
Mon Jun 12 10:19:08 2006 Service Daemon (26436) Info: Changing directory to /usr/adm/best1_7.3.00
Mon Jun 12 10:19:08 2006 Service Daemon (26436) Info: Console Sent Version :7.3.00
Mon Jun 12 10:19:08 2006 Service Daemon (26436) Starting agent with parameters:
Mon Jun 12 10:19:08 2006 Service Daemon (26436) Version: 7.3.00
Mon Jun 12 10:19:08 2006 Service Daemon (26436) BEST1_HOME: NULL
Mon Jun 12 10:19:08 2006 Service Daemon (26436) Agent Port: 57300
Mon Jun 12 10:19:08 2006 Service Daemon (26436) Investigate Port: 57301
Mon Jun 12 10:19:08 2006 Service Daemon (26436) User Name: paska
Mon Jun 12 10:19:08 2006 Service Daemon (26436) Host Address: 172.21.131.4
Mon Jun 12 10:19:08 2006 Service Daemon (26436) Info : Check if bgsagent is already running by connecting to it
Mon Jun 12 10:19:08 2006 Service Daemon (26436) Info : bgsagent is already running
Mon Jun 12 10:19:08 2006 Service Daemon (26436) bgsagent.exe already running with port 57300
Mon Jun 12 10:19:08 2006 Service Daemon (26436) Info: Talking to 7.3.00 Agent
Mon Jun 12 10:19:08 2006 Service Daemon (26436) /etc/bgs/SD/.B1ReleaseMap is not a file. Using table.
Mon Jun 12 10:19:08 2006 Service Daemon (26436) /usr/adm/best1_7.3.00
Mon Jun 12 10:19:08 2006 Service Daemon (26436) Trying to use VERSION = 7.3.00 BEST1_HOME = /usr/adm/best1_7.3.00
Mon Jun 12 10:19:08 2006 Service Daemon (26436) Info : waiting for read file lock
Mon Jun 12 10:19:08 2006 Service Daemon (26436) Info : file lock obtained

If the Service Daemon needs to start the bgsagent process (because one isn't already running), you'll typically see messages like these next:

Mon Jun 12 10:20:06 2006 Service Daemon (26579) Info : Check if bgsagent is already running by connecting to it
Mon Jun 12 10:20:06 2006 Service Daemon (26579) Couldn't establish connection to 127.0.0.1 on port 57300, Error = Connection refused
Mon Jun 12 10:20:06 2006 Service Daemon (26579) Connection refused
Mon Jun 12 10:20:06 2006 Service Daemon (26579) Info : bgsagent is not running : will restart
Mon Jun 12 10:20:06 2006 Service Daemon (26579) starting bgsagent.exe with port 57300
Mon Jun 12 10:20:11 2006 Service Daemon (26579) Info: Talking to 7.3.00 Agent

Next, the Service Daemon will report the results of whatever it was asked to do (start data collection, transfer data, query current data collection requests, and so on). Below is the output from a 'query' request (the remote node was asked to list the currently running data collection requests):

Mon Jun 12 10:19:08 2006 Service Daemon (26436) Collect Query file read successful: /usr/adm/best1_7.3.00/bgs/monitor/log/topgun-bgsagent_57300.als
Mon Jun 12 10:19:08 2006 Service Daemon (26436) UDR collect query initiated
Mon Jun 12 10:19:08 2006 Service Daemon (26436) Agent Query Request Starting(/usr/adm/best1_7.3.00/bgs/monitor/log/topgun-bgsagent_57300.als)
Collect      Instance       Node       Started    Started
Name         Name           Name       By         On
------------ -------------- ---------- ---------- -----------------
------------ -------------- ---------- ---------- -----------------
Mon Jun 12 10:19:08 2006 Service Daemon (26436) Finished.

Section V: Other [x]inetd related connection problems

Linux specific xinetd problems
If the Perform Installation Owner specified in the /etc/inetd.conf file for the 'bgssd' service entry is invalid, [x]inetd will fail to execute the Service Daemon process, because [x]inetd executes the Service Daemon as the Perform Installation Owner. This problem will typically generate error messages like the following in the system messages file:

Jul 24 15:55:51 [hostname] inetd[343]: [ID 317013 daemon.notice] bgssd[20905] from [remote host] 58283
Jul 24 15:55:51 [hostname] inetd[20905]: [ID 455825 daemon.error] getpwnam: [hostname]: No such user

Solaris ipmon local firewall software

If the Solaris ipmon local firewall software is installed and configured, /var/adm/messages will be populated with messages similar to:

Jul 13 11:29:32 vscwsprap1 ipmon[111]: [ID 702911 local0.warning] 11:29:32.379929 e1000g1 @0:1 b 10.33.35.225,36000 -> 172.28.92.21,10128 PR tcp len 20 64 -S IN
Jul 13 11:29:32 vscwsprap1 ipmon[111]: [ID 702911 local0.warning] 11:29:32.578371 e1000g1 @0:1 b 10.33.35.225,35953 -> 172.28.92.21,10128 PR tcp len 20 40 -R IN

The ipf.conf configuration file in /etc/ipf will need to be tuned to allow connections to ports 10128 and 6767 (replace hme0 with the interface reported in the ipmon messages, such as e1000g1 above):

pass in quick on hme0 proto tcp/udp from "console IP address" to any port = 10128
pass in quick on hme0 proto tcp/udp from "console IP address" to any port = 6767

After making the appropriate updates to the configuration file, reload the rules with ipf (ipmon only logs packets; ipf loads the rule set):

# Load rules in /etc/ipf/ipf.conf file into the active firewall
> ipf -f /etc/ipf/ipf.conf
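The two pass rules can be generated for a given interface and console address. This is a sketch assuming a POSIX shell; ipf_rules is a hypothetical name, and it simply prints rules in the form shown above, keeping the original tcp/udp protocol setting. Review the output before placing it in /etc/ipf/ipf.conf: IP Filter applies the first matching quick rule, so these pass rules must appear before any quick block rule that would match the same traffic.

```shell
#!/bin/sh
# Hypothetical helper: print ipf.conf pass rules for the Service Daemon ports.
# Usage: ipf_rules INTERFACE CONSOLE_IP   e.g. ipf_rules e1000g1 10.33.35.225
ipf_rules() {
    iface=$1
    console_ip=$2
    # 10128 is the default Service Daemon port; 6767 is the second port named above.
    for port in 10128 6767; do
        echo "pass in quick on $iface proto tcp/udp from $console_ip to any port = $port"
    done
}
```

When reloading, ipf -Fa -f /etc/ipf/ipf.conf flushes the active rule set before loading the file, which avoids stacking duplicate rules from repeated loads.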