What recovery and debugging options are available for the TrueSight Capacity Optimization (TSCO) Gateway Server (formerly BMC Performance Assurance) Unix console Manager runs? NOTE: This document was originally published as Solution SLN000000222608.
TrueSight Capacity Optimization (Gateway Server) 20.02, 11.5, 11.3, 11.0, 10.7, 10.5, 10.3, 10.0
Q: If my Manager run wasn't executed, what is the best way to recover?
The best way to recover is to manually execute the *.Manager script that wasn't executed. This creates the necessary scripts and starts data collection on the remote node (from the time that you execute the *.Manager script).
Q: Is there an easy way to re-execute all of my *.Manager scripts?
Here are the quick recovery commands for when all Manager runs have failed to execute. Run them as the user under which you schedule your Manager runs:
> $BEST1_HOME/bgs/scripts/pcrontab.sh -list | grep Manager | grep -v GeneralManager | awk '{ print $7 }' > /tmp/runs.sh
> chmod 755 /tmp/runs.sh
> /tmp/runs.sh
This is useful for recovering when the Gateway Server console was down during the period when the runs would have been executed, or when cron problems on the machine caused none of the runs to be executed.
Q: If my Manager run wasn't executed, how do I debug the problem?
The goal is to determine what caused the Manager run not to be executed. To determine that, Technical Support would want to see the following files:
Q: How are these files useful for debugging?
The entire contents of the /usr/adm/best1_V.V.VV/bgs/log/pcron directory
The entire contents of the /usr/adm/best1_V.V.VV/bgs/pcron/repository directory
The entire contents of the /usr/adm/best1_V.V.VV/local/manager directory
The output of 'ls -l [Manager Output Directory]' where [Manager Output Directory] is the directory where the *.Manager script exists for the Manager run that wasn't executed.
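The files listed above can be gathered in one pass. The following is a minimal sketch, not a BMC-supplied tool: the function name, the output file names, and the idea of bundling everything into a tar archive are assumptions made for illustration. It expects the $BEST1_HOME directory, the Manager output directory, and an existing output directory given as an absolute path.

```shell
#!/bin/sh
# Sketch: gather the files Technical Support asks for.
# collect_manager_diagnostics and the output file names are hypothetical;
# only the three directories and the 'ls -l' listing come from the article.

collect_manager_diagnostics() {
    best1_home=$1        # the console's $BEST1_HOME directory
    manager_out_dir=$2   # directory holding the *.Manager script that did not run
    out_dir=$3           # existing directory (absolute path) for the results

    # Save the 'ls -l' listing of the Manager output directory.
    ls -l "$manager_out_dir" > "$out_dir/manager_output_ls.txt" 2>&1

    # Archive the three directories, using paths relative to BEST1_HOME.
    ( cd "$best1_home" &&
      tar -cf "$out_dir/manager_diagnostics.tar" \
          bgs/log/pcron bgs/pcron/repository local/manager )
}
```

A call might look like `collect_manager_diagnostics "$BEST1_HOME" /path/to/manager/output /tmp`, where the second argument is a placeholder for your own Manager output directory.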
Depending on what Technical Support sees in those logs, we might need to look further at things like the cron log (/var/cron/log and /var/cron/olog), but root access is required to get those, so we generally don't request them initially.
Above, /usr/adm/best1_V.V.VV is the $BEST1_HOME directory for your Gateway Server console version (for example, for BPA version 9.5.00 the correct path would be /usr/adm/best1_9.5.00).
Q: Is there a way to pre-load the remote node with data collection requests so that if Manager fails to issue data collection I'll be able to identify the problem before data has been lost?
By default, Manager will pre-populate a few days' worth of data collection requests onto the remote node. Manager registers the data collection requests with the agent, and the agent starts those requests when the start time comes. To adjust this setting:
Step 1 Edit the /usr/adm/best1_V.V.VV/local/setup/collectManager.cfg file.
Step 2 Change the 'COLREQ_DAYS_ADVANCE = 3' parameter to 'COLREQ_DAYS_ADVANCE = #', where '#' is the number of days in advance to pre-register data collection requests, from 0 to 3. The default is '3' (maintain 3 days of registered collection requests).
Step 3 Save the collectManager.cfg file.
This option is especially useful in an environment where the managing node is going to be upgraded over the weekend and will be unavailable for a few days. It allows Manager to pre-register the collection requests so that no data is lost while the managing node is down.
This Manager option also provides a good buffer when debugging problems with Manager runs not being consistently executed, because it allows the problems to be debugged without any data being lost on the remote nodes. The root cause of the Manager run instability would still need to be identified, but this feature eliminates the most negative symptoms of the problem while we work to identify the source.
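The three configuration steps can also be scripted. The sketch below is an illustration only: the function name and the .bak backup file are assumptions, and it assumes the parameter appears on its own line in the 'COLREQ_DAYS_ADVANCE = 3' form quoted above. It rejects values outside the 0 to 3 range and keeps a backup copy before rewriting the file.

```shell
#!/bin/sh
# Sketch: change COLREQ_DAYS_ADVANCE in collectManager.cfg.
# set_colreq_days_advance and the ".bak" backup name are hypothetical.
# Assumes the line looks like "COLREQ_DAYS_ADVANCE = 3", as quoted in the article.

set_colreq_days_advance() {
    cfg=$1    # path to collectManager.cfg
    days=$2   # number of days to pre-register, 0 to 3

    case $days in
        0|1|2|3) ;;
        *) echo "days must be between 0 and 3" >&2; return 1 ;;
    esac

    # Keep a backup, then rewrite only the COLREQ_DAYS_ADVANCE line.
    cp "$cfg" "$cfg.bak" || return 1
    sed "s/^\([[:space:]]*COLREQ_DAYS_ADVANCE[[:space:]]*=\).*/\1 $days/" \
        "$cfg.bak" > "$cfg"
}
```

For example, `set_colreq_days_advance /usr/adm/best1_V.V.VV/local/setup/collectManager.cfg 2` would set the value to 2, with V.V.VV replaced by your console version. Running `grep COLREQ_DAYS_ADVANCE` on the file afterwards is a simple way to confirm the change.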
Section II: Manager debugging methodology
When debugging Manager problems, the first goal should be to determine which task is failing and then attempt to isolate the failure to a more specific step within that task.
The high level Manager tasks are:
A good place to start researching is either the General Manager Lite (GMLite) reports or the Gateway Manager UI that is part of the TSCO Application Server. Information similar to the UCM Status Reports mentioned below is also available in those newer interfaces.
Outside of the Gateway Manager UI and GMLite reports, a good place to start researching is the UDR Collection Manager (UCM) status reports. These include an overview of the data collection and data transfer components of the Manager run.
For example, the UDR status report may show that data collection has failed for node hou-remprd-08. Expanding the node status shows that data collection failed to start for that node because the connection to the Service Daemon is failing. The "Service daemon not installed on the remote node (connection refused)" message indicates that the collection request is not being received by the remote node. So, the appropriate place to begin debugging this issue is on the remote node itself, to determine why the Service Daemon isn't responding.
If the entire date is missing from the 'Manager Runs' tree on the left side of the UCM status reports, that indicates that data collection requests were not registered for any Manager runs on that date. That generally indicates a problem with the execution of the *.Manager scripts due to a problem with cron, pcron, or the *.Manager scripts themselves.
To debug Manager run execution issues the places to look are:
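As a minimal sketch, the scheduling checks discussed in this section (the pcron log, the pcron schedule, and the cron schedule) could be run in order as follows. The function name is hypothetical. Building the log file name from `hostname` and `id -un` is an assumption about how [hostname] and [ManagerUser] resolve on your system, and matching the string "pcron" in the crontab output is likewise an assumption about how the pcron entry is written.

```shell
#!/bin/sh
# Sketch: walk through the Manager scheduling checks in order.
# check_manager_scheduling is a hypothetical helper; the log name built from
# hostname and id -un, and the "pcron" match in crontab, are assumptions.

check_manager_scheduling() {
    best1_home=${1:-$BEST1_HOME}
    log="$best1_home/bgs/log/pcron/$(hostname)-$(id -un).log"

    echo "== 1. pcron log: are the *.Manager scripts being executed? =="
    if [ -r "$log" ]; then
        tail -n 20 "$log"
    else
        echo "log not found or not readable: $log"
    fi

    echo "== 2. pcrontab: are the *.Manager scripts scheduled? =="
    "$best1_home/bgs/scripts/pcrontab.sh" -list | grep Manager ||
        echo "no Manager entries found in pcrontab"

    echo "== 3. crontab: is the pcron job itself scheduled in cron? =="
    crontab -l | grep -i pcron ||
        echo "no pcron entry found in crontab"
}
```

Run it as the user under which the Manager runs are scheduled, for example `check_manager_scheduling "$BEST1_HOME"`. The system cron log is left out of the sketch because reading it generally requires root access.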
The $BEST1_HOME/bgs/log/pcron/[hostname]-[ManagerUser].log file will indicate whether the scripts are being executed. If they are not, the 'pcrontab -list' command will tell us whether the scripts are even scheduled to be executed. If the scripts are scheduled but aren't being executed, the output of 'crontab -l' will tell us whether 'pcron' itself is properly scheduled in cron. If it is, the system cron log will indicate whether that job is being properly run every minute on the machine.
If we can see that the *.Manager scripts are properly scheduled in pcron but they aren't being executed, the most likely problems are:
Other debugging entry points within Manager on Unix include: