One of the worst things that can happen on the BMC TrueSight Capacity Optimization Gateway Server (formerly BMC Performance Assurance, BPA, or Perform) console is for the $BEST1_HOME file system to fill up. If the udrCollectMgr processes can't update the files under $BEST1_HOME/local/manager/run, the day's Manager runs can become completely corrupted. That situation is hard to recover from because the automated recovery steps no longer apply and very manual recovery steps are needed instead.
The first thing to determine is which part failed: data transfer, data processing, or both. The easiest way to identify problems with data transfer is to look at the UCM Status Reports in the $BEST1_HOME/local/manager/status directory in a web browser (either a local web browser on the console, or copy the files to a Windows machine and open the UCMStatus.html file in a web browser there).
The goal is to confirm that all of your Manager runs are listed in the UCM Status Reports and then check whether any of them report failed transfer requests.
It is also possible to gather the same information with the udrCollectStat command (everything available in the UCM Status Reports can be retrieved through udrCollectStat). For example:
$BEST1_HOME/bgs/bin/udrCollectStat -D -d `date --date=yesterday +%m-%d-%Y` -f "%v %r %d %n: %s, %ch, %ce, %ces %th %te %tes %tg %tt"
Information about why the udrCollectMgr (data collection/data transfer) part of the Manager run failed can be found in the UCM logs in the $BEST1_HOME/local/manager/log directory. Unfortunately, they are somewhat difficult to interpret unless you have previous experience with them.
If the problem on the console was just that the UDR data repository filled up, the situation is less serious. In that case the Manager runs would all be listed in the UCM Status Reports, but a number of failed transfer attempts would be associated with the runs.
If the problem is just failed transfers, the following knowledge article describes the best way to recover your data transfers:
000031803: How can Perform UDR capacity planning data be manually transferred from the remote agent node to the Perform console server?
Once you've recovered the failed transfers via the '*.XferData -r' command, you then need to execute the *.ProcessDay scripts for the Manager runs that failed to process.
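As a sketch, the two steps can be wrapped in a small shell function. The script names come from this article, but the assumption that the dated *.XferData and *.ProcessDay scripts sit together in the Manager Output Directory, and the date-stamp format, may differ on your console:

```shell
# recover_day: retry a run's failed transfers, then reprocess the day.
# A hedged sketch -- directory layout and date-stamp format are
# assumptions; adjust them for your site.
recover_day() {
    dir=$1      # Manager Output Directory
    day=$2      # date stamp used in the run's script names
    "$dir/$day.XferData" -r || return 1   # retry only the failed transfer requests
    "$dir/$day.ProcessDay"                # then reprocess the collected data
}
```

For example, `recover_day /data/manager/output 01-15-2024` (both arguments are site-specific placeholders) would retry that day's transfers and then reprocess the day.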
The following command can be run to list the active Manager runs and the associated Manager Output Directory:
$BEST1_HOME/bgs/scripts/listManagerRuns.pl -p MANAGER_COMMANDS_FILE OUTPUT_DIRECTORY
Note that if the problem was caught quickly (the day the file system filled, or the day after), the following KA describes a command that can be used to recover the failed processing and re-initialize the run:
See KA 000210037: For the TrueSight Capacity Optimization (TSCO) Gateway Server on Linux, what is the best way to re-initialize Manager runs if the *.Manager script hasn't been getting executed? (https://bmcsites.force.com/casemgmt/sc_KnowledgeArticle?sfdcid=000210037)
Common Gateway Manager symptoms of a file system full condition on the Linux console

The most common symptoms are:
- Manager runs are not listed in Gateway Manager -> Gateway Reports: Exception Reports
- Last night's data collection and transfer status for individual computers are not listed under Gateway Manager -> Gateway Reports: Node History
When some combination of the Console UDR Data Repository and the Manager Output Directory shares a file system with the $BEST1_HOME directory, it is common for the file system to show free space during the day (when data processing isn't running) even though it filled overnight. More disk space is required during nightly data processing than afterwards: once the Manager runs have finished, the UDR data cleanup has run in the Console UDR Data Repository and the temporary files have been deleted from the Manager Output Directory.
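Because of that, it is worth checking the file system while processing is running (or right after a failure). A minimal sketch, assuming $BEST1_HOME is set ($HOME is used here only as a stand-in fallback so the commands run anywhere):

```shell
#!/bin/sh
# Show how full the file system holding $BEST1_HOME is ($HOME is a
# stand-in fallback so this sketch runs outside a real console).
DIR="${BEST1_HOME:-$HOME}"
df -h "$DIR"

# The UDR repository and Manager Output Directories are the usual
# heavy consumers during nightly processing.
du -sh "$DIR"/* 2>/dev/null | sort -rh | head
```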
Advanced Recovery
When using AutoNodeDiscovery (Agent List based Manager runs), if the output directory where the Manager input files are located fills up, there is the potential for key configuration file corruption. This includes:
- The Manager *.vcmds files may be truncated
- The *.dmn files may be truncated
This happens because the AutoNodeDiscovery job updates both the *.vcmds and *.dmn (domain) files each night to keep the computer list in the Manager run synchronized with the Agent List.
Look for Manager *.vcmds files that are unusually small, or domain (*.dmn) files that are zero length.
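A small helper can surface likely candidates. This is a sketch; the assumption is that the run's *.vcmds and *.dmn files sit under one directory (and `find -printf` is GNU find, which is fine on a Linux console):

```shell
# find_truncated: list suspect configuration files under a Manager
# output directory (a sketch -- the directory layout is an assumption).
find_truncated() {
    dir=$1
    # Zero-length domain files are definitely corrupt
    find "$dir" -name '*.dmn' -size 0 -print
    # List *.vcmds files smallest-first so truncated ones stand out
    find "$dir" -name '*.vcmds' -printf '%s\t%p\n' | sort -n
}
```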
There is no automated mechanism to rebuild a Manager *.vcmds file that has been truncated. If a backup copy of the parent Manager run's *.vcmds file is available, restore it before re-running the [date]-[date].Manager script. Once the *.Manager script runs against the master *.vcmds file, it repairs the child *.vcmds files (the additional runs created by AutoNodeDiscovery to handle the Agent List) and all of the underlying *.dmn files. Once the files have been repaired, the [date]-[date].Manager runs can be re-executed to re-initialize the run.
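The restore-and-repair sequence above can be sketched as a shell function; all three arguments (backup copy, live *.vcmds path, and the dated *.Manager script) are placeholders for your site's actual paths:

```shell
# restore_and_rerun: put a backed-up parent *.vcmds back in place, then
# run the Manager script so it can rebuild the child *.vcmds and *.dmn
# files.  A sketch -- paths and script names are placeholders.
restore_and_rerun() {
    backup=$1; live=$2; manager_script=$3
    cp -p "$backup" "$live" || return 1   # restore the parent *.vcmds
    "$manager_script"                     # repairs child files, re-initializes the run
}
```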
Related Products:
- TrueSight Capacity Optimization
- BMC Capacity Management
- BMC Performance Assurance
Legacy ID:KA345366